
Ad Hoc Data Querying Similarly to data fixing, ad hoc queries require query and manipulation tools for the
particular datastore, and querying distributed datastores is harder than
querying centralized ones. Schmidt states that for some reporting tasks the MapReduce approach
(cf. [DG04]) is the right one, but not for every ad hoc query. Furthermore, he sees a problem that is cultural
rather than technical: customers have become trained and “addicted” to ad hoc reporting and
therefore dislike the absence of these means. For exhaustive reporting requirements Schmidt suggests
using a relational database that mirrors the data of the live databases, for which a NoSQL store might
be used due to performance and scalability requirements.
Data Export Schmidt states that there are huge differences among the NoSQL databases regarding this
aspect. Some provide a useful API to access all data, while others lack one. He also points out
that it is easier to export data from non-distributed NoSQL stores like CouchDB, MongoDB or
Tokyo Tyrant than from distributed ones like Project Voldemort or Cassandra.
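
The mirroring approach suggested above for exhaustive reporting can be sketched as follows. This is a minimal illustration under stated assumptions, not Schmidt's implementation: the in-memory dictionary `live_store`, the table `orders_report` and both function names are hypothetical, and SQLite merely stands in for whichever relational database would host the reporting copy.

```python
import sqlite3

# Hypothetical stand-in for a live key/value or document store.
live_store = {
    "order:1": {"customer": "alice", "total": 30.0, "country": "DE"},
    "order:2": {"customer": "bob", "total": 12.5, "country": "US"},
    "order:3": {"customer": "alice", "total": 7.5, "country": "DE"},
}


def mirror_to_relational(store, conn):
    """Copy every document of the store into a flat reporting table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders_report ("
        "id TEXT PRIMARY KEY, customer TEXT, total REAL, country TEXT)"
    )
    rows = [
        (key, doc["customer"], doc["total"], doc["country"])
        for key, doc in store.items()
    ]
    # INSERT OR REPLACE keeps the mirror idempotent across repeated runs.
    conn.executemany(
        "INSERT OR REPLACE INTO orders_report VALUES (?, ?, ?, ?)", rows
    )
    conn.commit()


def revenue_per_customer(conn):
    """An ad hoc report expressed in plain SQL against the mirror."""
    cursor = conn.execute(
        "SELECT customer, SUM(total) FROM orders_report "
        "GROUP BY customer ORDER BY customer"
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
mirror_to_relational(live_store, conn)
report = revenue_per_customer(conn)
```

The live store keeps serving the application, while ad hoc questions such as revenue per customer are answered by the relational copy; how often the mirror is refreshed is a separate design decision that the sketch leaves open.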
Schmidt’s points are humorously and extensively reiterated in the talk “Your Guide to NoSQL” (cf. [Ake09])
which especially parodies the NoSQL advocates’ argument of treating every querying need in a MapReduce
fashion.
2.2.7. Performance vs. Scalability
BJ Clark presents an examination of various NoSQL databases and MySQL regarding performance and
scalability in his blog post “NoSQL: If only it was that easy”. At first, he defines scalability as “to change
the size while maintaining proportions and in CS this usually means to increase throughput.” Blogger
Dennis Forbes agrees with this notion of scalability as “pragmatically the measure of a solution’s ability to
grow to the highest realistic level of usage in an achievable fashion, while maintaining acceptable service
levels” (cf. [For10]). BJ Clark continues: “What scaling isn’t: performance. [. . . ] In reality, scaling doesn’t
have anything to do with being fast. It has only to do with size. [. . . ] Now, scaling and performance do
relate in that typically, if something is performant, it may not actually need to scale.” (cf. [Cla09]).
Regarding relational databases Clark states that “The problem with RDBMS isn’t that they don’t scale,
it’s that they are incredibly hard to scale. Ssharding[sic!] is the most obvious way to scale things, and
sharding multiple tables which can be accessed by any column pretty quickly gets insane.” Blogger Dennis
Forbes agrees with him that “There are some real scalability concerns with old school relational database
systems” (cf. [For10]) but that it is still possible to make them scale using e. g. the techniques described
by Adam Wiggins (cf. [Wig09]).
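
Clark's remark about sharding tables that can be accessed by any column can be illustrated with a small sketch. It is only an illustration under assumptions: the shard count, the `users` layout and all function names are hypothetical, and each shard is modelled as an in-memory dictionary rather than a real database server.

```python
import zlib

NUM_SHARDS = 4  # assumed shard count for the illustration

# Each shard is modelled as a dict keyed by the shard key (the user id).
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(user_id):
    """Map a shard key to a shard index with a stable hash."""
    return zlib.crc32(str(user_id).encode("utf-8")) % NUM_SHARDS


def insert_user(user_id, email):
    """Store a row on the single shard that owns its key."""
    shards[shard_for(user_id)][user_id] = {"id": user_id, "email": email}


def find_by_id(user_id):
    """Lookup by shard key: exactly one shard is consulted."""
    shard = shards[shard_for(user_id)]
    return shard.get(user_id), 1


def find_by_email(email):
    """Lookup by another column: the query is scattered to every shard."""
    matches = []
    for shard in shards:
        matches.extend(row for row in shard.values() if row["email"] == email)
    return matches, len(shards)


for uid in range(1, 11):
    insert_user(uid, f"user{uid}@example.com")

row, touched_by_id = find_by_id(7)
matches, touched_by_email = find_by_email("user7@example.com")
```

A lookup by the shard key touches one shard, whereas a lookup by any other column has to visit all of them unless a secondary mapping is maintained for that column. With several tables and several access columns, the number of such mappings grows quickly.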
Besides sharding of relational databases and the avoidance of typical mistakes (such as expensive joins caused
by rigid normalization or poor indexing) Forbes sees vertical scaling as still an option that can be easy,
computationally effective and which can lead far with “armies of powerful cores, hundreds of GBs of
memory, operating against SAN arrays with ranks and ranks of SSDs”. On the downside, vertical scaling
can be relatively costly, as Forbes also admits. But Forbes also argues for horizontal scaling of relational
databases by partitioning data and adding each machine to a failover cluster in order to achieve redundancy
and availability. With deployments in large companies in mind, where constraints are few and money is
often not that critical, he states from his own experience⁹ that “This sort of scaling that is at the heart
of virtually every bank, trading system, energy platform, retailing system, and so on. [. . . ] To claim that
SQL systems don’t scale, in defiance of such obvious and overwhelming evidence, defies all reason” (cf.
[For10]). Forbes argues for the use of one’s own servers as he sees some artificial limits in cloud computing
environments such as limitations of IO and relatively high expenditures for single instances in Amazon’s
EC2; he states that “These financial and artificial limits explain the strong interest in technologies that
allows you to spin up and cycle down as needed” (cf. [For10]).
⁹ Forbes has worked in the financial, assurance, telecommunication and power supply industries.