Keeping with the theme of Big Data that we discussed a couple of days back, the concept of N=all gives rise to a whole slew of new challenges, an obvious consequence of dealing with such large volumes of data: storage and retrieval! The ability to quickly retrieve, analyze and correlate data to derive information becomes essential when dealing with big data. And for such massive amounts of data, relational databases do not seem to jibe all that well. One of the major reasons is that relational (or, as I may now safely call them, traditional) databases require the data they store to follow a predefined structure. When you are trying to correlate users’ location data with local deals (as an example) and then add in the users’ personal credit card usage, the data does not always fall into a structured pattern that can be stored in a relational database. Along came NoSQL. The name was borrowed from a 1998 open source RDBMS developed by Carlo Strozzi, and was later popularized by Eric Evans of Rackspace.
Unlike traditional relational databases, NoSQL is better viewed as a collective term for a variety of new data storage backends, many of which relax or drop the traditional concept of transactions. With its loose definition, a NoSQL store can aggregate into a single record the data that would span rows across multiple tables in a traditional relational database. This obviously results in enormous chunks of data, which poses storage challenges. However, with storage costs decreasing rapidly, that overhead can be ignored when compared to the potential you now have. Couchbase, one of the companies that caught on quickly to this new revolution in data storage and retrieval with its document-oriented database technology, has published an interesting article on why NoSQL.
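To make the idea of aggregating rows from several tables into one record a little more concrete, here is a minimal sketch in plain Python. The table names, fields and the `build_user_document` helper are all hypothetical, invented for illustration; this is not how Couchbase or any particular store does it internally, only the general shape of the result.

```python
import json

# Hypothetical rows as they might sit in three separate relational tables.
users = [{"user_id": 1, "name": "Ana"}]
locations = [
    {"user_id": 1, "city": "Austin", "seen_at": "2013-03-01"},
    {"user_id": 1, "city": "Dallas", "seen_at": "2013-03-04"},
]
card_usage = [{"user_id": 1, "merchant": "Cafe Uno", "amount": 12.5}]


def without_key(row, key):
    """Return a copy of the row with one key removed."""
    return {k: v for k, v in row.items() if k != key}


def build_user_document(user, locations, card_usage):
    """Fold the related rows for one user into a single nested document."""
    uid = user["user_id"]
    return {
        "_id": uid,
        "name": user["name"],
        # The foreign key is redundant once the rows are nested under the user.
        "locations": [without_key(r, "user_id") for r in locations if r["user_id"] == uid],
        "card_usage": [without_key(r, "user_id") for r in card_usage if r["user_id"] == uid],
    }


document = build_user_document(users[0], locations, card_usage)
print(json.dumps(document, indent=2))
```

The trade-off is the one described above: the document carries everything about the user in one place, at the cost of a larger record.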
They are not the only ones to have embraced this new idea. Hadoop is another that has quickly become a household name. Developed and sustained by a community of volunteers under the Apache Software Foundation, Hadoop is a framework for processing large data sets, perhaps better known as big data. It is widely described as a free implementation of the ideas behind Google MapReduce, and several big names have built services and solutions around the framework, some of the notable ones being Amazon Web Services (AWS), VMware Hadoop Virtual Extensions (HVE) and IBM BigInsights.
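For readers who have not met the MapReduce model, a rough single-machine sketch in plain Python may help. It does not use Hadoop at all, and the function names are my own; it only illustrates the map, shuffle and reduce steps that a framework like Hadoop would spread across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

Because each call to `map_phase` and `reduce_phase` depends only on its own input, the work can in principle be split across machines, which is the property that makes the model attractive for large data sets.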
Yet another database that has been gaining popularity of late is MongoDB, a project from 10gen. Like Couchbase, it is a document-oriented database, and it has picked up several notable users including SAP, MTV and SourceForge.
With an “unstructured” database come the challenges of querying it. MongoDB expresses queries as JSON-like documents (its data is stored as BSON, or Binary JSON), whereas Couchbase has backed a SQL-like query language known as UnQL (Unstructured Query Language), which is being promoted as a standard.
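To give a feel for what querying with a JSON-like document looks like, here is a toy matcher in plain Python. It is only a sketch: the `matches` and `find` helpers and the sample deals are hypothetical, it talks to no real database, and it handles only exact matches plus the `$gt` and `$lt` comparison operators that MongoDB's query documents also use.

```python
def matches(document, query):
    """Return True if the document satisfies every condition in the query."""
    for field, condition in query.items():
        value = document.get(field)
        if isinstance(condition, dict):
            # Operator form, e.g. {"discount": {"$gt": 20}}
            for operator, operand in condition.items():
                if operator == "$gt":
                    if value is None or not value > operand:
                        return False
                elif operator == "$lt":
                    if value is None or not value < operand:
                        return False
                else:
                    raise ValueError(f"unsupported operator: {operator}")
        elif value != condition:
            # Plain form, e.g. {"city": "Austin"}, means exact equality.
            return False
    return True


def find(collection, query):
    """Return every document in the collection that matches the query."""
    return [doc for doc in collection if matches(doc, query)]


# Hypothetical local deals; note that the documents need not share all fields.
deals = [
    {"city": "Austin", "category": "food", "discount": 25},
    {"city": "Austin", "category": "retail", "discount": 10},
    {"city": "Dallas", "category": "food", "discount": 40, "expires": "2013-04-01"},
]

results = find(deals, {"city": "Austin", "discount": {"$gt": 20}})
print(results)
```

The query is itself just data, which is why it can be built, stored and passed around like any other document.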
While all of these are still in the nascent stages of development, and as the big data wave rapidly approaches its peak, let me leave you with a slide deck presented at QCon London 2013 by Matt Asay, VP of Corporate Strategy at 10gen, on the “Past, Present and Future of NoSQL”.