Tuesday, April 13, 2010

Searching in databases - Maybe what will drive the next wave

In the old days, before SQL and relational and all that (not when Vikings toured the world drinking, being violent and causing mayhem, but still the old days), the first reasonably generic database systems were hierarchical or network based. These had a strict schema, and data was extracted by navigation (i.e. find company X, find the orders for company X, find the items on each order, etc.). There was no real way of searching data, which wasn't much of a problem, as data was largely stored on tape anyway, and tape isn't really searchable in the now common sense.

When SQL came around, the relational schema allowed a much freer way of navigating data, and it also allowed searching. SQL search as we know it is still contextual, though: you have to specify what to search, so a SELECT from a customer table based on address will not retrieve employees with a matching address. All the same, the ability to search and the relatively free structure of data relationships took database use to a new level.
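To make the "contextual" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customer and employee tables, their columns and the sample rows are all made up for illustration; the only point is that a search against one table never sees matching rows in another.

```python
import sqlite3


def search_customers_by_address(conn, fragment):
    """Return names of customers whose address contains `fragment`."""
    rows = conn.execute(
        "SELECT name FROM customer WHERE address LIKE ? ORDER BY name",
        (f"%{fragment}%",),
    )
    return [name for (name,) in rows]


# Hypothetical schema: two unrelated tables that both happen to hold addresses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (name TEXT, address TEXT);
    CREATE TABLE employee (name TEXT, address TEXT);
    INSERT INTO customer VALUES ('Acme Ltd', '12 Main Street');
    INSERT INTO employee VALUES ('Bob', '12 Main Street');
""")

# Bob lives at the same address, but he is an employee, so he is not returned.
matches = search_customers_by_address(conn, "Main Street")
print(matches)  # ['Acme Ltd']
```

To find Bob as well, you would have to know that the employee table exists and query it separately (or UNION the two), which is exactly the context a Google-style search doesn't ask you for.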

But searching today is often compared to Google, and that kind of search is really non-contextual. This is an area where the NoSQL movement has an edge on SQL, mostly because of the largely schema-free nature of NoSQL implementations. If search was a main driving force towards SQL, will the same happen with NoSQL? Maybe, I'm not sure. What I AM sure of, though, is that we need to develop SQL and the relational model to support more schema-free operations, mainly search, although I think there are other areas where this is relevant. And will this be the final nail in the coffin of true SQL systems? I'm sure it's not; we can enhance the functionality of the SQL-based RDBMS without wreaking havoc with relational algebra, somehow. But any SQL-based RDBMS that wants to stay around needs some support for unstructured data.

So why would a SQL-based RDBMS with support for unstructured data and searching be better than a plain NoSQL implementation? In my mind, because NOT ALL DATA is unstructured. Customer information, credit card payment data, product catalogs and the like are distinctly structured, and a SQL-based RDBMS enhanced to support unstructured data would potentially allow you to work with any kind of data, structured or not.

So, is having one piece of software handle different types of data really a good idea? In my mind it is. Even if the data is a mix of structured and unstructured, the different sets of data are still related, and it is relevant to run operations across both of them, as one set of data.
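As a rough sketch of what "one set of data" could look like, the example below keeps a structured customer table next to a table of free-text notes and answers a single question that needs both. Everything here is an assumption made for illustration: the table and column names are invented, and a plain LIKE match is only a crude stand-in for the kind of real search support I'm arguing for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Structured data: fixed columns with well-defined meaning.
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    -- Unstructured data: free text attached to a customer.
    CREATE TABLE note (customer_id INTEGER REFERENCES customer(id), body TEXT);

    INSERT INTO customer VALUES (1, 'Acme Ltd', 'SE');
    INSERT INTO customer VALUES (2, 'Globex', 'US');
    INSERT INTO customer VALUES (3, 'Initech', 'SE');

    INSERT INTO note VALUES (1, 'Called about a late delivery, wants a refund');
    INSERT INTO note VALUES (2, 'Refund issued after complaint');
    INSERT INTO note VALUES (3, 'Asked for a new product catalog');
""")


def customers_mentioning(conn, country, word):
    """Names of customers in `country` with at least one note containing `word`."""
    rows = conn.execute(
        """
        SELECT DISTINCT c.name
        FROM customer AS c
        JOIN note AS n ON n.customer_id = c.id
        WHERE c.country = ?      -- structured filter
          AND n.body LIKE ?      -- crude free-text filter
        ORDER BY c.name
        """,
        (country, f"%{word}%"),
    )
    return [name for (name,) in rows]


result = customers_mentioning(conn, "SE", "refund")
print(result)  # ['Acme Ltd']
```

With the notes in a separate, schema-free store, the same question would take two lookups and some glue code in the application to match them up.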

