Saturday, October 2, 2010

Why NoSQL?

Recently, there has been a movement to NoSQL and a disdain for traditional RDBMS when it comes to web development. There have been many discussions pitting the two against each other, almost reaching a religious fervor. With the movement, there have been several new databases available:

Why NoSQL?

Hasn't RDBMS served us well for decades? They're mature, well thought out, and optimized for their task. SQL is a powerful querying language that allows you to do almost anything. So why the change?

PITA

It's human nature at its best. We're given a tool for a task. We use it to its fullest and try to increase the efficiency of the tool. However, there will always be frustrations and clumsiness inherent with the tool. No tool is ever perfect for the job at hand. Working with a tool for many years, the frustrations accumulate until, one day, we can't take it anymore. It's a PITA. We invent something new, even if it isn't as good yet. It at least solves the frustrations of the previous tool, which at this point, is all that matters. Why do I say this is human nature at its best? Because this is how we improve. This is how we take it to the next level. Human beings are never satisfied with the status quo for long.

Scaling

So what's so frustrating with RDBMS? Scaling. It's that simple. RDBMS has problems with scale. When an RDBMS gets large, think about what happens with the following:
  • migrations
  • expansion
  • backups
  • failover
  • locking

Migrations

What happens when you need to migrate data in a table that has millions of rows? Hours of downtime. Downtime for your site means frustrations for your customers. That's not a good thing.

Expansion

How easy is it to tack on another database server? Well, you're going to have to worry about sharding first in the application itself before you can even try. Don't have that code from the beginning? Well, start coding now.

Backups

Don't have sharding? Your backups are going to be huge. This isn't as much of a PITA these days with such large disk space, but no matter what, it's still easier to handle smaller pieces of data.

Failover

The current RDBMS implementations don't make it easy to setup failover. It's usually tacked on afterwards.

Locking

With ACID and transactions, locking is inevitable in RDBMS. With a large scale, there are bound to be long waits. Unfortunately, the larger the database, the more waiting. There are ways around this, but it's a PITA.

Variety

It's not to say that RDBMS isn't good. It's very good at what it does. But, for certain web applications, the tool just doesn't fit. The PITA factor goes through the roof. Does this mean we should default to NoSQL everytime? No. The point is variety. It's not about RDBMS vs NoSQL vs something else. It's about which tool fits best for the job. We should be glad that there's a choice. We can even decide to use RDBMS alongside NoSQL if it works better. Variety is the spice of life.

Where does it fit?

If we can't throw out RDBMS, then when should we use NoSQL? It seems like the jury is still out on this as not all NoSQL implementations are built the same. No implementation solves all the frustrations of RDBMS. Plus, NoSQL hasn't reached the age in production environments to tell what the side-effects are. You will need to do research on this front and see which implementation, if any, of NoSQL fits your problem. Here's a quick guideline though:
  • not many relations (low number of joins)
  • scaling required
  • read-intensive
  • ACID and transactions are not important
  • you just want to geek-out and live on the edge :D

No comments:

Post a Comment