Saturday, October 9, 2010

Why MongoDB?

MongoDB is a NoSQL implementation that I've decided to use for my project. One of the major deciding factors is that I deal with MongoDB at work and have experience with it. Unfortunately, this reason alone will not help you decide whether to use MongoDB, so I've outlined some other points below. Feel free to add more in the comments!

What is MongoDB?

MongoDB is a document-oriented database. It stores everything as BSON, a binary encoding of JSON-like documents. A database holds a number of collections (roughly, tables). Each collection holds a number of documents (records/rows). Each document can be thought of as a large hash. It has keys (columns) with values, and a value can be anything representable in JSON, such as a nested hash, an array, a number, or a string, plus a few extra types that BSON adds, such as dates and binary data. MongoDB was designed with ease of use and speed as its main goals. Design decisions are made with these in mind, which means some features are prioritized over others.
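To make the database/collection/document picture concrete, here is a minimal sketch in plain Python. It does not talk to MongoDB at all: the collection is just a list of dicts, and the `posts` data and the `find` helper are made up for illustration.

```python
import json

# A toy "collection": a plain list of dicts. Names and fields are made up.
# Note that the two documents do not share the same set of keys.
posts = [
    {"_id": 1, "title": "Why MongoDB?", "tags": ["nosql", "mongodb"]},
    {"_id": 2, "title": "NoSQL", "author": {"name": "anon", "karma": 10}},
]

def find(collection, **criteria):
    """Return documents whose top-level keys equal the given values."""
    return [doc for doc in collection
            if all(doc.get(key) == value for key, value in criteria.items())]

match = find(posts, title="NoSQL")
# Values can be nested hashes, so a whole sub-document comes back with the row.
print(json.dumps(match[0]["author"]))
```

The point is only the shape of the data: values can nest, and nothing forces every document in a collection to look alike.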

MongoDB vs RDBMS

This is similar to my previous post about NoSQL, but more specifically applied to MongoDB.

Advantages

Schema-less

Documents in a collection don't have to share the same format. This allows more flexible migrations, such as "lazy" migrations: certain migrations don't have to happen en masse. Instead, each document can be migrated individually the next time it is read or written. This means less downtime.
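A lazy migration of the kind described above might look roughly like this. It is a sketch in plain Python with a dict standing in for the collection. The `schema_version` field, the name-splitting change, and the helper names are all assumptions for the example, not anything MongoDB provides.

```python
# Hypothetical schema change: version 1 stored a single "name" string,
# version 2 splits it into "first_name" and "last_name".
CURRENT_VERSION = 2

def migrate(doc):
    """Upgrade one document to the current shape; leave current ones alone."""
    if doc.get("schema_version", 1) >= CURRENT_VERSION:
        return doc
    first, _, last = doc.get("name", "").partition(" ")
    upgraded = {k: v for k, v in doc.items() if k != "name"}
    upgraded.update(first_name=first, last_name=last,
                    schema_version=CURRENT_VERSION)
    return upgraded

def read_user(store, user_id):
    """Read a document, migrating and saving it back only if it is stale."""
    doc = store[user_id]
    migrated = migrate(doc)
    if migrated is not doc:
        store[user_id] = migrated  # write back so the work happens once
    return migrated

users = {
    1: {"_id": 1, "name": "Ada Lovelace"},  # old format
    2: {"_id": 2, "first_name": "Alan", "last_name": "Turing",
        "schema_version": 2},               # already migrated
}
print(read_user(users, 1))
```

Old and new documents sit side by side in the same collection, and each old one is upgraded the first time something touches it.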

Scalable

Sharding is one of the features MongoDB is concentrating on. Data is distributed over two or more servers, which relieves the load on any single server and improves throughput. Downtime is reduced because if a shard goes down, the data on the other shards is still accessible. Being able to add shards makes horizontal scaling easier. MongoDB sharding was in active development for a while and was released in version 1.6. Rough spots still exist, but the MongoDB team is looking to smooth them out in upcoming releases.
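Here is a toy model of the idea, assuming documents are routed by ranges of a shard key. The chunk bounds, shard names, and helper functions are invented for illustration. A real cluster keeps this routing table for you and rebalances it.

```python
from bisect import bisect_right

# Toy routing table: each chunk covers shard-key values from its lower bound
# up to (but not including) the next lower bound. All values are made up.
CHUNK_LOWER_BOUNDS = ["", "h", "p"]            # must stay sorted
CHUNK_SHARDS = ["shard0", "shard1", "shard2"]  # owner of each chunk

def shard_for(key):
    """Pick the shard whose chunk range contains the shard-key value."""
    index = bisect_right(CHUNK_LOWER_BOUNDS, key) - 1
    return CHUNK_SHARDS[index]

def available_keys(keys, down):
    """Keys that can still be served when the shards in `down` are offline."""
    return [key for key in keys if shard_for(key) not in down]

usernames = ["alice", "henry", "mallory", "zed"]
print({name: shard_for(name) for name in usernames})
print(available_keys(usernames, down={"shard1"}))
```

With `shard1` offline, only the keys in its range become unavailable. The rest of the data keeps being served, which is the reduced-downtime argument above.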

Failover

Database servers are often set up in a master-slave configuration. This is not always easy to do. It's even better when the slave is automatically promoted to master after the master fails, and that is even harder to do. MongoDB does this seamlessly with replica sets. The servers in a set elect one server to be the master while the others replicate from it. If the master goes down, the others detect this and elect a new master from among themselves. New servers can be added without disturbing the setup. The application never has to know that the master has changed. No downtime. Elegant!
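The election idea can be sketched as follows. This is a deliberate simplification and not MongoDB's actual election protocol. The `Member` class, the `last_applied` counter, and the tie-break on name are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Member:
    name: str
    healthy: bool = True
    last_applied: int = 0  # hypothetical position in the replication log

def elect_primary(members):
    """Toy election: the healthy member with the freshest data wins."""
    candidates = [m for m in members if m.healthy]
    # Require a strict majority of the set to be reachable, so that an
    # isolated minority cannot elect a master of its own.
    if len(candidates) * 2 <= len(members):
        return None
    # Ties on freshness are broken by name, an arbitrary choice for this sketch.
    return max(candidates, key=lambda m: (m.last_applied, m.name))

replica_set = [Member("a", last_applied=42),
               Member("b", last_applied=42),
               Member("c", last_applied=40)]

primary = elect_primary(replica_set)
primary.healthy = False                   # simulate the master going down
new_primary = elect_primary(replica_set)  # the survivors pick a replacement
print(primary.name, "->", new_primary.name)
```

The application-facing part, where the driver notices the change and redirects writes, is left out of this sketch.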

Speed

Speed is always a near-religious debate, with a million benchmarks showing a million winners. MongoDB has published a slew of benchmarks. What I take from them is that MongoDB is fast enough. It may or may not be the fastest, but it's definitely blazing. Coupled with the other advantages, I'd say this is a bonus.

GridFS

Ever needed to store large files? You've probably used the file system, Amazon S3, blobs, etc. They may or may not have been easy to integrate, but they were one more thing you had to deal with. Not with MongoDB. It implements a file storage specification called GridFS, which lets you store large objects in the database much as if they were normal documents. Not only is it one less thing to learn, it also makes it easier to move your data, since everything is in the database.
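As I understand the GridFS specification, a large file is split into chunks that are stored as ordinary documents, with one extra document holding the file's metadata. Below is a rough sketch of that layout in plain Python, with lists standing in for the two collections. The tiny `chunk_size` and the `put_file` and `get_file` helpers are assumptions for illustration. In practice the driver does this for you.

```python
def put_file(files, chunks, file_id, filename, data, chunk_size=4):
    """Store `data` as one metadata document plus numbered chunk documents."""
    files.append({"_id": file_id, "filename": filename,
                  "length": len(data), "chunkSize": chunk_size})
    for n, start in enumerate(range(0, len(data), chunk_size)):
        chunks.append({"files_id": file_id, "n": n,
                       "data": data[start:start + chunk_size]})

def get_file(chunks, file_id):
    """Reassemble a file by concatenating its chunks in order."""
    parts = sorted((c for c in chunks if c["files_id"] == file_id),
                   key=lambda c: c["n"])
    return b"".join(c["data"] for c in parts)

# Two plain lists stand in for the metadata and chunk collections.
fs_files, fs_chunks = [], []
put_file(fs_files, fs_chunks, 1, "hello.txt", b"hello gridfs")
print(len(fs_chunks), get_file(fs_chunks, 1))
```

Because the pieces are just documents, they are replicated, sharded, and backed up like everything else in the database.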

Disadvantages

Single server durability

MongoDB does not support this. Yet. Durability means that anything committed to the database really is stored permanently and survives a crash. Single server durability means that a single server can guarantee this on its own. MongoDB takes a different stance. Its developers argue that the goal is durability itself, not single server durability, and that durability should be attained through multiple servers and replication. This is a goal MongoDB is actively working towards, and one I believe they will achieve.

ACID

MongoDB does not support ACID. This rules MongoDB out in certain situations, but the trade-off is worth it for applications that don't require ACID. What you gain is speed. And it's noticeable.

Transactions

Transactions can be viewed as part of ACID, but I thought I'd make this explicit. If you require transactions, MongoDB is out of the question unless you roll your own. Again, for many applications out there, transactions are unnecessary.

Relational

MongoDB does not handle highly relational data as well as an RDBMS does. I think this one is obvious, but many forget to take it into consideration when jumping feet first into MongoDB. You've been warned.

MongoDB vs Other NoSQL Implementations

Unfortunately, I haven't worked much with other NoSQL implementations. However, most of the popular ones are used in production by notable companies. MongoDB is a little easier to wrap your mind around since it retains quite a few similarities with RDBMS, but still gives you that extra kick. One other thing to consider is that commercial support is available for MongoDB, which is not currently the case for every NoSQL implementation. If you have experience with other implementations, I'd like to hear about it in the comments below.

Conclusion

I'm choosing MongoDB because I've worked with it and have enjoyed the experience more than working with an RDBMS. If you're at a crossroads, I'd recommend MongoDB for your next project, unless you have highly relational data or you need ACID.
