Monday, October 25, 2010

Rails 3.0: MongoDB + Mongoid

Why Mongoid?

The de facto ORM for MongoDB in Rails is MongoMapper, so why choose Mongoid? You can read why straight from the horse's mouth here.

To summarize, Mongoid is built for Rails 3.0 and handles larger documents better. It also feels like NoSQL when you use it. MongoMapper was built during the Rails 2.x days, when MongoDB was young; it is modeled closely on ActiveRecord to ease the transition, so it feels more like SQL. MongoMapper is more extensible, though, and has a larger community.

MongoDB

Grab the latest build from MongoDB's download page. At this time, it is 1.6.3.

Download and install. We make a softlink. This way, if we upgrade, we just switch the softlink and everything else stays the same.
cd /usr/local/src/
sudo wget http://fastdl.mongodb.org/linux/mongodb-linux-i686-1.6.3.tgz
sudo tar xzf mongodb-linux-i686-1.6.3.tgz
sudo rm -rf mongodb-linux-i686-1.6.3.tgz
sudo ln -s mongodb-linux-i686-1.6.3 mongodb

Add /usr/local/src/mongodb/bin to your path.
PATH=$PATH:/usr/local/src/mongodb/bin

Make /data/db and give ownership to your user. There are obviously several ways to do this, but this is the easiest.
sudo mkdir -p /data/db
sudo chown -R wesley:wesley /data

Start your engines!
mongod

Mongoid

This builds on top of my previous post on installing Rails 3.0 with BDD. This assumes that the Rails application was created without ActiveRecord using -O. Check out Rails 3.0 agnosticism for an explanation.

Add mongoid to Gemfile. bson_ext is installed for a speed boost.
gem 'mongoid', '2.0.0.rc.6'
gem 'bson_ext', '~>1.2'

Install mongoid.
bundle install
rails generate mongoid:config
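The generator creates config/mongoid.yml. A minimal version looks roughly like the following (host, port, and database names are placeholders; the exact layout varies by Mongoid version, so treat this as a sketch rather than the literal generated file):

```yaml
defaults: &defaults
  host: localhost
  port: 27017

development:
  <<: *defaults
  database: myproject_development

test:
  <<: *defaults
  database: myproject_test
```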

Cucumber

Add a cucumber environment to mongoid.yml.
cucumber:
  <<: *defaults
  database: myproject_cucumber 

Cucumber makes use of Database Cleaner. There are posts saying Database Cleaner doesn't work with Mongoid; however, according to the official documentation, it now does. We could modify features/support/env.rb, but that is not recommended since the file is regenerated on a cucumber-rails upgrade. Instead, we will create features/support/local_env.rb.
require 'database_cleaner'
DatabaseCleaner.strategy = :truncation
DatabaseCleaner.orm = "mongoid"
Before { DatabaseCleaner.clean }

RSpec 2

RSpec includes ActiveRecord-specific lines in spec/spec_helper.rb. You need to comment out the following two lines.
# config.fixture_path = "#{::Rails.root}/spec/fixtures"
# config.use_transactional_fixtures = true

To clean the database properly, RSpec needs to know how to do that with Mongoid. Again, we can use Database Cleaner for this.

Open spec/spec_helper.rb and add the following inside the RSpec.configure block.
RSpec.configure do |config|    
  # Other things

  # Clean up the database      
  require 'database_cleaner'   
  config.before(:suite) do     
    DatabaseCleaner.strategy = :truncation
    DatabaseCleaner.orm = "mongoid" 
  end

  config.before(:each) do
    DatabaseCleaner.clean      
  end
end 

Updates

December 2, 2010: Changed Cucumber local_env.rb for database cleaning.
December 2, 2010: Added a section for RSpec database cleaning.
February 1, 2011: Updated Mongoid version in the Gemfile
February 26, 2011: Added a note to clean out ActiveRecord specific lines in RSpec.

Tuesday, October 19, 2010

Rails 3.0: Installing Cucumber + RSpec 2 + Capybara + AutoTest

Behaviour Driven Development (BDD) was created in response to Test Driven Development (TDD). TDD brought the idea of testing to the forefront, but stopped short by focusing mainly on developers; it failed to include other stakeholders. BDD is supposed to remedy this by specifying the behaviour of the application at a high level in plain English. This allows non-developers to spec out the application and be included in the conversation.

Cucumber

Cucumber is a BDD framework for Ruby. Specs will be written at this higher layer first to drive behaviour.

RSpec 2

RSpec 2 just came out of beta. It is a TDD framework for Ruby. Specs will be written at this second layer for testing the details of the implementation.

Capybara

Capybara is a replacement for Webrat. It is used to simulate how a real world user would interact with your application. A good post about why you would use Capybara over Webrat can be found here.

AutoTest

AutoTest runs your Cucumber and RSpec specs automatically whenever a file that affects the specs is modified.

Install

This builds on top of my previous post on installing Rails 3.0. This assumes that the Rails application was created without Test::Unit using -T. Check out Rails 3.0 agnosticism for an explanation.

Open Gemfile and add the following.
group :development, :test do
  gem 'capybara'
  gem 'database_cleaner'
  gem 'cucumber-rails'
  gem 'cucumber'
  gem 'rspec-rails'
  gem 'autotest'
  gem 'spork'
  gem 'launchy'
end

Install the gems. If you aren't using ActiveRecord, you won't have a database.yml. cucumber:install will complain unless you pass -D to it.
bundle install
rails generate rspec:install
rails generate cucumber:install --rspec --capybara

AutoTest checks your entire project for changes. When tests fail, test.log is written to; since that counts as a change, AutoTest kicks off again, and again, and again. To stop this from happening, create a .autotest file at the root of your project to ignore certain files.
Autotest.add_hook :initialize do |at|                                                                                                                                                                          
  %w{ .git doc log tmp vendor }.each { |ex| at.add_exception( ex ) } 
end

AutoTest does not run Cucumber out of the box. There is debate over whether autotesting Cucumber is a good idea, since it is a very high-level test and can be quite heavy, and autotests should run quickly. If you want autotesting of Cucumber, set AUTOFEATURE in your environment before running, or add it to your .bashrc.
export AUTOFEATURE=true

Start continuous testing. Hit ctrl+c twice to stop.
autotest

Updates

Feb. 8, 2011: Added solution to AutoTest continuously running on failure.

Monday, October 18, 2010

Rails 3.0: Agnosticism

When Rails and Merb merged, agnosticism was one of the big features to be added to Rails. They have stayed true to this vision.

Take a look at the help when creating a new Rails application.
rails new --help

You will notice three options.
  • -O: Skip Active Record
  • -T: Skip Test::Unit
  • -J: Skip Prototype

This allows you to create a fresh Rails application and have it ready to integrate with other gems more easily.
rails new myproject -OTJ

Monday, October 11, 2010

Rails 3.0: Installing with RVM + Thin

To keep things clean, we will use RVM's gemsets to install Rails 3.0. This keeps your work environment tidy and allows you to switch between projects without worrying about which Ruby version or gems are installed.

Setup

I rarely look at the local documentation of gems, so let's not download it.
Create ~/.gemrc and add the following to it.
gem: --no-ri --no-rdoc

Install Ruby 1.9.2 and create a gemset for your project.
rvm install 1.9.2
rvm gemset create myproject 

Rails 3.0

When using a gemset, all gems installed are bundled under that gemset. Therefore, you can have a gemset per project without worrying about cross-contamination!

Use your gemset to install Rails 3.0. We will also need to install Bundler.
rvm use 1.9.2@myproject
gem install bundler
gem install rails
rails new myproject
cd myproject

Create .rvmrc in the root of your project and add the following to it.
rvm use 1.9.2@myproject
This tells RVM to switch to the project's Ruby version and gemset whenever you enter the project.

Give it permission.
cd ..
cd myproject
This will prompt you to trust the .rvmrc. Type 'y' and hit enter. RVM will now switch to 1.9.2@myproject automatically whenever you enter the project.

Install the required Rails 3.0 gems. If you don't want sqlite, go into Gemfile and remove sqlite3-ruby and use your favourite database. I will be using MongoDB in a later post.
bundle install

Thin

WEBrick is slow. Let's install Thin.

Open up Gemfile and add thin.
gem 'thin'

Install thin.
bundle install

Start

Drum roll please!
rails server thin

Now visit http://localhost:3000/

Saturday, October 9, 2010

Why MongoDB?

MongoDB is a NoSQL implementation that I've decided to use for my project. One of the major deciding factors is that I deal with MongoDB at work and have experience with it. Unfortunately, this reason alone will not help you decide whether to use MongoDB, so I've outlined some other points below. Feel free to add more in the comments!

What is MongoDB?

MongoDB is a document-based database system. It stores everything in BSON, which is the binary format of JSON. A database holds a bunch of collections (tables). Each collection holds a bunch of documents (records/rows). Each document can be thought of as a large hash object. There are keys (columns) with values and the values can be anything represented in JSON, such as hashes, arrays, numbers, serialized objects, etc. MongoDB has been implemented with ease and speed as its main goals. Every design decision is made with this in mind, which leads to priorities in certain areas over others.
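To make the document model concrete, here is a hypothetical document sketched as a plain Ruby hash (field names and values are invented for illustration):

```ruby
# A MongoDB document is essentially a big hash: keys play the role of
# columns, and values can be nested hashes, arrays, numbers, etc.
post = {
  "title"  => "Why MongoDB?",
  "tags"   => ["nosql", "mongodb", "bson"],
  "author" => { "name" => "Wesley", "karma" => 42 },
  "views"  => 1337
}

# Nested values are addressed directly; no joins required.
post["author"]["name"]  # => "Wesley"
post["tags"].length     # => 3
```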

MongoDB vs RDBMS

This is similar to my previous post about NoSQL, but more specifically applied to MongoDB.

Advantages

Schema-less

Documents in a collection don't have to have the same format. This allows more flexible migrations, such as "lazy-loaded" migrations. Basically, there are certain migrations that don't have to happen en masse. They can occur individually when the document is read or written to. This allows for less downtime.
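A lazy-loaded migration might be sketched like this (the field names are hypothetical; the point is that each document is upgraded only when it is next read, rather than all at once):

```ruby
# Upgrade an old-format document on read instead of migrating the whole
# collection in one pass. Old documents have "first"/"last" fields;
# new ones have a single "full_name" field.
def upgrade(doc)
  if doc["full_name"].nil? && doc["first"] && doc["last"]
    doc["full_name"] = "#{doc["first"]} #{doc["last"]}"
    doc.delete("first")
    doc.delete("last")
  end
  doc
end

old_doc = { "first" => "Ada", "last" => "Lovelace" }
upgrade(old_doc)  # => { "full_name" => "Ada Lovelace" }
```

Already-migrated documents pass through unchanged, so the upgrade is safe to run on every read.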

Scalable

Sharding is one of the goals MongoDB is concentrating on. Data is dispersed over two or more servers, relieving the load on any single server and increasing speed. Downtime is decreased because if a shard goes down, data on the other shards is still accessible. Being able to add shards lends itself to easier horizontal scaling. MongoDB sharding has been in active development for a while and was unleashed as of version 1.6. Rough spots still exist, but MongoDB is looking to patch those up in the near future.

Failover

Database servers are often set up in a master-slave format. This is not always easy to do. It's even better when the master fails and the slave is automatically promoted to master. That is even harder to do. MongoDB does this seamlessly with replica sets. The servers in a set elect one server to be the master while the others replicate. If the master goes down, the others detect this and elect another server to be the master. New servers can be added without disturbing the setup. The application never has to know that the master has changed. No downtime. Elegant!

Speed

Speed is always a religious-like debate with a million benchmarks showing a million winners. MongoDB has documented a slew of benchmarks. What I take from this is that MongoDB is fast enough. It may or may not be the fastest, but it's definitely blazing. Coupled with the other advantages, I'd say this is a bonus.

GridFS

Ever needed to store large files? You've probably used the file system, Amazon S3, blobs, etc. They may or may not have been easy to integrate, but it was another thing you had to deal with. Not with MongoDB. It implements a file storage specification called GridFS, which allows you to store large objects in the database as if they were normal documents. Not only is it one less thing to learn, it makes it easier to move your data since everything is in the database.

Disadvantages

Single server durability

MongoDB does not support this. Yet. Durability is the concept that anything committed to the database is actually committed and resides there permanently. Single server durability is the idea that a single server alone will maintain durability. MongoDB has a different stance: single server durability is not the goal, durability itself is, and it should be attained through the use of multiple servers and replication. This is a goal MongoDB is actively working towards, and one I believe they will achieve.

ACID

MongoDB does not support ACID. This prevents MongoDB being used in certain situations, but the trade-off is worth it for applications that don't require ACID. That gain is in speed. And it's noticeable.

Transactions

Transactions can be viewed as part of ACID, but I thought I'd make this explicit. If you require transactions, MongoDB is out of the question unless you roll your own. Again, for many applications out there, transactions are unnecessary.

Relational

MongoDB does not handle highly relational databases as well as RDBMS. I think this one is obvious, but many forget to take this into consideration when hopping feet first into MongoDB. You've been warned.

MongoDB vs Other NoSQL Implementations

Unfortunately, I haven't worked much with other implementations of NoSQL. However, most popular NoSQL implementations are in production use by notable companies. MongoDB is a little easier to wrap your mind around since it retains quite a lot of similarities with RDBMS while still giving you that extra kick. One other thing to consider is that MongoDB provides commercial support, which is not currently available for all NoSQL implementations. If you have experience with this, I'd like to hear about it in the comments below.

Conclusion

I'm choosing MongoDB because I've worked with it and have really enjoyed the experience over RDBMS. However, if you're at a crossroads, I'd recommend MongoDB in your next project unless you have a highly relational database or you need ACID.

Saturday, October 2, 2010

Why NoSQL?

Recently, there has been a movement toward NoSQL and a disdain for traditional RDBMS when it comes to web development. There have been many discussions pitting the two against each other, almost reaching a religious fervor. With the movement, several new databases have become available.

Why NoSQL?

Hasn't RDBMS served us well for decades? They're mature, well thought out, and optimized for their task. SQL is a powerful querying language that allows you to do almost anything. So why the change?

PITA

It's human nature at its best. We're given a tool for a task. We use it to its fullest and try to increase the efficiency of the tool. However, there will always be frustrations and clumsiness inherent with the tool. No tool is ever perfect for the job at hand. Working with a tool for many years, the frustrations accumulate until, one day, we can't take it anymore. It's a PITA. We invent something new, even if it isn't as good yet. It at least solves the frustrations of the previous tool, which at this point, is all that matters. Why do I say this is human nature at its best? Because this is how we improve. This is how we take it to the next level. Human beings are never satisfied with the status quo for long.

Scaling

So what's so frustrating with RDBMS? Scaling. It's that simple. RDBMS has problems with scale. When an RDBMS gets large, think about what happens with the following:
  • migrations
  • expansion
  • backups
  • failover
  • locking

Migrations

What happens when you need to migrate data in a table that has millions of rows? Hours of downtime. Downtime for your site means frustrations for your customers. That's not a good thing.

Expansion

How easy is it to tack on another database server? Well, you're going to have to worry about sharding first in the application itself before you can even try. Don't have that code from the beginning? Well, start coding now.
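With an RDBMS, that sharding logic typically lives in the application itself. A minimal sketch of hash-based routing (server names are hypothetical) shows both the idea and why bolting it on later hurts:

```ruby
require 'zlib'

# Route each record to one of N database servers by hashing its key.
SHARDS = ["db1.example.com", "db2.example.com", "db3.example.com"]

def shard_for(user_id)
  # CRC32 gives a cheap, stable hash; modulo picks the shard.
  SHARDS[Zlib.crc32(user_id.to_s) % SHARDS.size]
end
```

Note that adding a server changes the modulus, so existing keys re-route: exactly the kind of pain that makes retrofitting sharding onto an RDBMS-backed application hard.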

Backups

Don't have sharding? Your backups are going to be huge. This isn't as much of a PITA these days with such large disk space, but no matter what, it's still easier to handle smaller pieces of data.

Failover

The current RDBMS implementations don't make it easy to set up failover. It's usually tacked on afterwards.

Locking

With ACID and transactions, locking is inevitable in RDBMS. With a large scale, there are bound to be long waits. Unfortunately, the larger the database, the more waiting. There are ways around this, but it's a PITA.

Variety

It's not to say that RDBMS isn't good. It's very good at what it does. But for certain web applications, the tool just doesn't fit; the PITA factor goes through the roof. Does this mean we should default to NoSQL every time? No. The point is variety. It's not about RDBMS vs NoSQL vs something else. It's about which tool fits best for the job. We should be glad that there's a choice. We can even decide to use RDBMS alongside NoSQL if it works better. Variety is the spice of life.

Where does it fit?

If we can't throw out RDBMS, then when should we use NoSQL? It seems the jury is still out, as not all NoSQL implementations are built the same, and no implementation solves all the frustrations of RDBMS. Plus, NoSQL hasn't been in production environments long enough to reveal all of its side-effects. You will need to do research on this front and see which implementation of NoSQL, if any, fits your problem. Here's a quick guideline though:
  • not many relations (low number of joins)
  • scaling required
  • read-intensive
  • ACID and transactions are not important
  • you just want to geek-out and live on the edge :D