Posted
1 October 2009 @ 11am

Tagged
, , , , ,

Share and Enjoy
  • del.icio.us
  • Facebook
  • Google Bookmarks
  • LinkedIn
  • Digg
  • Twitter
  • Reddit
  • MySpace
  • Technorati
  • StumbleUpon
  • Tumblr
  • Slashdot
  • email
  • Print

counter_cache as an optimization step

I had a debate with a programmer friend of mine once about optimization. I argued that once you get past adding indexes to a relational database, every attempt at increasing its speed or scaling is an act of denormalization. Whether its double storing information in the database in such a way that it can be read more efficiently by different queries or its adding in a cacheing layer—be it key-value-esque or page/fragment caching your view layer—it’s all essentially doing the same thing: making your DBA’s head explode.

Anyways, Rails has a great mechanism for doing exactly this, and its called counter_cache.  Its an attribute on a has_many/belongs_to association (it’s actually on the belongs_to) which keeps track of the number of belonging to objects in a count column on the having many table.

In the application I’m working on there are categories, tags, and images.  Tags and images have a many to many relationship; you can tag an image multiple times and a tag can be re-used on multiple images.  Tags also belong to categories.  When trying to improve the load times of a page, I was immediately struck by the code counting the images for a catogory:

class Category < ActiveRecord::Base
  has_many :tags

  def image_count
    tags.inject(0) { |tag, memo| memo + tag.images.count }
  end
end

To find the count for each category, we have to instantiate every single tag object associated with the category and run a separate SQL count for each to grab the images. And this could be worse depending on what else ActiveRecord decides to load with your tags (maybe you have a default_scope that includes images or an after_find callback). Worst case, we’re running 20 SQL queries if there are 20 tags … and if there are 20 categories, with 20 tags, all on the same page …

To fix the problem I had to start with the tags and images themselves. The way they were defined was:

class Tag < ActiveRecord::Base
  has_and_belongs_to_many :images
end

class Image < ActiveRecord::Base
  has_and_belongs_to_many :tags
end

but I wanted to take advantage of the counter_cache. Without changing the join table at all, I simply added a model and changed the assocations:

class ImagesTag < ActiveRecord::Base
  belongs_to :image, :counter_cache => true
  belongs_to :tag, :counter_cache => true
end

class Tag < ActiveRecord::Base
  has_many :images_tags, :dependent => :destroy
  has_many :images, :through => :images_tags
end

class Image < ActiveRecord::Base
  has_many :images_tags, :dependent => :destroy
  has_many :tags, :through => :images_tags
end

and with the following migration:

class AddCounterCachesToImagesAndTags < ActiveRecord::Migration
  def self.up
    add_column :images, :tag_count, :integer, :default => 0
    add_column :tags, :image_count, :integer, :default => 0

    # this is necessary because in production w/ cache_classes on the
    # definition of the Image and Tag objects won't reload to reflect
    # the new columns unless you do it explicitly
    Image.reset_column_information
    Tag.reset_column_information

    Image.all.each do |image|
      Image.update_counters image.id, :tag_count => image.tags.count
    end

    Tag.all.each do |tag|
      Tag.update_counters tag.id, :image_count => tag.images.count
    end

    add_column :images_tags, :id, :integer unless columns.detect { |column| column.name == 'id' }
  end

  def self.down
    remove_column :images, :tag_count
    remove_column :tags, :image_count
    remove_column :images_tags, :id if columns.detect { |column| column.name == 'id' }
  end
end

I now had denormalized image_count and tag_count columns in my database available as attributes on objects (which I definitely needed in my view anyways).

To take advantage of this data for categories, I rewrote the image_count method:

class Category < ActiveRecord::Base
  has_many :tags

  def image_count
    tags.sum(:image_count)
  end
end

Now no matter how many tags belong to a category and regardless of any extra loading a tag might do, there are no extra objects instantiated by the single SQL query to find the number of images belonging to a category.

Rails does all the work to keep these count columns in sync without any extra effort on my part.  It tries (hard enough in most cases) to keep the overhead as low as possible using SQL without going through ActiveRecord, which is nice because you don’t have to put that SQL straight into your model.

There’s lots of built-in magic in Rails and unless you happen to be on top of new code being pushed into Edge (and there are great blogs to help!), you can miss it and never even know it exists.  This isn’t new voodoo, its been around since at least Rails 2.0, perhaps earlier.  If it slid past your radar, this is a bit of magic that’s easy to use and easy to add onto an already working application as an optimization step.  Just don’t tell the DBA.

Hat-tip to the railscasts guys for the migration syntax.  You should also check out their counter cache column screencast for more info.

UPDATE: I made all the changes suggested by Rain in the comments.


  • Thanks Rain. Updated the code in the post, accordingly!
  • Rain
    You forgot the scenario about destroying images, it will not update the counter unless you set
    has_many :images_tags, :dependent => :destroy in Image model

    but it will require ImagesTag model has a primary key, so you have to add a primary key for it in the migration script.
  • Jim
    It's even older than that: it's on page 361 of the 2d edition Agile Web Development from PragProg, which makes it pre-Rails 1.2.3 IIRC.

    That book is totally worth a skim cover-to-cover so you can have many of those "don't I remember that one thing ..." moments. It won't replace keeping up with Edge Rails (or just reading the code you're using ;), but it's a good backup.
blog comments powered by Disqus
find_by_param for permalinks prototype-based image preloader