counter_cache as an optimization step
I had a debate with a programmer friend of mine once about optimization. I argued that once you get past adding indexes to a relational database, every attempt at increasing its speed or scaling is an act of denormalization. Whether its double storing information in the database in such a way that it can be read more efficiently by different queries or its adding in a cacheing layer—be it key-value-esque or page/fragment caching your view layer—it’s all essentially doing the same thing: making your DBA’s head explode.
Anyways, Rails has a great mechanism for doing exactly this, and its called counter_cache. Its an attribute on a has_many/belongs_to association (it’s actually on the belongs_to) which keeps track of the number of belonging to objects in a count column on the having many table.
In the application I’m working on there are categories, tags, and images. Tags and images have a many to many relationship; you can tag an image multiple times and a tag can be re-used on multiple images. Tags also belong to categories. When trying to improve the load times of a page, I was immediately struck by the code counting the images for a catogory:
class Category < ActiveRecord::Base has_many :tags def image_count tags.inject(0) { |tag, memo| memo + tag.images.count } end end
To find the count for each category, we have to instantiate every single tag object associated with the category and run a separate SQL count for each to grab the images. And this could be worse depending on what else ActiveRecord decides to load with your tags (maybe you have a default_scope that includes images or an after_find callback). Worst case, we’re running 20 SQL queries if there are 20 tags … and if there are 20 categories, with 20 tags, all on the same page …
To fix the problem I had to start with the tags and images themselves. The way they were defined was:
class Tag < ActiveRecord::Base has_and_belongs_to_many :images end class Image < ActiveRecord::Base has_and_belongs_to_many :tags end
but I wanted to take advantage of the counter_cache. Without changing the join table at all, I simply added a model and changed the assocations:
class ImagesTag < ActiveRecord::Base belongs_to :image, :counter_cache => true belongs_to :tag, :counter_cache => true end class Tag < ActiveRecord::Base has_many :images_tags, :dependent => :destroy has_many :images, :through => :images_tags end class Image < ActiveRecord::Base has_many :images_tags, :dependent => :destroy has_many :tags, :through => :images_tags end
and with the following migration:
class AddCounterCachesToImagesAndTags < ActiveRecord::Migration def self.up add_column :images, :tag_count, :integer, :default => 0 add_column :tags, :image_count, :integer, :default => 0 # this is necessary because in production w/ cache_classes on the # definition of the Image and Tag objects won't reload to reflect # the new columns unless you do it explicitly Image.reset_column_information Tag.reset_column_information Image.all.each do |image| Image.update_counters image.id, :tag_count => image.tags.count end Tag.all.each do |tag| Tag.update_counters tag.id, :image_count => tag.images.count end add_column :images_tags, :id, :integer unless columns.detect { |column| column.name == 'id' } end def self.down remove_column :images, :tag_count remove_column :tags, :image_count remove_column :images_tags, :id if columns.detect { |column| column.name == 'id' } end end
I now had denormalized image_count and tag_count columns in my database available as attributes on objects (which I definitely needed in my view anyways).
To take advantage of this data for categories, I rewrote the image_count method:
class Category < ActiveRecord::Base has_many :tags def image_count tags.sum(:image_count) end end
Now no matter how many tags belong to a category and regardless of any extra loading a tag might do, there are no extra objects instantiated by the single SQL query to find the number of images belonging to a category.
Rails does all the work to keep these count columns in sync without any extra effort on my part. It tries (hard enough in most cases) to keep the overhead as low as possible using SQL without going through ActiveRecord, which is nice because you don’t have to put that SQL straight into your model.
There’s lots of built-in magic in Rails and unless you happen to be on top of new code being pushed into Edge (and there are great blogs to help!), you can miss it and never even know it exists. This isn’t new voodoo, its been around since at least Rails 2.0, perhaps earlier. If it slid past your radar, this is a bit of magic that’s easy to use and easy to add onto an already working application as an optimization step. Just don’t tell the DBA.
Hat-tip to the railscasts guys for the migration syntax. You should also check out their counter cache column screencast for more info.
UPDATE: I made all the changes suggested by Rain in the comments.





