Fetching by a cached index produces two queries
While debugging N+1 queries, I found an unusual behaviour in IDC (IdentityCache): when fetching records by a cached attribute, IDC produces two queries instead of one. The first query uses the cached attribute, while the second uses the primary key.
After some investigation and some very helpful tips from coworkers, I managed to boil the issue down to this snippet:
```ruby
# migration file
class CreateSampleRecords < ActiveRecord::Migration[5.2]
  mark_as_podded_migration!

  def up
    create_table :sample_records do |t|
      t.string :key
      t.timestamps
    end
  end

  def down
    drop_table :sample_records
  end
end
```

```ruby
# ActiveRecord model
class SampleRecord < ApplicationRecord
  include IdentityCache

  cache_index :key
end
```
Then, in the Rails console:
```
[1] pry(main)> SampleRecord.create(key: '123')
   (1.6ms)  BEGIN
  SampleRecord Create (1.0ms)  INSERT INTO `sample_records` (`key`) VALUES ('123')
   (2.3ms)  COMMIT
=> #<SampleRecord:0x00007fa11061a158 id: 1, key: "123">

[2] pry(main)> SampleRecord.fetch_by_key('123')
   (1.4ms)  SELECT `sample_records`.`id` FROM `sample_records` WHERE `sample_records`.`key` = '123'
  SampleRecord Load (0.9ms)  SELECT `sample_records`.* FROM `sample_records` WHERE `sample_records`.`id` = 2
=> [#<SampleRecord:0x00007fa0f5169a70 id: 1, key: "123">]
```
I also tried with `unique: true`; it produces the same result.
I might be wrong, but I would expect this snippet to produce only one query. The first query already fetches the desired record, so I see no reason to look the object up again by its primary key.
Thoughts?
The first query only gets the id for the record, which can often be obtained from a covering index, so it wouldn't even read from the primary/clustered index. IDC then checks the cache for the record using that id and only loads the full record on a cache miss, so it doesn't necessarily perform a second query.
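The flow above can be sketched in plain Ruby, using a Hash as a stand-in for memcached and an array of hashes as a stand-in for the database. All class, method, and key names here are illustrative, not IdentityCache's actual internals:

```ruby
# Illustrative sketch of the two-step fetch_by_key flow: resolve the
# cached attribute to an id, then fetch the record blob keyed by id.
class FakeCache
  def initialize
    @store = {}
  end

  # Return the cached value, or compute and store it on a miss.
  def fetch(cache_key)
    @store.fetch(cache_key) { @store[cache_key] = yield }
  end
end

CACHE = FakeCache.new
DB = [{ id: 1, key: "123", name: "full record" }]
QUERIES = []

def query(sql, result)
  QUERIES << sql
  result
end

# Step 1: resolve the cached attribute to an id. On a cold cache this
# runs the cheap, often index-only, id lookup.
def fetch_id_by_key(key)
  CACHE.fetch("attr:id:key:#{key}") do
    query("SELECT id FROM sample_records WHERE key = '#{key}'",
          DB.find { |r| r[:key] == key }&.fetch(:id))
  end
end

# Step 2: fetch the full record by id, hitting the database only on a miss.
def fetch_by_id(id)
  CACHE.fetch("blob:#{id}") do
    query("SELECT * FROM sample_records WHERE id = #{id}",
          DB.find { |r| r[:id] == id })
  end
end

def fetch_by_key(key)
  id = fetch_id_by_key(key)
  id && fetch_by_id(id)
end

fetch_by_key("123") # cold cache: both steps query the database
fetch_by_key("123") # warm cache: no queries at all
```

On a cold cache both steps miss and two queries run; on a warm cache neither does, which is why the second query in the transcript above only appears on a miss for the record blob.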
In your example, you should be able to see that the second query isn't performed if you precede `SampleRecord.fetch_by_key('123')` with `SampleRecord.fetch(sample_record_id)`.
One way of avoiding the extra query would be to embed the whole record in the cache index, but there are trade-offs: it would increase how often the cache index gets invalidated, increase the number of keys that need to be invalidated when the record or any of its embedded associations is updated, and increase the amount of memcached memory used to denormalize this data.
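The invalidation trade-off can be made concrete with a small sketch. This is hypothetical, not IdentityCache API: a Hash plays the cache, and the two schemes are compared side by side:

```ruby
# Hypothetical comparison of an id-only cache index vs. an index that
# embeds the whole record. Key names are illustrative.
cache = {}
record = { id: 1, key: "123", name: "a" }

# id-only index: the index entry holds just the id, so it stays valid
# across updates that don't change :key, and the record bytes live in
# one place (the blob entry).
cache["idx:key:123"] = record[:id]
cache["blob:1"]      = record

# embedded index: saves the second lookup, but stores a duplicate copy
# of the record under the attribute key.
cache["idx_full:key:123"] = record

# Now update the record.
record = record.merge(name: "b")

# id-only scheme: one key to rewrite.
cache["blob:1"] = record

# embedded scheme: every index entry that embeds the record must also
# be rewritten (or invalidated), or it serves stale data.
cache["idx_full:key:123"] = record
```

The embedded scheme trades one lookup hop for more keys to invalidate per update and a second copy of the record in memcached, which is the cost described above.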