mongoid-locker: on collections sharded by "_id", Mongoid::Locker can't get a lock


At least in my production environment, on any collection sharded by the shard key "_id", this code:

2.0.0p247 :001 > i = Item.first
2.0.0p247 :002 > i.with_lock do
2.0.0p247 :003 >     puts i.inspect
2.0.0p247 :004?>   end

throws the exception "Mongoid::Locker::LockError: could not get lock" from line 148 of lib/mongoid/locker.rb, in 'lock'.

A collection sharded by some other key doesn't seem to have this problem, nor does an unsharded collection.

Sep 11 '13 04:09 mepatterson

consistently?? iiiiiinteresting. not sure how easy it will be to replicate that in a test environment :-/ will try to set up a couple of sharded mongo instances locally.

just for due diligence, would you mind upping the :retries value and the timeout value and seeing if that makes any difference?

Sep 11 '13 05:09 afeld

Yeah, man. I tried EVERYTHING. The only reason I figured it out is I had 3 collections, two sharded on "_id" and one sharded on some other field. The latter was the only one that didn't throw the locker exception. So I had my ops guy rebuild the other two collections with different shard keys and it started working perfectly, no code changes on my side.

Certainly open to the idea that you might discover something even more insidious going on, but that's what we determined.

I traced it down to your lock() method, where you do the atomic check to see whether the document is locked or a lock can be acquired (and then acquire it). On my "_id"-sharded collections, that would fail (return false) on a totally new, totally unlocked document with nils for all the locked_* fields. At that point, I couldn't see an obvious problem, but you do use "_id" in your atomic query, so perhaps something is going on there when a collection is sharded by _id?
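For readers following along, here is a minimal in-memory sketch of the kind of atomic "check and acquire" step described above. It is not the mongoid-locker implementation and does not talk to MongoDB: `FakeCollection`, `try_lock`, and `acquire` are hypothetical names, and the `locked_at` / `locked_until` fields are assumed from the locked_* fields mentioned in this thread. It only illustrates why raising the retry count cannot help if the conditional update never matches the document in the first place.

```ruby
# Hypothetical in-memory stand-in for a collection (assumption, not a real API).
class FakeCollection
  def initialize(docs)
    @docs = docs
    @mutex = Mutex.new # makes the check-and-set below atomic within this process
  end

  # Conditional update: set the lock fields only if a document with this _id
  # exists and is unlocked (nil locked_until) or its lock has expired.
  # Returns true if the lock was taken, false otherwise.
  def try_lock(id, now:, timeout:)
    @mutex.synchronize do
      doc = @docs.find { |d| d[:_id] == id }
      return false if doc.nil?                                  # query matched nothing
      return false if doc[:locked_until] && doc[:locked_until] > now

      doc[:locked_at] = now
      doc[:locked_until] = now + timeout
      true
    end
  end
end

# Retry wrapper: makes one initial attempt plus `retries` further attempts.
def acquire(collection, id, retries:, timeout:, now:)
  (retries + 1).times do
    return true if collection.try_lock(id, now: now, timeout: timeout)
  end
  false
end
```

In this sketch, a fresh document with nil lock fields is lockable on the first attempt. If the lookup by `_id` matches nothing, every retry fails the same way, which is at least consistent with the "spun and spun" behaviour reported here. Whether that is what actually happens on a collection sharded by `_id` is not established in this thread.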


Sep 11 '13 05:09 mepatterson

I set the retries to 20 or something and it just spun and spun and then threw the exception


Sep 11 '13 05:09 mepatterson

Sharding/replication are the things about Mongo I know the least about, so I might pop over to the MongoDB office hours they hold in NYC to see if they have ideas.

Just a stab in the dark, but what indexes do you have on that collection that fails? Any compound indexes that include the _id?

Sep 11 '13 05:09 afeld

Nope. One of the two troubled collections has a bunch of compound indexes, but none with _id


Sep 11 '13 05:09 mepatterson
