acts_as_list icon indicating copy to clipboard operation
acts_as_list copied to clipboard

PG::TRDeadlockDetected

Open seanarnold opened this issue 7 years ago • 23 comments

We seem to intermittently get Deadlock errors with acts_as_list and Postgres.

PG::TRDeadlockDetected: ERROR: deadlock detected DETAIL: Process 18092 waits for ShareLock on transaction 53198930; blocked by process 30411. Process 30411 waits for ShareLock on transaction 53198931; blocked by process 18092. HINT: See server log for query details. CONTEXT: while updating tuple (15,28) in relation "payment_methods" : UPDATE "payment_methods" SET "updated_at" = '2016-10-20 00:52:19.754301', "sort_order" = ("payment_methods"."sort_order" - 1) WHERE (user_id = 'user_id' AND deleted_at IS NULL) AND ("payment_methods"."sort_order" > 1) AND ("payment_methods"."sort_order" <= 3 AND "payment_methods".id != 'pm_id')

Is this a known issue? I'm using acts_as_list (0.7.7).

seanarnold avatar Oct 20 '16 01:10 seanarnold

Not that we're aware of. Can you try the latest version and see if that helps? Is it to do with with conflicting ORDER BY on this query vs another also updating records? I had a similar deadlock problem and it turned out to be due to ordering being overridden resulting in updates occurring from opposite ends of the table and clashing (that's a very technical explanation!).

We do lock a record using with_lock. I can't remember which version that came out on, so you could try the version before that to see if that is the problem. We're looking at removing or making this lock better.

brendon avatar Oct 20 '16 02:10 brendon

I can confirm this. Seems to happen as a combination of validate uniqueness and multiple requests hitting the servers at the same time. Currently we delay our requests (chain them to be serial requests) to prevent the deadlock issue.

ddengler avatar Nov 22 '16 10:11 ddengler

Thanks @ddengler, are you using the latest version?

We really need better locking I think. If you have any experience with that side of things we'd appreciate the help. I think there's an open issue about this.

@seanarnold, have you tried the latest version?

brendon avatar Nov 22 '16 20:11 brendon

@brendon yes, we are using the latest version. I will ask someone on my team to have a look at it again. Might take a while, as we currently have the issue resolved and some other priorities to work on.

ddengler avatar Nov 22 '16 20:11 ddengler

@brendon can you point me to the other open issue? I could not find it right away – only an old PR #135

ddengler avatar Nov 22 '16 20:11 ddengler

@ddengler, sorry I've had a look myself and can't find anything but that either. The discussion was around needing to lock all the records within a scope rather than just the record being updated (as we do now). This means that the shuffling that occurs when an item is moved will be atomic (is that the right term?). There is also an issue where the scope is empty but two threads attempt to create the first record in that scope simultaneously. We need a way to lock that empty scope also, and I'm not sure if that's possible with SQL at all?

I hope that helps :D No worries about the priorities. Gotta work on the important stuff first :D

brendon avatar Nov 22 '16 20:11 brendon

Some of this can be solved by using a lock on the parent record. Like in #254

swanandp avatar Feb 16 '17 03:02 swanandp

@seanarnold I recently removed the only lock that we do in acts_as_list. I'd be interested to see if this fixes your deadlock issue.

brendon avatar Aug 10 '17 07:08 brendon

@seanarnold, we made a change the other day (released as the latest version of the gem) that removed another point of contention. Can you try the latest gem to see if your deadlocks subside?

brendon avatar Oct 03 '17 19:10 brendon

@seanarnold, is this still a problem for you?

brendon avatar Mar 19 '18 07:03 brendon

@brendon sorry for not getting back to you on this. Sadly I cannot remember what we did back then to "fix" the issue for us. I just checked our error reporting again to be sure. Issue did not pop up for us anymore. We are currently using the latest version.

ddengler avatar Mar 19 '18 09:03 ddengler

We still see the issue in 0.9.9. Have changes to the locking behaviour been done since then?

seanarnold avatar Mar 19 '18 19:03 seanarnold

Not since then. We still only lock in the insert_at_position() function. If you have any ideas for how to solve this I'd be more than grateful :D

brendon avatar Mar 19 '18 21:03 brendon

It's probably more the implicit locks that come from updating large portions of the table at once (when shuffling positions etc...).

brendon avatar Mar 19 '18 21:03 brendon

I get deadlock PG::TRDeadlockDetected is same @seanarnold.
I'm using newest version 0.9.11 and update position with 212 rows.

Here is backtrace

- acts_as_list (0.9.11) lib/acts_as_list/active_record/acts/position_column_method_definer.rb:34:in `block (2 levels) in define_class_methods'
 - acts_as_list (0.9.11) lib/acts_as_list/active_record/acts/position_column_method_definer.rb:26:in `block (2 levels) in define_class_methods'
 - activerecord (4.2.7.1) lib/active_record/relation/delegation.rb:70:in `block in decrement_all'
 - activerecord (4.2.7.1) lib/active_record/relation.rb:302:in `scoping'
 - activerecord (4.2.7.1) lib/active_record/relation/delegation.rb:70:in `decrement_all'
 - acts_as_list (0.9.11) lib/acts_as_list/active_record/acts/list.rb:367:in `shuffle_positions_on_intermediate_items'
 - acts_as_list (0.9.11) lib/acts_as_list/active_record/acts/list.rb:418:in `update_positions'

vi-huynh avatar Apr 13 '18 10:04 vi-huynh

@ddengler I just reproduce deadlock on development environment. I config puma worker is 3 and try send multiple request to server. And then get deadlock error.

vi-huynh avatar Apr 13 '18 11:04 vi-huynh

@vi-huynh, if you could do some more digging into this to figure out why, that would be helpful :) Keen to get this fixed :)

brendon avatar Apr 13 '18 21:04 brendon

@vi-huynh makes sense to me as we fixed this by not doing our operations in parallel anymore. We are also using puma but now with workers set to 0 (= 1) and multiple threads within a docker container ( multiple containers within a cluster).

ddengler avatar Apr 16 '18 10:04 ddengler

Thanks @ddengler for offering some more info :) I'm at a loss as to the cause but it's probably to do with table scan's locking rows even if they're not the rows been updated. Could indexing help alleviate the problem?

brendon avatar Apr 23 '18 22:04 brendon

Some of this can be solved by using a lock on the parent record. Like in #254

Are you getting a deadlock with or without this?

swanandp avatar Apr 24 '18 22:04 swanandp

Why are the requests not all ordered by the position column (either all 'asc' or 'desc')? It's not very surprising that there are deadlocks happening: if you want to access several resources, the only scenario where deadlock is guaranteed to not happen is when concurrent requests for the same resources are locked in the same order.

@patleb, I'm happy to look at any solutions you might have to this problem :)

brendon avatar Aug 16 '20 20:08 brendon

@brendon, I realized that it wasn't an ordering problem on the position column, but the id column. Simply put, every request would have to be ordered by id (either all 'asc' or 'desc') when fetching records and then apply the transformation you want on the subquery.

It would solve the problem at the source, but you would have to pretty much rewrite the whole gem to make this happen... which might not be worth the effort.

patleb avatar Aug 17 '20 02:08 patleb

Yea it's been an annoying problem for a long time. I don't have the time to rewrite anything, but am always happy to review PR's :D

brendon avatar Aug 17 '20 04:08 brendon