solidus icon indicating copy to clipboard operation
solidus copied to clipboard

Should Spree::Product still after_touch Spree::Taxon? Might be causing Deadlocks.

Open pacarvalho opened this issue 4 years ago • 8 comments

Solidus V2.10.1

Why do products touch taxons when touched?

https://github.com/solidusio/solidus/blob/b0092d0f3bda447494a586bf8eb96a18322b8bed/core/app/models/spree/product.rb#L348-L357

I noticed that in our deployment where we have multiple operators simultaneously reviewing products as well as async jobs creating products (many of which share the same Spree::Taxon) this operation seems to be related to (or causing) some of our DB deadlocks.

PG::TRDeadlockDetected: ERROR:  deadlock detected
DETAIL:  Process 18333 waits for ShareLock on transaction 26056034; blocked by process 29223.
Process 29223 waits for ShareLock on transaction 26056296; blocked by process 18333.
HINT:  See server log for query details.
CONTEXT:  while rechecking updated tuple (17,44) in relation "spree_taxons"

I am considering removing this operation on our end. However, if this is legacy would it be appropriate to remove it from here as well?

A similar problem is occurring with the touching of Spree::OptionType.

pacarvalho avatar Feb 12 '21 01:02 pacarvalho

This is presumably for caching reasons, but you're definitely right. We should probably be scaling back these kinds of things as for most stores they may want to have their own cache invalidation strategy and touching inside callbacks doesn't present these kinds of deadlock issues since they run in transactions.

jarednorman avatar Feb 14 '21 21:02 jarednorman

Coincidentally, just realized we encountered this issue the other day. Didn't immediately occur to me what it was. Thanks for reporting!

jarednorman avatar Feb 15 '21 02:02 jarednorman

I'm ok with trying to get rid of this custom code but I'm not sure to get the solution you are proposing here. As far as I understand we need to touch all the taxons (and all their children) associated with the product every time we touch it. This touch_taxons is happening inside the after_touch callback so maybe it's already doing things to prevent deadlocks?

Also, I was quickly exploring the Rails codebase looking for some hints and I found this touch_later https://github.com/rails/rails/pull/19324 that could be probably something worth investigating.

kennyadsl avatar Feb 15 '21 23:02 kennyadsl

I'm seeing deadlocks caused by this method as well. I don't have the full context, but I believe there are parallel processes that end up calling this method and when the taxons are similar, deadlocks can happen since ordering is not enforced with update_all, at least on postgres.

spaghetticode avatar Jul 07 '21 15:07 spaghetticode

Following up @spaghetticode's comment: a refactor to call #update_column on individual taxons will avoid the non-deterministic behavior of executing multiple #update_all's in a parallelized context.

We're using a version of the following solution in production and it's proved to be effective in eliminating unwanted deadlocks. Note: our project went the extra mile to refactor and order the associated Spree::Taxonomy's since the method issues a second #update_all and is similarly prone to the same behavior. The code below should maintain performance parity with the current implementation i.e. the same number of queries are used for fetching and updating:

def touch_taxons
- taxons_to_touch = taxons.flat_map(&:self_and_ancestors).uniq
+ taxons_to_touch = Spree::Taxon.none
+
+ taxons.each do |taxon|
+   taxons_to_touch = taxons_to_touch.or(taxon.self_and_ancestors)
+ end
+
+ taxons_to_touch = taxons_to_touch.order(:id)
+
  unless taxons_to_touch.empty?
-   Spree::Taxon.where(id: taxons_to_touch.map(&:id)).update_all(updated_at: Time.current)
+   taxons_to_touch.each { |taxon| taxon.update_column(:updated_at, Time.current) }
+
    taxonomy_ids_to_touch = taxons_to_touch.flat_map(&:taxonomy_id).uniq
    Spree::Taxonomy.where(id: taxonomy_ids_to_touch).update_all(updated_at: Time.current)
  end
end

nohmar avatar Jul 19 '21 18:07 nohmar

Should we even be doing this in a transaction? Is this only happening because touch_taxons is happening in an after_save/after_touch? Could we switch to using after_commit instead, so that it's not wrapped in a transaction?

jarednorman avatar Jul 19 '21 20:07 jarednorman

@pacarvalho can you try the approach from #4239 and check if this fixes your dead locks?

tvdeyen avatar Jan 07 '22 19:01 tvdeyen

@pacarvalho can you try the approach from #4239 and check if this fixes your dead locks?

It has been a while since I looked into this. I believe I just turned off this functionality since it did not affect the functionality we needed for this particular project. However, the proposed solution seems (just by quickly looking over the PR) like it would not cause deadlocks.

pacarvalho avatar Jan 08 '22 15:01 pacarvalho