solidus
solidus copied to clipboard
Should Spree::Product still after_touch Spree::Taxon? Might be causing Deadlocks.
Solidus V2.10.1
Why do products touch taxons when touched?
https://github.com/solidusio/solidus/blob/b0092d0f3bda447494a586bf8eb96a18322b8bed/core/app/models/spree/product.rb#L348-L357
I noticed that in our deployment where we have multiple operators simultaneously reviewing products as well as async jobs creating products (many of which share the same Spree::Taxon
) this operation seems to be related to (or causing) some of our DB deadlocks.
PG::TRDeadlockDetected: ERROR: deadlock detected
DETAIL: Process 18333 waits for ShareLock on transaction 26056034; blocked by process 29223.
Process 29223 waits for ShareLock on transaction 26056296; blocked by process 18333.
HINT: See server log for query details.
CONTEXT: while rechecking updated tuple (17,44) in relation "spree_taxons"
I am considering removing this operation on our end. However, if this is legacy would it be appropriate to remove it from here as well?
A similar problem is occurring with the touching of Spree::OptionType.
This is presumably for caching reasons, but you're definitely right. We should probably be scaling back these kinds of things as for most stores they may want to have their own cache invalidation strategy and touching inside callbacks doesn't present these kinds of deadlock issues since they run in transactions.
Coincidentally, just realized we encountered this issue the other day. Didn't immediately occur to me what it was. Thanks for reporting!
I'm ok with trying to get rid of this custom code but I'm not sure to get the solution you are proposing here. As far as I understand we need to touch all the taxons (and all their children) associated with the product every time we touch it. This touch_taxons
is happening inside the after_touch
callback so maybe it's already doing things to prevent deadlocks?
Also, I was quickly exploring the Rails codebase looking for some hints and I found this touch_later
https://github.com/rails/rails/pull/19324 that could be probably something worth investigating.
I'm seeing deadlocks caused by this method as well. I don't have the full context, but I believe there are parallel processes that end up calling this method and when the taxons are similar, deadlocks can happen since ordering is not enforced with update_all
, at least on postgres.
Following up @spaghetticode's comment: a refactor to call #update_column
on individual taxons will avoid the non-deterministic behavior of executing multiple #update_all
's in a parallelized context.
We're using a version of the following solution in production and it's proved to be effective in eliminating unwanted deadlocks. Note: our project went the extra mile to refactor and order the associated Spree::Taxonomy
's since the method issues a second #update_all
and is similarly prone to the same behavior. The code below should maintain performance parity with the current implementation i.e. the same number of queries are used for fetching and updating:
def touch_taxons
- taxons_to_touch = taxons.flat_map(&:self_and_ancestors).uniq
+ taxons_to_touch = Spree::Taxon.none
+
+ taxons.each do |taxon|
+ taxons_to_touch = taxons_to_touch.or(taxon.self_and_ancestors)
+ end
+
+ taxons_to_touch = taxons_to_touch.order(:id)
+
unless taxons_to_touch.empty?
- Spree::Taxon.where(id: taxons_to_touch.map(&:id)).update_all(updated_at: Time.current)
+ taxons_to_touch.each { |taxon| taxon.update_column(:updated_at, Time.current) }
+
taxonomy_ids_to_touch = taxons_to_touch.flat_map(&:taxonomy_id).uniq
Spree::Taxonomy.where(id: taxonomy_ids_to_touch).update_all(updated_at: Time.current)
end
end
Should we even be doing this in a transaction? Is this only happening because touch_taxons
is happening in an after_save
/after_touch
? Could we switch to using after_commit
instead, so that it's not wrapped in a transaction?
@pacarvalho can you try the approach from #4239 and check if this fixes your dead locks?
@pacarvalho can you try the approach from #4239 and check if this fixes your dead locks?
It has been a while since I looked into this. I believe I just turned off this functionality since it did not affect the functionality we needed for this particular project. However, the proposed solution seems (just by quickly looking over the PR) like it would not cause deadlocks.