chewy icon indicating copy to clipboard operation
chewy copied to clipboard

Danger: bulk strategies can fail to index if called from within a transaction with `use_after_commit_callbacks`

Open bjeanes opened this issue 7 years ago • 5 comments

class Chewy::Strategy::Demo < Chewy::Strategy::Atomic
  def update(type, objects, options = {})
    puts "called update for #{type} with #{objects.size} objects}"
    super
  end

  def leave
    puts "leaving #{self.class} strategy, indexing #{@stash.values.flatten.size}"
    super
  end
end

# Make sure only root strategy is something which doesn't index, to demonstrate:
Chewy.strategy.instance_variable_set(:@stack, [Chewy::Strategy::Base.new])

ActiveRecord::Base.transaction do
  Chewy.strategy(:demo) do
    ActiveRecord::Base.silence { Record.first.save! }
  end
end

This will output:

   (0.1ms)  BEGIN [sql_query]
DEBUG: Chewy strategies stack: [2] <- demo @ (pry):58
DEBUG: Chewy strategies stack: [2] -> demo @ (pry):58
leaving Chewy::Strategy::Demo strategy, indexing 0
   (1.7ms)  COMMIT [sql_query]
#<Chewy::UndefinedUpdateStrategy: Index update strategy is undefined in current context.
Please wrap your code with `Chewy.strategy(:strategy_name) block.`

This is highly problematic because for any given method which specifies a strategy, its caller may have started a transaction, which can break the indexing semantics or worse fail to index something altogether.

bjeanes avatar Dec 14 '17 01:12 bjeanes

Hm, this is actually something worth a discussion. First of all, why can't you wrap a transaction with a strategy?

pyromaniac avatar Dec 14 '17 05:12 pyromaniac

First of all, thank you for Chewy. It's for the most part been a delight.

First of all, why can't you wrap a transaction with a strategy?

Because different strategies have different semantics and the best place for deciding the semantics that should apply to a given component is probably in or close to that component, not its callers. Otherwise, what's the point of having nestable strategies at all?

In our case, we have a large background import which uses staged transactions. I just added Chewy and ES to our codebase and indexed ~70mil records quite happily. However, I was surprised to find that new data was not being imported to ES and nor were there any exceptions. Chewy ate the exceptions and dropped the index request, because the code looked like:

transaction do
  Chewy.strategy(:a_custom_async_strategy_which_I_hope_to_opensource) do
    # ...
  end
end

I then audited the code and realised a few places where this was problematic

Perhaps the current behaviour is correct or at least from an ergonomics/API perspective, you may wish to keep things as they are. Nonetheless, I found this behaviour surprising (and worse, kinda dangerous because of the exceptions being eaten) and thought others may also run into this so I thought I should open an issue for awareness.

Perhaps there needs to be some kind of documentation about this caveat and a note that errors in after_commit hooks are buried in Rails (depending on your version of Rails, and how it is configured).

In my particular case, the best thing to do is actually (surprisingly) to set Chewy.use_after_commit_callbacks = false, because my custom strategies[1] queue via the DB and will: a) be rolled back with any parent transaction anyway; b) only be committed if the changed data is committed[2]; and c) tolerate enqueued items that were not committed or were subsequently deleted (e.g. if the block leaves the strategy outside of any transactions).

If I were to explore this more deeply, I would probably separate "atomic" strategies from "incremental" strategies. Specifically:

  • If current strategy is incremental, update the strategy after_commit
  • If current strategy is atomic, store the current strategy as-of the after_save event, and update strategy that was active at the time the record was saved. Alternatively, for atomic strategies, update it during after_save but delay leave until the transaction is committing (tricky, but not impossible).

FWIW, I think part of the issue here is the name :atomic. I'd consider naming this :batch which is clearer in purpose. A :transactional or :atomic strategy might have some stronger guarantees about DB transaction interop.

Sorry, this turned into a bit of a ramble. Let me know your thoughts or if you have any questions.


[1]: One strategy enqueues items to database as soon as update is called, the other enqueues the batch to the DB when leave is called (but this will be before receiving the items, when use_after_commit_callbacks == true [2]: Incidentally, this is why I prefer to have my strategy inside the transaction and wrote it like that only to discover this implication.

bjeanes avatar Dec 14 '17 06:12 bjeanes

I'd consider naming this :batch which is clearer in purpose

Great suggestion.

Overall, I need to dig into this. Going to investigate it further and see what can I do with it. Right now I so busy with other stuff. Anyway, thanks for the report and your suggestions, I'll definitely consider it.

pyromaniac avatar Dec 15 '17 05:12 pyromaniac

No worries. No rush on my end, as I have a custom strategy which actually works better with after_commit callbacks off. I just wanted others to know. Nonetheless, I'd be happy to help in any way if you need it.

bjeanes avatar Dec 15 '17 06:12 bjeanes

Any update on this?

glyoko avatar Apr 07 '20 18:04 glyoko