khepri icon indicating copy to clipboard operation
khepri copied to clipboard

khepri_machine: Run the query anonymous function from the calling process

Open dumbbell opened this issue 1 year ago • 4 comments

Why

Before this change, the anonymous function passed to khepri_machine:fold/5 was executed by the Ra server process. This had two downsides:

  • it blocked the Ra server to serve writes
  • the anonymous function couldn't query the Ra server itself because it would deadlock

How

The alternative implemented in this patch is to get the whole Khepri tree from the Ra server, then run the anonymous function, this time from the calling process.

The downside is that the whole Khepri tree is copied to that calling process.

dumbbell avatar Sep 27 '23 07:09 dumbbell

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (2a20943) 88.89% compared to head (ac306e2) 88.90%. Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #226      +/-   ##
==========================================
+ Coverage   88.89%   88.90%   +0.01%     
==========================================
  Files          20       20              
  Lines        2890     2902      +12     
==========================================
+ Hits         2569     2580      +11     
- Misses        321      322       +1     
Flag Coverage Δ
erlang-24 87.87% <75.00%> (+0.05%) :arrow_up:
erlang-25 87.76% <75.00%> (-0.06%) :arrow_down:
erlang-26 88.52% <75.00%> (+0.01%) :arrow_up:
os-ubuntu-latest 88.83% <75.00%> (-0.06%) :arrow_down:
os-windows-latest 88.59% <75.00%> (-0.03%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
src/khepri.erl 90.57% <ø> (ø)
src/khepri_machine.erl 94.73% <75.00%> (-0.56%) :arrow_down:

... and 2 files with indirect coverage changes

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Sep 27 '23 07:09 codecov[bot]

I'm still not sure this alternative approach is what we want. Copying the entire Khepri tree to the calling process could be very expensive and the memory footprint could sky-rocket.

Another solution would be to try to run the anonymous function from the Ra process like before. But if it exits with a calling_self exception, then we could re-run it from the calling process. But then, the anonymous function must have no side effects, or the side effects should be idempotent. Perhaps we could accept a {can_be_restarted, true} option, so that the caller lets Khepri know it can re-run the anonymous function.

Or simply, we accept a {run_fun_from_calling_process, true} , it's probably simpler and easier to reason about. This needs a better name though.

What do you think, @dcorbacho and @the-mikedavis?

To give some context to this issue, when RabbitMQ queries transient queues from the database, it queries the Khepri members from a Khepri query anonymous function:

khepri:fold(
  StoreId,
  PathPattern,
  fun(QueuePath, NodeProps) ->
      ...
      %% This queries the Ra server and can't be executed by the Ra server
      %% itself because it would deadlock. gen_statem throws a `calling_self`
      %% exception.
      Members = khepri:members(StoreId),
      ...
  end,
  []).

dumbbell avatar Sep 27 '23 09:09 dumbbell

I pushed an additional commit to implement that option that lets the caller decide if it wants the tree to be copied to be able to do nested queries.

I would love some feedback on the approach and the option name :-)

dumbbell avatar Sep 27 '23 10:09 dumbbell

We ended up going a different direction on this in the server similar to https://github.com/the-mikedavis/rabbitmq-server/commit/aed8ddd80e35d97b1afeb277f057209d845ef548. We use khepri:get_many/3 to get all of the records under the path and then fold separately with maps:fold/3 so that the fun isn't called within the Ra server process.

@dumbbell should we close out this PR for now? We could re-open it if we find another use-case and want to make it possible to do while still calling khepri:fold/5

the-mikedavis avatar Sep 29 '23 19:09 the-mikedavis