khepri
khepri copied to clipboard
khepri_machine: Run the query anonymous function from the calling process
Why
Before this change, the anonymous function passed to khepri_machine:fold/5
was executed by the Ra server process. This had two downsides:
- it blocked the Ra server to serve writes
- the anonymous function couldn't query the Ra server itself because it would deadlock
How
The alternative implemented in this patch is to get the whole Khepri tree from the Ra server, then run the anonymous function, this time from the calling process.
The downside is that the whole Khepri tree is copied to that calling process.
Codecov Report
Attention: 3 lines
in your changes are missing coverage. Please review.
Comparison is base (
2a20943
) 88.89% compared to head (ac306e2
) 88.90%. Report is 1 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #226 +/- ##
==========================================
+ Coverage 88.89% 88.90% +0.01%
==========================================
Files 20 20
Lines 2890 2902 +12
==========================================
+ Hits 2569 2580 +11
- Misses 321 322 +1
Flag | Coverage Δ | |
---|---|---|
erlang-24 | 87.87% <75.00%> (+0.05%) |
:arrow_up: |
erlang-25 | 87.76% <75.00%> (-0.06%) |
:arrow_down: |
erlang-26 | 88.52% <75.00%> (+0.01%) |
:arrow_up: |
os-ubuntu-latest | 88.83% <75.00%> (-0.06%) |
:arrow_down: |
os-windows-latest | 88.59% <75.00%> (-0.03%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
Files | Coverage Δ | |
---|---|---|
src/khepri.erl | 90.57% <ø> (ø) |
|
src/khepri_machine.erl | 94.73% <75.00%> (-0.56%) |
:arrow_down: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
I'm still not sure this alternative approach is what we want. Copying the entire Khepri tree to the calling process could be very expensive and the memory footprint could sky-rocket.
Another solution would be to try to run the anonymous function from the Ra process like before. But if it exits with a calling_self
exception, then we could re-run it from the calling process. But then, the anonymous function must have no side effects, or the side effects should be idempotent. Perhaps we could accept a {can_be_restarted, true}
option, so that the caller lets Khepri know it can re-run the anonymous function.
Or simply, we accept a {run_fun_from_calling_process, true}
, it's probably simpler and easier to reason about. This needs a better name though.
What do you think, @dcorbacho and @the-mikedavis?
To give some context to this issue, when RabbitMQ queries transient queues from the database, it queries the Khepri members from a Khepri query anonymous function:
khepri:fold(
StoreId,
PathPattern,
fun(QueuePath, NodeProps) ->
...
%% This queries the Ra server and can't be executed by the Ra server
%% itself because it would deadlock. gen_statem throws a `calling_self`
%% exception.
Members = khepri:members(StoreId),
...
end,
[]).
I pushed an additional commit to implement that option that lets the caller decide if it wants the tree to be copied to be able to do nested queries.
I would love some feedback on the approach and the option name :-)
We ended up going a different direction on this in the server similar to https://github.com/the-mikedavis/rabbitmq-server/commit/aed8ddd80e35d97b1afeb277f057209d845ef548. We use khepri:get_many/3
to get all of the records under the path and then fold separately with maps:fold/3
so that the fun isn't called within the Ra server process.
@dumbbell should we close out this PR for now? We could re-open it if we find another use-case and want to make it possible to do while still calling khepri:fold/5