zebra
zebra copied to clipboard
fix(rpc): Isolate RPC queries from the rest of Zebra, to improve performance
Motivation
We want to use a separate tokio executor for RPCs, so that RPC queries don't slow down syncing. (And syncing doesn't slow down RPCs.)
API Reference
This is the current behaviour of the RPC server:
Utilize existing event loop executor to poll RPC results.
https://docs.rs/jsonrpc-http-server/18.0.0/jsonrpc_http_server/struct.ServerBuilder.html#method.event_loop_executor
But if we don't supply an executor, all threads will spawn a new runtime:
The first thread will use provided Executor instance and all other threads will use UninitializedExecutor to spawn a new runtime for futures.
https://docs.rs/jsonrpc-http-server/18.0.0/jsonrpc_http_server/struct.ServerBuilder.html#method.threads
Solution
- Stop sharing the tokio executor with RPCs
- Move RPC server startup and shutdown into a
std::thread
, so the RPC executor can block while it shuts down - Add tests for RPC startup and port conflicts
Review
This is a routine performance/reliability fix, it can be reviewed whenever we have time.
Reviewer Checklist
- [ ]
lightwalletd
and RPC tests pass
Codecov Report
Merging #4806 (0f4f657) into main (c322533) will increase coverage by
0.08%
. The diff coverage is100.00%
.
:exclamation: Current head 0f4f657 differs from pull request most recent head 8e6f65d. Consider uploading reports for the commit 8e6f65d to get more accurate results
@@ Coverage Diff @@
## main #4806 +/- ##
==========================================
+ Coverage 79.20% 79.29% +0.08%
==========================================
Files 310 310
Lines 38883 38900 +17
==========================================
+ Hits 30796 30844 +48
+ Misses 8087 8056 -31
This doesn't seem to help (yet!), and it needs a test fix for a concurrency bug.
Let's re-test after we've merged some other fixes.
It looks like this change might let different Zebra instances share the same RPC port, which isn't what we want.
Reducing the number of threads to 1 might avoid that, I'll have to check.
It's also possible that it's a spurious test failure.
@mergifyio update
Let's see if this works on top of the latest fixes.
update
✅ Branch has been successfully updated
@mergifyio update
update
✅ Branch has been successfully updated
When the RPC server spawn() fails, it drops the tokio executor it has just created. But that needs to happen in a non-async context, so the server can block until all tasks finish.
tokio
doesn't allow executors to be dropped inside spawn_blocking(), even though it allows blocking tasks. So we might have to escape from spawn_blocking() using a std::thread. 😞
CI is failing due to slow syncs, so existing performance fixes are a high priority.
@mergifyio update
update