zebra icon indicating copy to clipboard operation
zebra copied to clipboard

fix(rpc): Isolate RPC queries from the rest of Zebra, to improve performance

Open teor2345 opened this issue 1 year ago • 9 comments

Motivation

We want to use a separate tokio executor for RPCs, so that RPC queries don't slow down syncing. (And syncing doesn't slow down RPCs.)

API Reference

This is the current behaviour of the RPC server:

Utilize existing event loop executor to poll RPC results.

https://docs.rs/jsonrpc-http-server/18.0.0/jsonrpc_http_server/struct.ServerBuilder.html#method.event_loop_executor

But if we don't supply an executor, all threads will spawn a new runtime:

The first thread will use provided Executor instance and all other threads will use UninitializedExecutor to spawn a new runtime for futures.

https://docs.rs/jsonrpc-http-server/18.0.0/jsonrpc_http_server/struct.ServerBuilder.html#method.threads

Solution

  • Stop sharing the tokio executor with RPCs
  • Move RPC server startup and shutdown into a std::thread, so the RPC executor can block while it shuts down
  • Add tests for RPC startup and port conflicts

Review

This is a routine performance/reliability fix, it can be reviewed whenever we have time.

Reviewer Checklist

  • [ ] lightwalletd and RPC tests pass

teor2345 avatar Jul 22 '22 04:07 teor2345

Codecov Report

Merging #4806 (0f4f657) into main (c322533) will increase coverage by 0.08%. The diff coverage is 100.00%.

:exclamation: Current head 0f4f657 differs from pull request most recent head 8e6f65d. Consider uploading reports for the commit 8e6f65d to get more accurate results

@@            Coverage Diff             @@
##             main    #4806      +/-   ##
==========================================
+ Coverage   79.20%   79.29%   +0.08%     
==========================================
  Files         310      310              
  Lines       38883    38900      +17     
==========================================
+ Hits        30796    30844      +48     
+ Misses       8087     8056      -31     

codecov[bot] avatar Jul 22 '22 05:07 codecov[bot]

This doesn't seem to help (yet!), and it needs a test fix for a concurrency bug.

Let's re-test after we've merged some other fixes.

teor2345 avatar Jul 22 '22 20:07 teor2345

It looks like this change might let different Zebra instances share the same RPC port, which isn't what we want.

Reducing the number of threads to 1 might avoid that, I'll have to check.

It's also possible that it's a spurious test failure.

teor2345 avatar Aug 01 '22 02:08 teor2345

@mergifyio update

teor2345 avatar Aug 03 '22 22:08 teor2345

Let's see if this works on top of the latest fixes.

teor2345 avatar Aug 03 '22 22:08 teor2345

update

✅ Branch has been successfully updated

mergify[bot] avatar Aug 03 '22 22:08 mergify[bot]

@mergifyio update

teor2345 avatar Aug 04 '22 16:08 teor2345

update

✅ Branch has been successfully updated

mergify[bot] avatar Aug 04 '22 16:08 mergify[bot]

When the RPC server spawn() fails, it drops the tokio executor it has just created. But that needs to happen in a non-async context, so the server can block until all tasks finish.

tokio doesn't allow executors to be dropped inside spawn_blocking(), even though it allows blocking tasks. So we might have to escape from spawn_blocking() using a std::thread. 😞

teor2345 avatar Aug 05 '22 00:08 teor2345

CI is failing due to slow syncs, so existing performance fixes are a high priority.

teor2345 avatar Aug 28 '22 22:08 teor2345

@mergifyio update

teor2345 avatar Aug 31 '22 02:08 teor2345

update

✅ Branch has been successfully updated

mergify[bot] avatar Aug 31 '22 02:08 mergify[bot]