lightwalletd icon indicating copy to clipboard operation
lightwalletd copied to clipboard

Reduce gRPC startup time

Open emersonian opened this issue 10 months ago • 2 comments

Lightwalletd can take upwards of 20 minutes to start when a block cache exists on a magnetic hard drive. When lightwalletd starts up for the first time, it is able to respond to gRPC nearly immediately while the block cache synchronizes in the background. It is confusing that upon second launch the time-to-first-listen is very slow.

This pull request contains two changes:

  1. Load and maintain synchronization of the block cache in the background without blocking gRPC's first listen
  2. Only announce that gRPC is listening once it is actually listening

This is a draft open to community review. I will test this in a staging environment for a few weeks before requesting a merge.

A Docker Hub image of this branch is available for testing at: emersonian/zcash-lwd:v0.4.17-cache1


Please ensure this checklist is followed for any pull requests for this repo. This checklist must be checked by both the PR creator and by anyone who reviews the PR.

  • [x] Relevant documentation for this PR has to be completed before the PR can be merged
  • [ ] A test plan for the PR must be documented in the PR notes and included in the test plan for the next regular release

As a note, all CI tests need to be passing and all appropriate code reviews need to be done before this PR can be merged

emersonian avatar Apr 24 '24 18:04 emersonian

This improvement has been stable for me on a few test severs, I consider it ready to review and merge. Thanks.

emersonian avatar Apr 29 '24 19:04 emersonian

There's a better way to fix this, I believe. Can you check the startup performance when using this patch (applied to master):

+++ a/common/cache.go
+++ b/common/cache.go
@@ -244,12 +244,6 @@ func NewBlockCache(dbPath string, chainName string, startHeight int, syncFromHei
                offset += int64(length) + 8
                c.starts = append(c.starts, offset)
-               // Check for corruption.
-               block := c.readBlock(c.nextBlock)
-               if block == nil {
-                       Log.Warning("error reading block")
-                       c.recoverFromCorruption(c.nextBlock)
-                       break
-               }
                c.nextBlock++
        }
        c.setDbFiles

In other words, remove the deserialization and checking of each block when reading the blocks disk cache during startup. I added this check as part of #186 (2022). I remember it adding only a couple of seconds to the startup time. Either I measured incorrectly, or it's taking much longer to parse blocks, or both. This was before the sandblasting attack (creating extremely large blocks), so that most likely is the cause of the startup time increase.

When I run this patch, the log messages that say "Reading nnnnn blocks from disk cache" and "Done reading" are within the same second (as shown by their timestamps), so this should be fast enough. But please test this and let me know what you find.

This check isn't really needed because it detects corruption in the blocks file, which probably never happens. If it does get corrupted, the length fields (before each individual block) will likely be corrupted as well (the sanity check for that kind of corruption remains). The functional effect of removing this check is that if there is corruption in a serialized block, it won't be detected during startup, but is detected when we later try to read the block using a gRPC like GetBlock. That's perfectly acceptable.

LarryRuane avatar May 02 '24 17:05 LarryRuane

Closing this now because I've merged #484 which I think is a better fix. Please reopen (or open an issue) if you disagree, thanks.

LarryRuane avatar Jun 13 '24 15:06 LarryRuane

Okay thanks for the startup speed improvement. Can you also merge the change from this pull request's diff that moves "Starting gRPC server version" to the line where the gRPC server is actually starting? The logging is confusing and inaccurate as-is, thanks!

emersonian avatar Jun 19 '24 21:06 emersonian

Let's reopen this, and if you don't mind, can you update the PR accordingly? Thanks! (Or, if you'd prefer, I can make a new PR.)

LarryRuane avatar Jun 20 '24 16:06 LarryRuane

Force pushed to reduce this PR to just the second commit (correct logging)

LarryRuane avatar Jun 27 '24 19:06 LarryRuane

@emersonian - Could you review this PR now? It should take only a minute. Thanks.

LarryRuane avatar Jul 08 '24 19:07 LarryRuane

It looks good to me, thanks! When should we anticipate the next release?

emersonian avatar Jul 09 '24 23:07 emersonian

@emersonian - I changed it slightly; can you review again? I think it's good to log the version and build information early (like we've always done), but just don't say that the gRPC server is starting. Then later, when we do start it, log another message saying that. Some things happen before starting the gRPC server, such as trying to execute an RPC to the backend zcashd (or zebrad), and if that fails, it would be good to know (if someone ships me a log file) what version and build they were running. Here's what the log file looks like now (with the latest force-push) if I start lightwalletd while zcashd is still starting up (note the now-separated buildDate and Starting gRPC server log lines):

$ ./lightwalletd --zcash-conf-path ~/.zcash/zcash.conf --no-tls-very-insecure --log-file /dev/stdout
{"app":"lightwalletd","buildDate":"2024-07-10","buildUser":"larry","gitCommit":"6d801b8c8f78d05d8c3c14a453454d3c6c640f78","level":"info","msg":"Starting lightwalletd process version v0.4.17-6-g6d801b8","time":"2024-07-10T11:36:37-06:00"}
{"app":"lightwalletd","level":"warning","msg":"Starting insecure no-TLS (plaintext) server","time":"2024-07-10T11:36:37-06:00"}
Starting insecure server
{"app":"lightwalletd","error":"-28: Loading block index...","level":"warning","msg":"error with getblockchaininfo rpc, retrying...","retry":1,"time":"2024-07-10T11:36:37-06:00"}
{"app":"lightwalletd","error":"-28: Loading block index...","level":"warning","msg":"error with getblockchaininfo rpc, retrying...","retry":2,"time":"2024-07-10T11:36:52-06:00"}
{"app":"lightwalletd","error":"-28: Loading block index...","level":"warning","msg":"error with getblockchaininfo rpc, retrying...","retry":3,"time":"2024-07-10T11:37:12-06:00"}
{"app":"lightwalletd","error":"-28: Loading block index...","level":"warning","msg":"error with getblockchaininfo rpc, retrying...","retry":4,"time":"2024-07-10T11:37:37-06:00"}
{"app":"lightwalletd","error":"-28: Loading block index...","level":"warning","msg":"error with getblockchaininfo rpc, retrying...","retry":5,"time":"2024-07-10T11:38:07-06:00"}
{"app":"lightwalletd","error":"-28: Loading block index...","level":"warning","msg":"error with getblockchaininfo rpc, retrying...","retry":6,"time":"2024-07-10T11:38:42-06:00"}
{"app":"lightwalletd","error":"-28: Loading block index...","level":"warning","msg":"error with getblockchaininfo rpc, retrying...","retry":7,"time":"2024-07-10T11:39:22-06:00"}
{"app":"lightwalletd","error":"-28: Rewinding blocks if needed...","level":"warning","msg":"error with getblockchaininfo rpc, retrying...","retry":8,"time":"2024-07-10T11:40:07-06:00"}
{"app":"lightwalletd","level":"warning","msg":"getblockchaininfo RPC successful","time":"2024-07-10T11:40:57-06:00"}
{"app":"lightwalletd","level":"info","msg":"Got sapling height 419200 block height 2566379 chain main branchID c2d6d0b4","time":"2024-07-10T11:40:57-06:00"}
{"app":"lightwalletd","level":"info","msg":"Reading 0 blocks (since Sapling activation) from disk cache ...","time":"2024-07-10T11:40:57-06:00"}
{"app":"lightwalletd","level":"info","msg":"Done reading 0 blocks from disk cache","time":"2024-07-10T11:40:57-06:00"}
{"app":"lightwalletd","level":"info","msg":"Starting gRPC server on 127.0.0.1:9067","time":"2024-07-10T11:40:57-06:00"}
{"app":"lightwalletd","level":"info","msg":"Adding block to cache 419369 0000000002959f10f2df11187a7316a977a256e0e2e5c73dca90431120efdede","time":"2024-07-10T11:41:01-06:00"}
{"app":"lightwalletd","level":"info","msg":"Adding block to cache 419817 0000000001513d6001d2ebadd058251889fb35125bc0b79b9bf7d5352127bcf3","time":"2024-07-10T11:41:05-06:00"}

As for another release, I think we should do one right away with this and a few other good changes we've had recently.

LarryRuane avatar Jul 10 '24 18:07 LarryRuane