
Change the cache DB format for `zcash_client_sqlite`

Open · str4d opened this issue 2 years ago

Currently we store compact blocks in rows of a SQLite table (specifically (height, blockbytes)), in a database specifically for caching. @ccjernigan noted that this is quite inefficient when the compact blocks become large, because SQLite is not good at data reclamation. We can "easily" reclaim that data by just deleting the database file, but this becomes more complex on e.g. Android (which uses Room for managing its SQLite databases).

In a meeting today with @ccjernigan, @pacu, @nuttycom and myself, we decided to define a new cache DB system. Compact blocks will be written to a known on-disk path with a known filename format HEIGHT-BLOCKHASHHEX, and then metadata about the block (which is much smaller) is what gets stored in the SQLite database. We specifically want the following metadata:

  • Height
  • Block hash
  • Time
  • number of Sapling outputs
  • number of Orchard outputs
    • These two are for figuring out how blocks line up against subtree boundaries in the note commitment trees.
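The metadata above could be modeled as a small row type plus a matching table. This is a hypothetical sketch only; the actual `zcash_client_sqlite` schema and names may differ.

```rust
// Per-block metadata cached in SQLite (names are illustrative,
// not the actual zcash_client_sqlite schema).
#[derive(Debug, PartialEq)]
pub struct BlockMeta {
    pub height: u32,
    pub block_hash: [u8; 32],
    pub time: u32,
    // Output counts are used to line blocks up against subtree
    // boundaries in the note commitment trees.
    pub sapling_outputs: u32,
    pub orchard_outputs: u32,
}

// DDL for a corresponding metadata table (illustrative).
pub fn blockmeta_ddl() -> &'static str {
    "CREATE TABLE compactblocks_meta (
        height INTEGER PRIMARY KEY,
        blockhash BLOB NOT NULL,
        time INTEGER NOT NULL,
        sapling_outputs_count INTEGER NOT NULL,
        orchard_outputs_count INTEGER NOT NULL
    )"
}
```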

The SQLite cache DB will be managed on the Rust side instead of the SDK side. The data flow will be:

  • When the cache is being filled:
    • SDK requests via FFI the current height at which cached data is stored.
      • If no cached data, SDK can request the wallet's synced-to height instead.
    • SDK rounds the height down to some 100- / 1000-block boundary (for privacy).
    • SDK calls the getblock gRPC method and gets a stream of compact blocks.
    • SDK writes each compact block to the defined filesystem folder as a separate file.
      • Overwrites are fine because we assume block hashes are both immutable and do not collide.
    • Once "enough" blocks have been written to disk successfully, SDK provides the list of corresponding metadata to Rust via the FFI.
    • Rust updates the cache DB with this metadata atomically.
  • When Rust is later told to scan the cache for wallet data:
    • It reads the cache DB to obtain (height, hash).
  • It reads the corresponding HEIGHT-BLOCKHASHHEX file from the defined filesystem folder and parses the compact block.
    • It proceeds with scanning the compact block as usual.
  • When part of the cache is no longer needed:
    • SDK determines some height H that is the earliest block data it needs to preserve.
      • This might be determined based on where the wallet is fully-synced to, or other heuristics.
    • SDK searches the defined filesystem folder for all files beginning in HEIGHT-* where HEIGHT < H, and deletes those files.
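The filename construction, privacy rounding, and height-based pruning steps above can be sketched as follows (function names are hypothetical, and the real implementation may differ):

```rust
// Build the HEIGHT-BLOCKHASHHEX filename for a cached compact block.
fn block_filename(height: u32, hash_hex: &str) -> String {
    format!("{}-{}", height, hash_hex)
}

// Round a height down to a boundary (e.g. 100 or 1000) for privacy.
fn round_down(height: u32, boundary: u32) -> u32 {
    height - (height % boundary)
}

// Decide whether a cached file may be deleted, given the earliest
// height `h` whose block data the wallet still needs to preserve.
fn is_prunable(filename: &str, h: u32) -> bool {
    filename
        .split('-')
        .next()
        .and_then(|s| s.parse::<u32>().ok())
        .map(|height| height < h)
        .unwrap_or(false)
}
```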

This strategy can be extended to caching other data, such as a "sub-compact block" that only holds nullifier data (zcash/lightwalletd#406); we'd store that data in e.g. HEIGHT-BLOCKHASHHEX.nullifiers. This duplicates data with the compact block files, but that is only short-lived duplication as eventually all data in the cache will be deleted (as all data is prefixed with HEIGHT-*).

— str4d, Jul 28 '22

Given how @nuttycom implemented the cache DB traits, this should just be a case of writing a new trait implementation, and then the downstream consumers of zcash_client_sqlite (the SDKs) can decide precisely when to switch from the old-style cache to the new-style cache.

— str4d, Jul 28 '22

I have a few very minor suggestions on the file naming conventions:

  • Consider - as the separator in all places, e.g. HEIGHT-BLOCKHASHHEX.nullifiers -> HEIGHT-BLOCKHASHHEX-nullifiers. This makes parsing the filename easier in most languages (in Java it would be a single String.split("-")), and it doesn't look like a filename extension.
  • Consider a suffix for all types, e.g. HEIGHT-BLOCKHASHHEX -> HEIGHT-BLOCKHASHHEX-block (or HEIGHT-BLOCKHASHHEX-compactblock), so that all types yield a consistently sized 3-element array when the filename is split.

Although I expect that we'll be going from SQL queries to constructing the filenames, there might be some times in the future where we want to scan the directory filenames directly (e.g. in automated tests to validate we didn't leak files). I think the suggestions above would make that easier.
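Under the suggested convention, every filename would split into exactly three parts, which makes both parsing and validation in tests straightforward. A sketch (function name hypothetical):

```rust
// Parse a filename of the form HEIGHT-BLOCKHASHHEX-TYPE into its
// three components; reject anything that doesn't fit the convention.
fn parse_cache_filename(name: &str) -> Option<(u32, &str, &str)> {
    let parts: Vec<&str> = name.split('-').collect();
    if parts.len() != 3 {
        return None;
    }
    let height = parts[0].parse::<u32>().ok()?;
    Some((height, parts[1], parts[2]))
}
```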

— ccjernigan, Jul 29 '22

Was VACUUM considered as a solution to the problem of sqlite db files not shrinking?

One file per block is extremely slow on Windows. Windows doesn't handle directories with millions of files very well at all. I'm preparing a change to allow a trait to be supplied by the consumer to override the default file-per-block policy. Maybe 10K blocks per file will strike a good balance.

— AArnott, Dec 16 '23

The rationale for file-per-block was simplicity, on the basis that you shouldn't have millions of files in a single directory. You'd maybe have thousands (my Rust CLI wallet fetches batches of 10k blocks), but as this is a cache and not a block store those files are transient and regularly deleted as soon as they've been scanned. If a wallet's bottleneck is trial decryption, it shouldn't be downloading blocks indefinitely far ahead of what it has scanned.

— str4d, Dec 16 '23

> Was VACUUM considered as a solution to the problem of sqlite db files not shrinking?

Well... to be honest, VACUUM is in some sense the genesis of the problem. We used to store everything in the cache DB, but it grew quickly when going through full blocks, and it was super slow and impossible to parallelize.

We analyzed the situation, read the VACUUM documentation, and built a PoC. It turns out the operation is not only time-consuming and requires twice the DB's file size of free space to perform, but it also does not guarantee freeing up any space at all. So we decided to remove blobs from SQLite and use a filesystem cache to store the blocks, with the DB storing only the metadata records.

> I'm preparing a change to allow a trait to be supplied by the consumer to override the default file-per-block policy. Maybe 10K blocks per file will strike a good balance.

This is a good idea, since what you're reporting is an OS limitation. I would approach the change a bit differently:

Refactor so that everything stays as it is on the other OSes, and then implement the Windows requirements as you need them.

This would be a refactoring pattern like:

  1. abstract the interface
  2. move the existing code to one possible implementation
  3. create methods that allow callers to choose the implementation (Rust does not have default parameters, so this will probably need to be explicit)
  4. fix broken tests
  5. implement the windows variant
  6. add tests for the new implementation
  7. maintain or enhance code coverage
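The abstraction in steps 1-2 could be sketched as a trait with the existing file-per-block behavior as one implementation and a batched Windows-friendly variant as another. All names here are hypothetical, not the actual API:

```rust
// Abstract the on-disk cache layout behind a trait (step 1).
trait BlockCacheLayout {
    /// Path, relative to the cache root, where a block is stored.
    fn path_for(&self, height: u32, hash_hex: &str) -> String;
}

// The existing one-file-per-block behavior becomes one
// implementation (step 2), the default on non-Windows platforms.
struct FilePerBlock;

impl BlockCacheLayout for FilePerBlock {
    fn path_for(&self, height: u32, hash_hex: &str) -> String {
        format!("{}-{}", height, hash_hex)
    }
}

// A batched variant (step 5): group blocks into buckets of
// `batch_size` heights (e.g. 10_000) so no single directory
// ever holds millions of files.
struct BatchedBlocks {
    batch_size: u32,
}

impl BlockCacheLayout for BatchedBlocks {
    fn path_for(&self, height: u32, hash_hex: &str) -> String {
        let bucket = height - (height % self.batch_size);
        format!("{}/{}-{}", bucket, height, hash_hex)
    }
}
```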

— pacu, Dec 17 '23

Thanks for the feedback. What you describe as the refactoring pattern is pretty close to what I ended up doing. I haven't written the new implementation yet, but I have a branch where the one-file-per-block behavior is now the default implementation and can be swapped out.

— AArnott, Dec 18 '23

Closing as fixed by #596 - if there is additional work to batch blocks on disk differently, a separate issue should be opened for it.

— nuttycom, Mar 01 '24