add(state): Track spending transaction ids by spent outpoints and revealed nullifiers
Motivation
We want to lookup transaction ids by their transparent inputs and revealed nullifiers.
Closes #8837, closes #8838.
Solution
- Adds a new `tx_loc_by_spent_out_loc` column family (sketched below)
- Updates revealed nullifier column families to store spending transaction locations as values instead of `()`
- Stores the `TransactionLocation` of spending transactions by spent `OutputLocation`s and nullifiers when writing blocks to the finalized state
- Adds the hashes of spending transactions as the values in the `spent_utxos` field on non-finalized `Chain`s
- Adds `ReadRequest` and `ReadResponse` variants for querying spending tx ids by outpoint with the `ReadStateService`
- Adds a `spending_transaction_hash()` read function used to handle the new `ReadRequest`
- Updates snapshots
It may be possible to update the `tx_loc_by_transparent_addr_loc` column family instead, but adding a new one seemed easier.
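As a rough sketch of the index shapes this adds (illustrative only: the toy types below stand in for zebra-state's actual `OutputLocation`, `TransactionLocation`, and nullifier types, and the field names are not verbatim from the PR):

```rust
use std::collections::HashMap;

/// Stand-in for zebra-state's TransactionLocation: block height + transaction index.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct TransactionLocation {
    height: u32,
    index: u16,
}

/// Stand-in for OutputLocation: the location of a transparent output.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct OutputLocation {
    transaction: TransactionLocation,
    output_index: u32,
}

/// Toy versions of the column families this PR adds or changes.
struct SpendIndexes {
    /// New: spent output location -> location of the spending transaction.
    tx_loc_by_spent_out_loc: HashMap<OutputLocation, TransactionLocation>,
    /// Changed: nullifier -> location of the revealing transaction
    /// (the value used to be `()`).
    orchard_nullifiers: HashMap<[u8; 32], TransactionLocation>,
}
```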
Tests
- Adds a test that checks the last 500 blocks in the finalized and non-finalized state for spending transaction ids (uses a cached state)
- Manually tested the db format upgrade
- Full Mainnet sync test running here
Follow Up Work
- Consider updating `cancel_receiver` to a type that implements `Sync` and parallelizing the db format upgrade by transaction (rough idea sketched below).
- https://github.com/ZcashFoundation/zebra/issues/8922
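A possible shape for that follow-up, assuming a `rayon` dependency and a hypothetical per-block upgrade step (the `AtomicBool` is just one example of a `Sync` cancel signal; a watch channel would also work):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

use rayon::prelude::*;

/// Hypothetical per-block upgrade step: index the spends of each transaction
/// in parallel, checking a Sync cancel flag instead of an mpsc receiver.
fn upgrade_spend_indexes(tx_locations: &[u64], cancel: &AtomicBool) -> Result<(), &'static str> {
    tx_locations.par_iter().try_for_each(|_tx_loc| {
        if cancel.load(Ordering::Relaxed) {
            return Err("format upgrade cancelled");
        }
        // ... read the transaction and write its spent outpoint/nullifier indexes ...
        Ok(())
    })
}
```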
PR Author's Checklist
- [x] The PR name will make sense to users.
- [ ] The PR provides a CHANGELOG summary.
- [x] The solution is tested.
- [x] The documentation is up to date.
- [x] The PR has a priority label.
PR Reviewer's Checklist
- [ ] The PR Author's checklist is complete.
- [ ] The PR resolves the issue.
Added a do-not-merge label so that this won't be published until after https://github.com/ZcashFoundation/zebra/issues/8922 has been implemented to avoid substantially increasing storage requirements for users that won't be using these indexes.
@mpguerra It looks like it's actually not using much storage space. I was looking at the db metrics printed at startup, which were about double the expected storage requirements prior to the change, but the total size of the state cache directory is about the same as it was before, so I think the db metrics are overestimating the total db size.
I checked the number of keys by column family, and by height 2M on Mainnet it's ~10M transparent outputs and ~150M nullifiers in total, not all of which are spent. At 10 bytes per spent transparent output and 5 bytes per nullifier, that should be, at most, ~1GB of additional storage by block 2M. I'll update this comment with the number of nullifiers and transparent outputs at the network chain tip once my node finishes syncing, but it's looking like hiding this behind a feature may have been unnecessary.
~Having the indexes behind a feature still seems nice to have, but there's also unnecessary complexity to be reviewed and maintained around adding/deleting the indexes. Should we keep them behind a feature in this PR or remove the feature?~
Update:
At the current network chain tip, it's about 6.2GB of extra data (5.5GB + 140M × 5 bytes); also, it's 14 bytes per spent transparent output, not 10 (I had forgotten about the output index).
6.2GB doesn't seem excessive, but we could use the feature later if/when caching blocks in their compact format.
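As a quick sanity check of that figure (assuming the extra data is the new `tx_loc_by_spent_out_loc` column family plus a 5-byte `TransactionLocation` value per nullifier):

```rust
fn main() {
    // ~5.5 GB on disk for the new tx_loc_by_spent_out_loc column family (see sizes below).
    let tx_loc_by_spent_out_loc_bytes: u64 = 5_500_000_000;
    // ~140M nullifiers, each now storing a 5-byte TransactionLocation value instead of ().
    let nullifier_value_bytes: u64 = 140_000_000 * 5;

    let extra_bytes = tx_loc_by_spent_out_loc_bytes + nullifier_value_bytes;
    println!("~{:.1} GB of extra data", extra_bytes as f64 / 1e9); // ~6.2 GB
}
```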
Relevant Column Family Sizes
sprout_nullifiers (Disk: 146.7 MB, Memory: 9.4 MB, num_keys: Some(1663236))
sapling_nullifiers (Disk: 230 MB, Memory: 4.2 MB, num_keys: Some(3068534))
orchard_nullifiers (Disk: 6.3 GB, Memory: 55.1 MB, num_keys: Some(134798732))
tx_loc_by_spent_out_loc (Disk: 5.5 GB, Memory: 6.3 MB, num_keys: Some(316786532))
~The scan_starts_where_left test is failing here: https://github.com/ZcashFoundation/zebra/actions/runs/11513520050/job/32050881919?pr=8895#step:15:615~
~I thought it was because a column family could be dropped earlier, but it happened again after switching to removing a comprehensive range instead of dropping the column family. Now I'm thinking it's because the test tries to open a secondary db before opening the primary db: opening the primary db will create any missing column families, but opening the secondary panics when one is missing because it lacks write access.~
~I'll confirm that manually. If that is the case, I think TrustedChainSync should just log a warning saying "Run Zebra first or downgrade your Zebra version". This PR should also either bump the db format version or hide the new column family behind the indexer feature; in the latter case Zebra can't clear the column family once it's been populated, so I'm leaning towards bumping the db format version.~
Test failure is unrelated, should be fixed now.
The test I commented out here ended up failing locally for me:
2024-11-29T13:34:16.260577Z INFO load_tip_height_from_state_directory{network=Mainnet state_path="/media/alfredo/stuff/chain/zebra"}: checking database format produced by new blocks in this instance is valid running_version=26.0.0+indexer
2024-11-29T13:34:27.993693Z INFO load_tip_height_from_state_directory{network=Mainnet state_path="/media/alfredo/stuff/chain/zebra"}: database format is valid running_version=26.0.0+indexer inital_disk_version=26.0.0+indexer
2024-11-29T13:34:28.287676Z INFO got finalized tip height from state directory finalized_tip_height=2733291 non_finalized_tip_height=2733390 estimated_finalized_tip_height=2733291
/home/alfredo/zebra/pr8895/zebra/target/release/zebrad Child Stderr:
Thank you for running a mainnet zebrad 2.0.1+48.g1ebb1a5 node!
You're helping to strengthen the network and contributing to a social good :)
2024-11-29T13:34:28.295409Z WARN start_state_service_with_cache_dir{network=Mainnet}: could not canonicalize "/media/alfredo/stuff/chain/zebra/state/v25/mainnet": No such file or directory (os error 2)
2024-11-29T13:34:28.295420Z INFO start_state_service_with_cache_dir{network=Mainnet}: trying to open current database format running_version=26.0.0+indexer
2024-11-29T13:34:28.295478Z INFO start_state_service_with_cache_dir{network=Mainnet}: the open file limit is high enough for Zebra current_limit=1024 min_limit=512 ideal_limit=1024
2024-11-29T13:34:28.800416Z INFO start_state_service_with_cache_dir{network=Mainnet}: Opened Zebra state cache at /media/alfredo/stuff/chain/zebra/state/v26/mainnet
2024-11-29T13:34:28.800537Z INFO start_state_service_with_cache_dir{network=Mainnet}: loaded Zebra state cache tip=Some((Height(2733291), block::Hash("0000000000f6ab8efa168e3b3b83fdd22a441379f469cb62204d143d22fe6302")))
2024-11-29T13:34:28.800630Z INFO start_state_service_with_cache_dir{network=Mainnet}: checking database format produced by a previous zebra instance is current and valid running_version=26.0.0+indexer
2024-11-29T13:34:28.800835Z INFO start_state_service_with_cache_dir{network=Mainnet}: started checking/adding indexes for spending tx ids
2024-11-29T13:34:28.802132Z INFO start_state_service_with_cache_dir{network=Mainnet}: starting legacy chain check
2024-11-29T13:34:28.803231Z INFO start_state_service_with_cache_dir{network=Mainnet}: cached state consensus branch is valid: no legacy chain found
2024-11-29T13:34:28.803289Z INFO committing blocks to non-finalized state
The application panicked (crashed).
Message: can call blocking only when running on the multi-threaded runtime
Location: zebra-state/src/service.rs:912
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SPANTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
0: zebra_state::service::state
at zebra-state/src/service.rs:886
Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
2024-11-29T13:34:28.807754Z INFO dropping the state: logging database metrics
2024-11-29T13:34:28.807794Z INFO start_state_service_with_cache_dir{network=Mainnet}: StateService closed the block reset channel. Is Zebra shutting down?
2024-11-29T13:34:28.807823Z INFO the open file limit is high enough for Zebra current_limit=1024 min_limit=512 ideal_limit=1024
2024-11-29T13:34:28.818175Z INFO Total Database Disk Size: 270.2 GB
2024-11-29T13:34:28.818187Z INFO Total Live Data Disk Size: 268.8 GB
2024-11-29T13:34:28.818189Z INFO Total Database Memory Size: 51.2 KB
2024-11-29T13:34:28.818247Z INFO checking new blocks were written in current database format running_version=26.0.0+indexer
2024-11-29T13:34:28.818252Z INFO checking database format produced by new blocks in this instance is valid running_version=26.0.0+indexer
2024-11-29T13:34:40.438333Z INFO database format is valid running_version=26.0.0+indexer inital_disk_version=26.0.0+indexer
2024-11-29T13:34:40.512623Z INFO waiting for the block write task to finish
2024-11-29T13:34:40.512747Z INFO checking new blocks were written in current database format running_version=26.0.0+indexer
2024-11-29T13:34:40.512764Z INFO checking database format produced by new blocks in this instance is valid running_version=26.0.0+indexer
2024-11-29T13:34:52.479942Z INFO database format is valid running_version=26.0.0+indexer inital_disk_version=26.0.0+indexer
test has_spending_transaction_ids ... FAILED
failures:
failures:
has_spending_transaction_ids
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 48 filtered out; finished in 2728.65s
error: test failed, to rerun pass `-p zebrad --test acceptance`
Test .. ended up failing locally for me
The test checks that a prepared finalized state has the indexes. Documented and updated to use a multi-threaded async runtime in https://github.com/ZcashFoundation/zebra/pull/8895/commits/f71c8977749a4e69e595139940f169df95e506a9 (I'm not sure how it was working for me before; the spawn_blocking call was always there).
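Not the actual test code, just the relevant change: the "can call blocking only when running on the multi-threaded runtime" panic above is what tokio's blocking facilities raise on a current-thread runtime, so the test needs tokio's multi-threaded flavor (this sketch assumes the test uses the `tokio::test` macro with the `rt-multi-thread` feature enabled):

```rust
// The current-thread default would reproduce the panic above; the
// multi_thread flavor allows blocking calls inside the state service.
#[tokio::test(flavor = "multi_thread")]
async fn has_spending_transaction_ids() {
    // ... open the prepared finalized state and check its spending tx id indexes ...
}
```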
The acceptance test has been running for more than 30 minutes now. I'd like to know your experience with it.
It syncs to the network tip, but it takes ~10 minutes with an up-to-date cached state for me, mostly waiting for the "finished initial sync" log. It should add the indexes within 30 minutes (depending on system resources, but the format upgrade is ~10 minutes for me).
It keeps failing with "should have spending transaction hash"; I'm not sure why yet.
There was a disk format deserialization bug where some of the db read methods were returning None because there were fewer than size_of::<TransactionLocation>() value bytes (the type includes a height, which is serialized as 3 bytes instead of 4).
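For context, a minimal sketch of the kind of fix involved (names and byte layout here are illustrative, not the actual zebra-state code): the read path should accept the 5-byte on-disk encoding rather than requiring `size_of::<TransactionLocation>()` bytes, since the in-memory type is larger than its disk form.

```rust
/// Illustrative 5-byte disk encoding: 3-byte big-endian height + 2-byte tx index.
const HEIGHT_DISK_BYTES: usize = 3;
const TX_INDEX_DISK_BYTES: usize = 2;
const TX_LOCATION_DISK_BYTES: usize = HEIGHT_DISK_BYTES + TX_INDEX_DISK_BYTES;

/// Parses a (height, index) pair, checking against the disk encoding size
/// instead of size_of::<TransactionLocation>() (which would wrongly return None).
fn parse_transaction_location(bytes: &[u8]) -> Option<(u32, u16)> {
    let bytes: &[u8; TX_LOCATION_DISK_BYTES] = bytes.get(..TX_LOCATION_DISK_BYTES)?.try_into().ok()?;
    let height = u32::from_be_bytes([0, bytes[0], bytes[1], bytes[2]]);
    let index = u16::from_be_bytes([bytes[3], bytes[4]]);
    Some((height, index))
}
```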
@oxarbitrage do you mind reviewing this one as you reviewed it previously?
This pull request has been removed from the queue for the following reason: checks failed.
The merge conditions cannot be satisfied due to failing checks:
- ❌ Integration tests / lightwalletd tip update / Run lwd-update-sync test
- ☑️ Build CD Docker / Build images
- ☑️ Get disk name / Get Mainnet cached disk
- ☑️ Test CD custom Docker config file / Test custom-conf in Docker
- ☑️ Test CD default Docker config file / Test default-conf in Docker
- ☑️ Test CD testnet Docker config file / Test testnet-conf in Docker
You should look at the reason for the failure and decide if the pull request needs to be fixed or if you want to requeue it.
If you want to requeue this pull request, you need to post a comment with the text: @mergifyio requeue
@mergify requeue
requeue
✅ The queue state of this pull request has been cleaned. It can be re-embarked automatically
This pull request has been removed from the queue for the following reason: checks failed.
The merge conditions cannot be satisfied due to failing checks:
- ❌ Unit tests / Test integration with lightwalletd
- ☑️ Build CD Docker / Build images
- ☑️ Get disk name / Get Mainnet cached disk
- ☑️ Test CD custom Docker config file / Test custom-conf in Docker
- ☑️ Test CD default Docker config file / Test default-conf in Docker
- ☑️ Test CD testnet Docker config file / Test testnet-conf in Docker
You should look at the reason for the failure and decide if the pull request needs to be fixed or if you want to requeue it.
If you want to requeue this pull request, you need to post a comment with the text: @mergifyio requeue
@mergify requeue
requeue