perf: Stop expensive cryptographic validation when deserializing shielded transactions
Motivation
Zebra's transaction deserialization does a lot of CPU-heavy cryptography for shielded transactions.
Since we know that transactions in our database are valid, we can deserialize them without checking the validity of curve points.
API Reference
TODO: which methods are causing this CPU usage, and what can they be replaced with?
Detailed Analysis
See https://github.com/ZcashFoundation/zebra/issues/4583#issuecomment-1186662862
Designs
In zebra-chain:
- split out transaction deserialisation methods that optionally do validation
- do validation by default in ZcashDeserialize (with transactions and blocks from the network and RPCs)
In zebra-state:
- skip validation for transactions and blocks when reading from the state in FromBytes
Related Work
- #4788 (which didn't end up improving performance)
Is this still relevant outside of the zebra-scan crate?
Copying this here for reference:
Parsing blocks is definitely much more expensive than loading blocks. Benchmark that loads 10k blocks starting from 1723000 (arbitrary height I picked at the middle of the spam):
- just load from disk, but do not parse: 16.9s
- load from disk and parse: 455.8s (~27 times slower)
I ran sudo sh -c "/usr/bin/echo 3 > /proc/sys/vm/drop_caches" before benchmarking to clear the file cache
Some design ideas:
- Duplicate the entire Transaction tree (e.g. to
PartiallyDeserializedTransaction) where the relevant leaf types are replace with~Bytestypes, e.g.ValueCommitment->ValueCommitmentBytes- Probably the simplest, does not affeact the rest of the code, but introduces a lot of duplicated code
- Change the relevant leaf types to
~Bytestypes- Will break a lot of code which will need to be adjusted to convert from the
~Bytestype to the original type where required. Requires a lot of code changes, and carries the risk of doing those conversions multiple times, decreasing performance in some cases
- Will break a lot of code which will need to be adjusted to convert from the
- Use generic magic to allow the specification of which leaf types to use (the regular or the
~Bytesones), and default to the regular ones so existing code doesn't break (much)- Equivalent to the first option, but without the duplicated code
- Haven't tried, no idea if it would work
- Add lazy loading. Change each leaf type to a enum withe the
~Bytesand the regular type, which on deserialization just reads the bytes; then if anything reads the original type, convert it, cache it and return it. We already do something similar to cache note commitment tree roots. [thanks @arya2 for the idea]
Also note that the librustzcash Transaction type has the same issue. So if we ever want to migrate to it, we might want to consider how to change it too.