zebra perf: Stop expensive cryptographic validation when deserializing shielded transactions

Motivation

Zebra's transaction deserialization does a lot of CPU-heavy cryptography for shielded transactions.

Since we know that transactions in our database are valid, we can deserialize them without checking the validity of curve points.

API Reference

TODO: which methods are causing this CPU usage, and what can they be replaced with?

Detailed Analysis

See https://github.com/ZcashFoundation/zebra/issues/4583#issuecomment-1186662862

Designs

In zebra-chain:

split out transaction deserialisation methods that optionally do validation
do validation by default in ZcashDeserialize (with transactions and blocks from the network and RPCs)

In zebra-state:

skip validation for transactions and blocks when reading from the state in FromBytes

Related Work

#4788 (which didn't end up improving performance)

Nov 13 '23 19:11 teor2345

Is this still relevant outside of the zebra-scan crate?

Oct 18 '24 13:10 mpguerra

Copying this here for reference:

Parsing blocks is definitely much more expensive than loading blocks. Benchmark that loads 10k blocks starting from 1723000 (arbitrary height I picked at the middle of the spam):

just load from disk, but do not parse: 16.9s
load from disk and parse: 455.8s (~27 times slower)

I ran sudo sh -c "/usr/bin/echo 3 > /proc/sys/vm/drop_caches" before benchmarking to clear the file cache

Apr 16 '25 23:04 conradoplg

Some design ideas:

Duplicate the entire Transaction tree (e.g. to PartiallyDeserializedTransaction) where the relevant leaf types are replace with ~Bytes types, e.g. ValueCommitment -> ValueCommitmentBytes
- Probably the simplest, does not affeact the rest of the code, but introduces a lot of duplicated code
Change the relevant leaf types to ~Bytes types
- Will break a lot of code which will need to be adjusted to convert from the ~Bytes type to the original type where required. Requires a lot of code changes, and carries the risk of doing those conversions multiple times, decreasing performance in some cases
Use generic magic to allow the specification of which leaf types to use (the regular or the ~Bytes ones), and default to the regular ones so existing code doesn't break (much)
- Equivalent to the first option, but without the duplicated code
- Haven't tried, no idea if it would work
Add lazy loading. Change each leaf type to a enum withe the ~Bytes and the regular type, which on deserialization just reads the bytes; then if anything reads the original type, convert it, cache it and return it. We already do something similar to cache note commitment tree roots. [thanks @arya2 for the idea]

Also note that the librustzcash Transaction type has the same issue. So if we ever want to migrate to it, we might want to consider how to change it too.

May 19 '25 13:05 conradoplg