zebra icon indicating copy to clipboard operation
zebra copied to clipboard

Move CPU-heavy proof preparation into the batch cryptography thread

Open teor2345 opened this issue 1 year ago • 11 comments

Motivation

Zebra's sync can be slow or fail, because proof batch preparation involves some CPU-heavy cryptography.

So we want to move proof preparation into the existing batch rayon CPU thread pool scope.

API Reference

Blocking threads: https://docs.rs/tokio/latest/tokio/task/fn.spawn_blocking.html

CPU-heavy threads: https://docs.rs/rayon/latest/rayon/fn.scope_fifo.html

Detailed Analysis

See https://github.com/ZcashFoundation/zebra/issues/4583#issuecomment-1186662862

Designs

When queueing batch items, store them raw, then do preparation inside the rayon scope, for the following verifiers:

  • groth16
  • halo2

Existing code for rayon in tokio blocking threads: https://github.com/ZcashFoundation/zebra/blob/81727d7df0ee57e6d4f2172c287e52e414c4e6a0/zebra-consensus/src/primitives/ed25519.rs#L121-L134

Related Work

teor2345 avatar Jul 18 '22 21:07 teor2345

Note that we wanted to move proof decoding inside tx deserialization: https://github.com/ZcashFoundation/zebra/issues/3179

I don't think this changes anything since the proposed approach in this ticket is probably easier, but I thought it was worth it pointing out (i.e. another solution would be to do #3179, and move the entire deserialization to a thread, which I think we'll do anyway in another ticket).

If we follow the proposed approach in this ticket then we should probably close #3179

conradoplg avatar Jul 19 '22 13:07 conradoplg

Good point @conradoplg. I can take this ticket, but if we move the proof preparation to rayon, shouldn't we still keep in mind that we should use bellman::Proof in Groth16Proof instead of a byte array?

upbqdn avatar Jul 19 '22 17:07 upbqdn

Yep we need to decide which approach to use:

  • Leave the byte array, and decode inside the batch verifier, as suggested in the ticket
  • Replace the byte array, such that the decoding happens when deserializing, and then https://github.com/ZcashFoundation/zebra/issues/4787 will automatically solve this issue. The downside is that replacing the array requires a lot of changes (not that much, but it seems it will be a bit annoying) as described in #3179 so we may not want to do this right now.

(Thinking about it, if we do the first, we can keep #3179 open as long as we add a note to it to remember to revert the decoding inside the batch verifier)

conradoplg avatar Jul 19 '22 17:07 conradoplg

I think I'd lean towards the second option in terms of future maintenance since it seems a bit cleaner.

upbqdn avatar Jul 19 '22 18:07 upbqdn

I'd be interested in comparing the performance differences between the two different options.

I think at the moment we're looking for a quick fix, so we can re-test Zebra sync, and find any bugs or excessive CPU usage.

teor2345 avatar Jul 19 '22 23:07 teor2345

I'd be happy to change https://github.com/zcash-hackworks/zcash-test-vectors/ to generate structurally valid (but still unverifiable) Groth16 proofs, which seems to be one of the main blockers for #3179. I won't have time to do this before Zcon, though.

daira avatar Jul 20 '22 07:07 daira

Right now, we need to do the quick fix of moving proof decoding into the batch rayon threads.

After Zebra is syncing correctly, we can explore cleanup/simplification tasks like #3179.

teor2345 avatar Jul 20 '22 23:07 teor2345

We're also seeing stalls during checkpoint verification, which doesn't use the batch verifier code. So we might want to review the priority of this ticket.

We could check how much time proof preparation is actually taking? I'll try to open a PR with some easy-to-use functions today.

teor2345 avatar Jul 24 '22 21:07 teor2345

Conrado did some testing, and proof preparation takes 1-25ms.

So we don't need to do this ticket right now, but we should do it eventually, to handle large aggregated Orchard proofs.

@upbqdn how much progress have you made on this ticket so far? It's ok to finish it off and submit a PR if you want.

teor2345 avatar Jul 26 '22 19:07 teor2345

@upbqdn how much progress have you made on this ticket so far?

I have no implementation ready to be pushed yet.

upbqdn avatar Jul 26 '22 23:07 upbqdn

@upbqdn how much progress have you made on this ticket so far?

I have no implementation ready to be pushed yet.

Feel free to finish it off, or to leave it for now, and move on to something that's more urgent.

teor2345 avatar Jul 27 '22 01:07 teor2345

It looks like no-one is working on this right now, and we've finished the epic it was a part of.

teor2345 avatar Aug 23 '22 00:08 teor2345

Performance improvements are not a priority for Zebra right now.

teor2345 avatar Aug 26 '22 04:08 teor2345