Mini-Epic: Stop tokio tasks running for a long time and blocking other tasks

Open teor2345 opened this issue 1 year ago • 0 comments

Motivation

At the moment, Zebra can't sync all the way to the tip because some tokio tasks run for a long time and block other tasks. (There might also be some deadlocks, livelocks, or missed task exits.)

We should discover the specific bugs using tokio-console, and then open a ticket for each one.

Tasks

Issues that need investigation

  • [x] #4583
  • [ ] #4823 and then speed them up
  • [ ] Add a Future wrapper that times each poll, and logs long polls
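For the last item, a minimal sketch of what such a wrapper might look like, using only the standard library. The names `TimedPoll`, `long_poll_counter`, and `block_on` are hypothetical and are not part of Zebra or tokio. The `eprintln!` call stands in for whatever logging Zebra would actually use, and the threshold is an arbitrary parameter. The inner future is boxed to avoid pin projection, which costs one allocation per wrapped future.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::time::{Duration, Instant};

/// Hypothetical wrapper that times every `poll` call on the inner future.
pub struct TimedPoll<F> {
    // Boxing keeps `TimedPoll` itself `Unpin`, so no unsafe pin projection is needed.
    inner: Pin<Box<F>>,
    label: &'static str,
    threshold: Duration,
    // Counts polls that took at least `threshold` (handy for tests and metrics).
    long_polls: Arc<AtomicUsize>,
}

impl<F: Future> TimedPoll<F> {
    pub fn new(label: &'static str, threshold: Duration, inner: F) -> Self {
        Self {
            inner: Box::pin(inner),
            label,
            threshold,
            long_polls: Arc::new(AtomicUsize::new(0)),
        }
    }

    /// Shared handle to the long-poll counter.
    pub fn long_poll_counter(&self) -> Arc<AtomicUsize> {
        Arc::clone(&self.long_polls)
    }
}

impl<F: Future> Future for TimedPoll<F> {
    type Output = F::Output;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let start = Instant::now();
        let result = self.inner.as_mut().poll(cx);
        let elapsed = start.elapsed();

        // A single long poll means the task held its executor thread for that long.
        if elapsed >= self.threshold {
            self.long_polls.fetch_add(1, Ordering::Relaxed);
            // Assumption: stderr stands in for the real logging framework.
            eprintln!("long poll in {}: {:?}", self.label, elapsed);
        }
        result
    }
}

/// Waker that does nothing; enough for the busy-polling executor below.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Tiny busy-polling executor, for demonstration only (not a tokio replacement).
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(output) = fut.as_mut().poll(&mut cx) {
            return output;
        }
        std::thread::yield_now();
    }
}
```

Wrapping a future that does blocking work, for example `TimedPoll::new("verify", Duration::from_millis(5), fut)`, should log one line for each poll that exceeds the threshold. This measures individual polls, not total task lifetime, so a task that yields often is not flagged even if it runs for a long time overall.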

CPU usage analysis

  • [x] #4825 & #4826

  • [x] #4779

Deserialization (in zebra-network or zebra-state):

  • [x] sapling::output::OutputPrefixInTransactionV5::zcash_deserialize() (#4787)
    • sapling::commitment::ValueCommitment::try_from::<[u8; 32]>()
    • bellman::groth16::Proof::read()
    • jubjub::AffinePoint::from_bytes_inner()
    • bls12_381::scalar::square()
    • bls12_381::scalar::sqrt()
  • [x] finalized_state::ZebraDb::block() (#4788)
    • partially fixed by #4792

Verification (in zebra-consensus):

  • [ ] groth16::DescriptionWrapper::try_from() (#4789)
    • transaction::Verifier::verify_v5_transaction()

Note commitment tree updates (in zebra-state, either finalized or non-finalized):

  • [x] Note commitment tree append and root (#4790)
    • non_finalized_state::chain::UpdateWith
    • sapling::tree::merkle_crh_sapling()
    • sapling::commitment::pedersen_hashes::pedersen_hash()
    • sapling::commitment::pedersen_hashes::pedersen_hash_to_point()
    • incrementalmerkletree::bridgetree::Frontier::append()
  • [x] #4721 should also help with non-finalized blocks

Fixed Issues

  • [x] #4581

Fixed by #4750:

  • [x] #4738
  • [x] #4740
  • [x] #4729
  • [x] Replace batch verifier broadcast channels with watch channels (cleanup only)

Fixed by #4752 and #4726:

  • [x] #4650
  • [x] Lagged inventory advertisements (might also need #4750)
  • [x] Verification failure after block 1719629 on mainnet (block 1944652 on testnet is also very slow)

teor2345, Jul 04 '22 20:07