
Investigate block acceptance timing

Open · herr-seppia opened this issue 1 year ago · 7 comments

Summary

Block acceptance is taking ~200ms

{"timestamp":"2023-11-22T11:26:14.936995Z","level":"DEBUG","event":"committee_generated","members":" [0]=pk:npbMNQM1xKYsxzrB w:1,","target":"dusk_consensus::phase","spans":[{"name":"sel","pk":"212Eszgyh7GqMTws","round":7161,"step":0,"name":"main"}]}
{"timestamp":"2023-11-22T11:26:14.937213Z","level":"DEBUG","event":"run event_loop","target":"dusk_consensus::execution_ctx","spans":[{"name":"sel","pk":"212Eszgyh7GqMTws","round":7161,"step":0,"name":"main"}]}
{"timestamp":"2023-11-22T11:26:14.937428Z","level":"DEBUG","event":"iter completed","len":0,"round":7161,"iter":0,"target":"dusk_consensus::execution_ctx"}
{"timestamp":"2023-11-22T11:26:14.937558Z","level":"INFO","message":"Received finalize request","target":"rusk::vm"}
{"timestamp":"2023-11-22T11:26:14.952122Z","level":"INFO","event":"block accepted","height":7161,"iter":0,"hash":"378c44abb97a11e8...dc5600b71a975224","txs":0,"state_hash":"f5f251a24cb8c986...30f22ded1801d8f4","fsv_bitset":129482526,"ssv_bitset":259355773,"block_time":7,"dur_ms":"14","target":"node::chain::acceptor"}
{"timestamp":"2023-11-22T11:26:14.952335Z","level":"INFO","event":"spawn consensus","id":6641,"round":7162,"all":30,"eligible":30,"target":"node::chain::consensus"}
{"timestamp":"2023-11-22T11:26:14.952470Z","level":"WARN","message":"consensus err: Canceled","target":"node::chain"}
{"timestamp":"2023-11-22T11:26:14.952610Z","level":"INFO","event":"block received","src":"wire","blk_height":7162,"blk_hash":"f04878b7118190a7...56fa7c4b59ec9ca8","target":"node::chain"}
{"timestamp":"2023-11-22T11:26:15.151301Z","level":"DEBUG","event":"consensus canceled","round":7162,"target":"dusk_consensus::consensus"}
{"timestamp":"2023-11-22T11:26:15.151761Z","level":"DEBUG","event":"init","name":"sel","round":7162,"step":0,"timeout":5000,"target":"dusk_consensus::selection::step"}
{"timestamp":"2023-11-22T11:26:15.152009Z","level":"DEBUG","event":"execute_step","timeout":5000,"target":"dusk_consensus::phase","spans":[{"name":"sel","pk":"212Eszgyh7GqMTws","round":7162,"step":0,"name":"main"}]}
{"timestamp":"2023-11-22T11:26:15.152194Z","level":"DEBUG","event":"committee_generated","members":" [0]=pk:23YLXnqMDSS7Dts6 w:1,","target":"dusk_consensus::phase","spans":[{"name":"sel","pk":"212Eszgyh7GqMTws","round":7162,"step":0,"name":"main"}]}
{"timestamp":"2023-11-22T11:26:15.152329Z","level":"DEBUG","event":"run event_loop","

herr-seppia avatar Nov 22 '23 11:11 herr-seppia

Confirmed! Running it locally showed that the bottleneck is probably in verify_block_header, but abort_and_wait should also be taken into account. This needs to be CPU-profiled and eventually optimized.

Bootstrapping 30 provisioners on a single machine:

{"level":"INFO","event":"after verify_block_header","height":2,"dur_ms":"4567","target":"node::chain::acceptor"}
{"level":"INFO","event":"after abort_and_wait","height":2,"dur_ms":"780","target":"node::chain::acceptor"}
{"level":"INFO","event":"block accepted","height":2,"iter":0,"hash":"26b1c44d781aafe2...0d49ae87a09d8efa","txs":0,"state_hash":"7be435993b353a3d...f87703e9703c12cb","fsv_bitset":129941471,"ssv_bitset":64928975,"block_time":44,"generator":"22GNXdXKRebsro2F","dur_ms":"16","tar

goshawk-3 avatar Nov 24 '23 10:11 goshawk-3

[pprof001: attached CPU profile]

The attached CPU-profiling data shows that the bottleneck is in APK::aggregate.

goshawk-3 avatar Nov 24 '23 11:11 goshawk-3

Perfect, I'm going to benchmark it in bls-sign and see what can be done.

herr-seppia avatar Nov 24 '23 18:11 herr-seppia

I've added benches that aggregate 30 pks on bls_sign:

However, the timing seems to differ: it takes only ~26ms to aggregate, instead of the ~200ms shown by your profiler.

cargo bench bench_aggregate_pk

test benches::bench_aggregate_pk              ... bench:     882,331 ns/iter (+/- 30,354)
test benches::bench_aggregate_pk_30           ... bench:  26,540,112 ns/iter (+/- 682,946)
test benches::bench_aggregate_pk_30_each      ... bench:  26,623,183 ns/iter (+/- 654,939)

herr-seppia avatar Nov 24 '23 19:11 herr-seppia

Timing seems to be consistent after all. Indeed, we perform 2 aggregations per certificate, and the latest main verifies at least 2 certificates per block (current + prev).
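Putting the bench numbers and this accounting together: 2 aggregations per certificate times at least 2 certificates, at ~26.5ms per 30-pk aggregation. A rough cost model (all figures approximate, taken from the comments above):

```rust
// Rough per-block aggregation cost implied by the discussion: each
// certificate needs 2 aggregations, each ~26.5ms for 30 pks (bench above).
fn block_aggregation_cost_ms(certs: u32) -> f64 {
    let aggregations_per_cert = 2.0;
    let ms_per_aggregation = 26.5;
    certs as f64 * aggregations_per_cert * ms_per_aggregation
}

fn main() {
    // Minimum case: current + previous certificate.
    println!("2 certs: ~{:.0} ms", block_aggregation_cost_ms(2)); // ~106 ms
}
```

With only the minimum two certificates this already accounts for ~106ms of the observed acceptance time; extra failed-iteration certificates push it higher.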

Addressing the performance with a parallel feature in the BLS library.
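Since key aggregation is an associative fold over group elements, it can be split into per-thread partial aggregates and combined at the end. A stdlib-only sketch of that idea, using a toy `add` as a stand-in for BLS group addition (the real APK::aggregate lives in the BLS library and is not shown):

```rust
use std::thread;

// Toy stand-in for BLS group addition; the real operation is point addition.
fn add(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}

// Aggregate `keys` by folding chunk-local partials on scoped threads,
// then folding the partials. Correct because `add` is associative.
fn aggregate_parallel(keys: &[u64], threads: usize) -> u64 {
    let chunk = ((keys.len() + threads - 1) / threads).max(1);
    let partials: Vec<u64> = thread::scope(|s| {
        keys.chunks(chunk)
            .map(|c| s.spawn(move || c.iter().copied().fold(0, add)))
            .collect::<Vec<_>>()
            .into_iter()
            .map(|h| h.join().unwrap())
            .collect()
    });
    partials.into_iter().fold(0, add)
}

fn main() {
    let keys: Vec<u64> = (1..=30).collect(); // 30 provisioners, as above
    let agg = aggregate_parallel(&keys, 4);
    assert_eq!(agg, keys.iter().sum::<u64>());
    println!("parallel aggregate matches sequential fold: {agg}");
}
```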

herr-seppia avatar Nov 27 '23 12:11 herr-seppia

Certainly. Yes, each certificate contains two signatures, and a block can have anywhere from two to (RELAX_ITERATION_THRESHOLD+2) certificates. In simpler terms, we can anticipate a block having between 4 and 24 signatures, with RELAX_ITERATION_THRESHOLD set to 10.

goshawk-3 avatar Nov 28 '23 09:11 goshawk-3

> Certainly. Yes, each certificate contains two signatures, and a block can have anywhere from two to (RELAX_ITERATION_THRESHOLD+2) certificates. In simpler terms, we can anticipate a block having between 4 and 24 signatures, with RELAX_ITERATION_THRESHOLD set to 10.

Note that failed iterations will typically have around half as many signatures to aggregate (NilQuorum is 22 sigs) as the agreement quorum (43 sigs), so the impact of their certificates should be approximately 2 failed-iteration certificates = 1 quorum certificate.
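The "2 failed = 1 quorum" approximation follows directly from the signature counts quoted above, assuming aggregation cost scales roughly linearly with the number of signatures:

```rust
// Cost of `failed_certs` failed-iteration certificates expressed in units of
// one quorum certificate, using the signature counts from the comment above.
fn relative_cost(failed_certs: u32) -> f64 {
    const NIL_QUORUM_SIGS: f64 = 22.0; // sigs in a failed-iteration cert
    const QUORUM_SIGS: f64 = 43.0; // sigs in an agreement-quorum cert
    failed_certs as f64 * NIL_QUORUM_SIGS / QUORUM_SIGS
}

fn main() {
    // 2 * 22 / 43 ~ 1.02, i.e. roughly one quorum certificate.
    println!("2 failed certs ~ {:.2} quorum certs", relative_cost(2));
}
```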

fed-franz avatar Nov 28 '23 15:11 fed-franz