[TEMP ON HOLD] tweak: Use a visitor pattern for more efficient SBOR traversals
[!NOTE] This PR attempts to convert the "Pulling" event style SBOR traverser "exterior iterator" to a "Visiting" "interior iterator" style abstraction, to validate potential performance gains. Side node: Technically the "Pulling" style abstraction is actually a "Generator / Iterator / Coroutine without arguments" in new Rust terminology [article, 2023 blog - so we should probably just make it an Iterator of an Event - the main issue is it's a borrowing iterator, because its items refer to the traverser itself. It can actually also probably be rewritten as a generator when the
genblock is stabilized.This PR actually only goes half-way, to a "resumable visiting" pattern.
By attempting this, it was discovered we likely want to support both a pulling and a non-resumable visiting traverser. This would need to be done both at the untyped and typed layer.
That is because:
- The visitor model can be up to ~10x more efficient for most use cases (decoding RawValue, validations, IndexedScryptoValue, re-encode ManifestValue to ScryptoValue)
- BUT it would be very nasty to re-write the serialization as a visitor. Because of how serde serializers work (and lots of existing code!), it really favours an external pull-based system, which allows us to use the call frames/stack to keep track of how we're serializing things.
So quick wins:
- Reinstate
VecTraverser => UntypedPullingTraverseras its own thing - this lets us optimizeUntypedTraverser => UntypedVisitingTraverserfurther to not be resumable, and allowing more declarative logic.- Also consider trialling a stack-allocated bounded array (or some kind of small-vec library)
- Do the same with
TypedTraverserand use it inPayloadValidator
Summary
Relates to https://github.com/radixdlt/radixdlt-scrypto/pull/1860
Basically just a side-exploration on my off-day to trial a visitor pattern for SBOR traversals. And makes VecTraverser use a shim to the new visitor pattern to validate it in tests.
I imagine it makes VecTraverser too slow in its current form, so we probably don't want to merge this.
But I'm interested to see if the RawSborValue decoding is significantly faster (I'm hoping it can be, due to various optimizations which can be done for calculate_value_tree_body_byte_length now).
When I have some more down time, I might investigate if we can do the same treatment to TypedTraverser, payload validation, etc.
Perf review:
Winners:
| Test | Base | PR | % |
|---|---|---|---|
| costing::decode_encoded_i8_array_to_manifest_raw_value | 19.5±0.04ms | 6.2±0.00ms | -68.21% |
| costing::decode_encoded_tuple_array_to_manifest_raw_value | 63.4±0.15ms | 18.3±0.02ms | -71.14% |
| costing::decode_rpd_to_manifest_raw_value | 12.4±0.02µs | 5.4±0.03µs | -56.45% |
| costing::validate_sbor_payload | 32.6±0.13µs | 29.5±0.06µs | -9.51% |
And losers:
| Test | Base | PR | % |
|---|---|---|---|
| costing::decode_encoded_u8_array_to_manifest_raw_value | 25.6±0.13µs | 27.2±2.37µs | +6.25% |
| costing::execute_transaction_reading_big_vec_substates | 592.8±0.63ms | 642.2±0.47ms | +8.33% |
| schema::validate_payload | 364.2±1.39µs | 406.3±0.68µs | +11.56% |
This broadly fits with expectations.
- Decoding to raw value is ~70% faster, and can likely be improved up to 85% I think if we make it non-resumable.
- I think
costing::validate_sbor_payloadis an anomoly. Otherwise, validation is ~10% slower. This would be fixed when it's resumable.
Details
TBC
Testing
Existing tests pass... Let's take a look at the benchmark comparison for validation!
Docker tags docker.io/radixdlt/private-scrypto-builder:3ebc88b536
Benchmark for 3ebc88b
Click to view benchmark
| Test | Base | PR | % |
|---|---|---|---|
| costing::bench_prepare_wasm | 44.7±0.07ms | 44.6±0.10ms | -0.22% |
| costing::decode_encoded_i8_array_to_manifest_raw_value | 19.4±0.03ms | 5.6±0.00ms | -71.13% |
| costing::decode_encoded_i8_array_to_manifest_value | 42.3±0.04ms | 41.4±0.10ms | -2.13% |
| costing::decode_encoded_tuple_array_to_manifest_raw_value | 63.4±0.06ms | 18.1±0.36ms | -71.45% |
| costing::decode_encoded_tuple_array_to_manifest_value | 98.0±0.14ms | 105.6±0.28ms | +7.76% |
| costing::decode_encoded_u8_array_to_manifest_raw_value | 32.7±0.15µs | 31.9±0.17µs | -2.45% |
| costing::decode_encoded_u8_array_to_manifest_value | 42.3±0.06ms | 41.4±0.05ms | -2.13% |
| costing::decode_rpd_to_manifest_raw_value | 12.6±0.03µs | 5.3±0.02µs | -57.94% |
| costing::decode_rpd_to_manifest_value | 11.0±0.04µs | 10.9±0.02µs | -0.91% |
| costing::deserialize_wasm | 1272.0±5.36µs | 1256.6±2.85µs | -1.21% |
| costing::execute_transaction_creating_big_vec_substates | 697.7±10.47ms | 699.5±9.53ms | +0.26% |
| costing::execute_transaction_reading_big_vec_substates | 591.8±1.78ms | 590.7±2.13ms | -0.19% |
| costing::instantiate_flash_loan | 872.3±411.13µs | 985.4±605.11µs | +12.97% |
| costing::instantiate_radiswap | 1058.5±1216.17µs | 1095.1±1629.37µs | +3.46% |
| costing::spin_loop | 21.7±0.04ms | 20.8±0.05ms | -4.15% |
| costing::validate_sbor_payload | 32.6±0.06µs | 29.0±0.07µs | -11.04% |
| costing::validate_sbor_payload_bytes | 258.1±1.51ns | 251.3±5.90ns | -2.63% |
| costing::validate_secp256k1 | 76.8±0.16µs | 76.7±0.09µs | -0.13% |
| costing::validate_wasm | 33.6±0.05ms | 33.8±0.04ms | +0.60% |
| decimal::add/0 | 8.4±0.00ns | 8.4±0.00ns | 0.00% |
| decimal::add/rust-native | 9.9±0.01ns | 9.9±0.01ns | 0.00% |
| decimal::add/wasmi | 223.2±0.54ns | 226.1±0.33ns | +1.30% |
| decimal::add/wasmi-call-native | 2.2±0.00µs | 2.1±0.00µs | -4.55% |
| decimal::div/0 | 186.6±0.16ns | 186.4±0.13ns | -0.11% |
| decimal::from_string/0 | 157.7±0.43ns | 158.8±0.26ns | +0.70% |
| decimal::mul/0 | 149.7±0.10ns | 149.6±0.10ns | -0.07% |
| decimal::mul/rust-native | 148.0±0.18ns | 150.5±0.23ns | +1.69% |
| decimal::mul/wasmi | 12.1±0.02µs | 11.9±0.04µs | -1.65% |
| decimal::mul/wasmi-call-native | 2.3±0.00µs | 2.3±0.01µs | 0.00% |
| decimal::pow/0 | 719.1±1.14ns | 689.1±0.25ns | -4.17% |
| decimal::pow/rust-native | 668.9±0.49ns | 669.1±0.50ns | +0.03% |
| decimal::pow/wasmi | 58.2±0.23µs | 56.8±0.12µs | -2.41% |
| decimal::pow/wasmi-call-native | 2.6±0.01µs | 2.5±0.00µs | -3.85% |
| decimal::root/0 | 8.1±0.01µs | 8.2±0.00µs | +1.23% |
| decimal::sub/0 | 8.2±0.01ns | 8.2±0.00ns | 0.00% |
| decimal::to_string/0 | 444.4±0.38ns | 436.1±0.52ns | -1.87% |
| precise_decimal::add/0 | 9.1±0.01ns | 9.0±0.01ns | -1.10% |
| precise_decimal::add/rust-native | 10.8±0.06ns | 10.7±0.03ns | -0.93% |
| precise_decimal::add/wasmi | 276.3±1.19ns | 283.1±0.46ns | +2.46% |
| precise_decimal::add/wasmi-call-native | 2.9±0.00µs | 2.8±0.01µs | -3.45% |
| precise_decimal::div/0 | 317.3±0.40ns | 317.9±1.18ns | +0.19% |
| precise_decimal::from_string/0 | 210.0±0.91ns | 209.0±0.62ns | -0.48% |
| precise_decimal::mul/0 | 366.2±0.28ns | 370.5±1.54ns | +1.17% |
| precise_decimal::mul/rust-native | 321.6±0.71ns | 320.7±0.34ns | -0.28% |
| precise_decimal::mul/wasmi | 34.3±0.13µs | 34.0±0.29µs | -0.87% |
| precise_decimal::mul/wasmi-call-native | 3.3±0.01µs | 3.2±0.00µs | -3.03% |
| precise_decimal::pow/0 | 1967.3±4.05ns | 1931.1±8.12ns | -1.84% |
| precise_decimal::pow/rust-native | 1540.0±3.88ns | 1544.3±2.81ns | +0.28% |
| precise_decimal::pow/wasmi | 164.0±1.71µs | 167.3±2.94µs | +2.01% |
| precise_decimal::pow/wasmi-call-native | 5.8±0.01µs | 5.7±0.02µs | -1.72% |
| precise_decimal::root/0 | 58.4±0.06µs | 58.6±0.05µs | +0.34% |
| precise_decimal::sub/0 | 9.2±0.02ns | 9.2±0.04ns | 0.00% |
| precise_decimal::to_string/0 | 709.8±3.01ns | 693.9±1.90ns | -2.24% |
| schema::validate_payload | 365.8±0.88µs | 409.6±0.53µs | +11.97% |
| transaction::radiswap | 5.1±0.02ms | 5.1±0.02ms | 0.00% |
| transaction::transfer | 1881.5±4.32µs | 1730.2±3.60µs | -8.04% |
| transaction_processing::prepare | 2.6±0.00ms | 2.3±0.00ms | -11.54% |
| transaction_processing::prepare_and_decompile | 6.8±0.03ms | 6.5±0.02ms | -4.41% |
| transaction_processing::prepare_and_decompile_and_recompile | 30.6±1.60ms | 32.2±0.31ms | +5.23% |
| transaction_validation::validate_manifest | 42.9±0.05µs | 42.7±0.03µs | -0.47% |
| transaction_validation::verify_bls_2KB | 1062.5±11.54µs | 1004.3±8.79µs | -5.48% |
| transaction_validation::verify_bls_32B | 1005.8±11.56µs | 1003.7±8.39µs | -0.21% |
| transaction_validation::verify_ecdsa | 74.8±0.20µs | 74.5±0.04µs | -0.40% |
| transaction_validation::verify_ed25519 | 55.0±0.07µs | 54.8±0.14µs | -0.36% |