[aptos-vm] Adjust severity of some error codes.
Description
Adjust the error severity of Move functions invoked inside AptosVM. Currently, known Move functions such as the block prologue, txn prologue, and epilogue are treated by AptosVM as infallible. That should hold only for sequential execution, though: with speculative parallel execution, some of these functions may fail because a prior txn hasn't been scheduled for execution yet.
This happens quite frequently with block metadata transactions during state sync, where multiple block metadata transactions are executed at once, and it has created very noisy false-positive alarms for on-call.
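A minimal sketch of the idea, assuming a simplified model: `ExecutionMode`, `Severity`, and `severity_for_system_fn_failure` are hypothetical names chosen for illustration and are not aptos-vm types or functions. It only shows the intended rule, namely that a failure of an expected-infallible function is critical under sequential execution and low severity under speculative parallel execution.

```rust
/// Hypothetical execution modes (illustrative, not aptos-vm types).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ExecutionMode {
    Sequential,
    SpeculativeParallel,
}

/// Hypothetical log severities (illustrative only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Severity {
    Critical,
    Debug,
}

/// Pick the severity for a failure of a function that is expected to be
/// infallible (block prologue, txn prologue, epilogue).
fn severity_for_system_fn_failure(mode: ExecutionMode) -> Severity {
    match mode {
        // No speculation: every prior txn has already run, so a failure
        // points at a real invariant violation.
        ExecutionMode::Sequential => Severity::Critical,
        // The failure may come from reading state that a prior txn has
        // not produced yet; the txn can be re-executed later.
        ExecutionMode::SpeculativeParallel => Severity::Debug,
    }
}
```

With this rule, a speculative failure of a block metadata transaction during state sync would no longer raise an on-call alarm by itself.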
Test Plan
TBD
Running the tests again, I found that we might not need to change the reporting logic in the chunk executor at all. The expect_only_successful_execution is only invoked in two locations:
1. Block prologue transaction
2. The failure transaction epilogue
In case (2), if such an error occurred, a critical error would be emitted here. This error shouldn't occur even in the presence of speculative execution. It would be great if @movekevin could help me verify this claim.
In case (1), an error can occur when the speculative parallel executor has multiple block prologue transactions to execute. In this case, the error here is benign. If the resolved transaction still incurs an error once the parallel executor eventually sorts out the writeset, aptos_vm will return an error from VMExecutor::execute_block, which will be accumulated here. I'm not sure where consensus reports this error, but I do remember seeing it logged somewhere. Maybe @zekun000 knows where this logic is?
In case of a real block prologue error, it would be reported here: https://github.com/aptos-labs/aptos-core/blob/5f7c7d159c600faa4a9120b6a5293ad19ccc208e/consensus/src/experimental/buffer_manager.rs#L356
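The distinction between a benign speculative failure and a real block prologue error can be illustrated with a toy model. Everything in the sketch is an assumption made for illustration: `State`, `Outcome`, `block_prologue`, and this `execute_block` are hypothetical and do not mirror the actual AptosVM or parallel executor code. The speculative pass runs every prologue against the same pre-block snapshot, so later prologues fail; only an error from the in-order resolution pass is treated as real.

```rust
/// Toy chain state: the last block round that was recorded.
#[derive(Clone, Debug, PartialEq, Eq)]
struct State {
    last_round: u64,
}

/// Result of executing a block in this toy model.
#[derive(Debug, PartialEq, Eq)]
struct Outcome {
    /// Failures seen while speculating; these are benign.
    speculative_failures: usize,
    /// Final result after in-order resolution; an error here is real.
    result: Result<State, String>,
}

/// Hypothetical block prologue: requires the previous round to be recorded.
fn block_prologue(state: &State, round: u64) -> Result<State, String> {
    if state.last_round + 1 == round {
        Ok(State { last_round: round })
    } else {
        Err(format!("expected round {}, got {}", state.last_round + 1, round))
    }
}

/// Apply the prologues strictly in order, stopping at the first error.
fn resolve(initial: &State, rounds: &[u64]) -> Result<State, String> {
    let mut state = initial.clone();
    for &round in rounds {
        state = block_prologue(&state, round)?;
    }
    Ok(state)
}

/// Run a speculative pass, then the in-order resolution pass.
fn execute_block(initial: &State, rounds: &[u64]) -> Outcome {
    // Speculation: every prologue reads the same pre-block snapshot,
    // so any prologue that depends on an earlier one fails here.
    let speculative_failures = rounds
        .iter()
        .filter(|&&round| block_prologue(initial, round).is_err())
        .count();

    Outcome {
        speculative_failures,
        result: resolve(initial, rounds),
    }
}
```

In this model, executing rounds 1, 2, 3 from a state with `last_round: 0` yields two speculative failures but a successful final result, while rounds 1, 3 yield a final error that should be reported.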
:white_check_mark: Forge suite land_blocking success on 7e59d12bd6d0b694dc74fa17a4a3f883988b42cc
performance benchmark with full nodes : 6929 TPS, 5722 ms latency, 8400 ms p99 latency,no expired txns
Test Ok
- Grafana dashboard
- Humio Logs
- Test runner output
- Test run is land-blocking
:white_check_mark: Forge suite compat success on testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 7e59d12bd6d0b694dc74fa17a4a3f883988b42cc
Compatibility test results for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 7e59d12bd6d0b694dc74fa17a4a3f883988b42cc (PR)
1. Check liveness of validators at old version: testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b
compatibility::simple-validator-upgrade::liveness-check : 6784 TPS, 5629 ms latency, 8500 ms p99 latency,no expired txns
2. Upgrading first Validator to new version: 7e59d12bd6d0b694dc74fa17a4a3f883988b42cc
compatibility::simple-validator-upgrade::single-validator-upgrade : 4207 TPS, 9959 ms latency, 12700 ms p99 latency,no expired txns
3. Upgrading rest of first batch to new version: 7e59d12bd6d0b694dc74fa17a4a3f883988b42cc
compatibility::simple-validator-upgrade::half-validator-upgrade : 4862 TPS, 8605 ms latency, 11400 ms p99 latency,no expired txns
4. upgrading second batch to new version: 7e59d12bd6d0b694dc74fa17a4a3f883988b42cc
compatibility::simple-validator-upgrade::rest-validator-upgrade : 6524 TPS, 5935 ms latency, 9300 ms p99 latency,no expired txns
5. check swarm health
Compatibility test for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 7e59d12bd6d0b694dc74fa17a4a3f883988b42cc passed
Test Ok
- Grafana dashboard
- Humio Logs
- Test runner output
- Test run is land-blocking