[aptos-vm] Adjust severity of some error codes.

Open runtian-zhou opened this issue 3 years ago • 1 comments

Description

Adjust the error severity of Move functions invoked from AptosVM. Currently, well-known Move functions such as the block prologue, transaction prologue, and epilogue are treated as infallible by AptosVM. That assumption holds only under sequential execution: with speculative parallel execution, some of these functions may fail because a prior transaction has not yet been scheduled for execution.

This happens quite frequently with block metadata transactions during state sync, where multiple block metadata transactions are executed at once, and it has been creating very nasty false-positive alarms for on-call.

Test Plan

TBD



runtian-zhou avatar Nov 15 '22 23:11 runtian-zhou

Running the tests again, I found that we might not need to change the reporting logic in the chunk executor at all. expect_only_successful_execution is invoked in only two locations:

  1. Block prologue transaction
  2. The failure transaction epilogue

In case (2), if such an error occurred, a critical error will be emitted here. This error shouldn't occur even in the presence of speculative execution. It would be great if @movekevin could help me verify this claim.

In case (1), an error can potentially occur when multiple block prologue transactions are executed by the speculative parallel executor. In this case, the error here is benign. If the resolved transaction still incurs an error once the parallel executor eventually sorts out the writeset, aptos_vm will return an error from VMExecutor::execute_block, which will be accumulated here. I'm not sure where consensus reports this error, but I do remember seeing it logged somewhere. Maybe @zekun000 knows where this logic is?
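
To make the two cases concrete, here is a minimal sketch of the severity decision described above. It is illustrative only: `Severity`, `ExecutionMode`, `SystemTxn`, and `classify_failure` are hypothetical names invented for this example, not the actual aptos-vm types or the code in this change.

```rust
/// Hypothetical severity levels for a failed system transaction.
/// These are NOT the real aptos-vm types; they only model the discussion.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Severity {
    /// Should page on-call: the failure indicates a real problem.
    Critical,
    /// Expected under speculation: the executor will re-run the transaction.
    Benign,
}

/// Hypothetical execution modes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExecutionMode {
    Sequential,
    SpeculativeParallel,
}

/// The two call sites of expect_only_successful_execution named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SystemTxn {
    BlockPrologue,
    FailureEpilogue,
}

/// Assumed policy: only a block prologue failure observed during speculative
/// parallel execution is treated as benign; everything else stays critical.
fn classify_failure(txn: SystemTxn, mode: ExecutionMode) -> Severity {
    match (txn, mode) {
        (SystemTxn::BlockPrologue, ExecutionMode::SpeculativeParallel) => Severity::Benign,
        _ => Severity::Critical,
    }
}
```

Under this assumed policy, `classify_failure(SystemTxn::BlockPrologue, ExecutionMode::SpeculativeParallel)` is the only combination that yields `Severity::Benign`; a failure that survives to the final result would still surface through the error returned from VMExecutor::execute_block.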

runtian-zhou avatar Nov 16 '22 05:11 runtian-zhou

In case of a real block prologue error, it would be reported here: https://github.com/aptos-labs/aptos-core/blob/5f7c7d159c600faa4a9120b6a5293ad19ccc208e/consensus/src/experimental/buffer_manager.rs#L356

runtian-zhou avatar Nov 16 '22 20:11 runtian-zhou

Forge is running suite compat on testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 7e59d12bd6d0b694dc74fa17a4a3f883988b42cc

github-actions[bot] avatar Nov 28 '22 23:11 github-actions[bot]

Forge is running suite land_blocking on 7e59d12bd6d0b694dc74fa17a4a3f883988b42cc

github-actions[bot] avatar Nov 28 '22 23:11 github-actions[bot]

:white_check_mark: Forge suite land_blocking success on 7e59d12bd6d0b694dc74fa17a4a3f883988b42cc

performance benchmark with full nodes : 6929 TPS, 5722 ms latency, 8400 ms p99 latency,no expired txns
Test Ok

github-actions[bot] avatar Nov 29 '22 00:11 github-actions[bot]

:white_check_mark: Forge suite compat success on testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 7e59d12bd6d0b694dc74fa17a4a3f883988b42cc

Compatibility test results for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 7e59d12bd6d0b694dc74fa17a4a3f883988b42cc (PR)
1. Check liveness of validators at old version: testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b
compatibility::simple-validator-upgrade::liveness-check : 6784 TPS, 5629 ms latency, 8500 ms p99 latency,no expired txns
2. Upgrading first Validator to new version: 7e59d12bd6d0b694dc74fa17a4a3f883988b42cc
compatibility::simple-validator-upgrade::single-validator-upgrade : 4207 TPS, 9959 ms latency, 12700 ms p99 latency,no expired txns
3. Upgrading rest of first batch to new version: 7e59d12bd6d0b694dc74fa17a4a3f883988b42cc
compatibility::simple-validator-upgrade::half-validator-upgrade : 4862 TPS, 8605 ms latency, 11400 ms p99 latency,no expired txns
4. upgrading second batch to new version: 7e59d12bd6d0b694dc74fa17a4a3f883988b42cc
compatibility::simple-validator-upgrade::rest-validator-upgrade : 6524 TPS, 5935 ms latency, 9300 ms p99 latency,no expired txns
5. check swarm health
Compatibility test for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 7e59d12bd6d0b694dc74fa17a4a3f883988b42cc passed
Test Ok

github-actions[bot] avatar Nov 29 '22 00:11 github-actions[bot]