[forge] Add continuous progress check to SuccessCriteria

Open igor-aptos opened this issue 2 years ago • 1 comments

Description

Test Plan

Added the check to the land_blocking suite.

A run reported:

Passed progress check. Max round gap was 1 [limit 4] at version 2213899. Max no progress secs was 3.080434 [limit 10] at version 3689446.
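For readers unfamiliar with the check, the report above can be read as two maxima compared against two limits. The following is a minimal sketch of that idea, not the code in this PR: `BlockSample`, `ProgressReport`, `measure_progress`, and `check_progress` are hypothetical names, and the definitions of "round gap" (difference between consecutive observed rounds) and "no progress secs" (time between consecutive observed blocks) are assumptions inferred from the log message.

```rust
/// One observed block, reduced to the fields this sketch needs (hypothetical type).
#[derive(Debug, Clone, Copy)]
struct BlockSample {
    version: u64,
    round: u64,
    timestamp_secs: f64,
}

/// The two maxima the log line reports, with the version where each was seen.
#[derive(Debug)]
struct ProgressReport {
    max_round_gap: u64,
    max_round_gap_version: u64,
    max_no_progress_secs: f64,
    max_no_progress_version: u64,
}

/// Walk consecutive samples and record the largest round gap and the
/// longest time span between two observed blocks (assumed definitions).
fn measure_progress(samples: &[BlockSample]) -> ProgressReport {
    let mut report = ProgressReport {
        max_round_gap: 0,
        max_round_gap_version: 0,
        max_no_progress_secs: 0.0,
        max_no_progress_version: 0,
    };
    for pair in samples.windows(2) {
        let (prev, cur) = (pair[0], pair[1]);

        let round_gap = cur.round.saturating_sub(prev.round);
        if round_gap > report.max_round_gap {
            report.max_round_gap = round_gap;
            report.max_round_gap_version = cur.version;
        }

        let idle_secs = cur.timestamp_secs - prev.timestamp_secs;
        if idle_secs > report.max_no_progress_secs {
            report.max_no_progress_secs = idle_secs;
            report.max_no_progress_version = cur.version;
        }
    }
    report
}

/// Compare the measured maxima against the limits; the message format
/// mirrors the log line quoted above.
fn check_progress(
    report: &ProgressReport,
    round_gap_limit: u64,
    no_progress_limit_secs: f64,
) -> Result<String, String> {
    let summary = format!(
        "Max round gap was {} [limit {}] at version {}. \
         Max no progress secs was {} [limit {}] at version {}.",
        report.max_round_gap,
        round_gap_limit,
        report.max_round_gap_version,
        report.max_no_progress_secs,
        no_progress_limit_secs,
        report.max_no_progress_version,
    );
    if report.max_round_gap > round_gap_limit
        || report.max_no_progress_secs > no_progress_limit_secs
    {
        Err(format!("Failed chain progress check. {}", summary))
    } else {
        Ok(format!("Passed progress check. {}", summary))
    }
}
```

With the limits from the run above, a caller would invoke `check_progress(&report, 4, 10.0)` and treat an `Err` as a test failure.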



igor-aptos avatar Sep 13 '22 04:09 igor-aptos

Forge is running suite land_blocking on 14bd8a196d6eeadb5af78b508632ee7a5c485f0a

Forge is running suite compat on testnet ==> 14bd8a196d6eeadb5af78b508632ee7a5c485f0a

:white_check_mark: Forge suite land_blocking success on 14bd8a196d6eeadb5af78b508632ee7a5c485f0a

performance benchmark with full nodes : 7486 TPS, 3970 ms latency, 6900 ms p99 latency,no expired txns
Test Ok

:white_check_mark: Forge suite compat success on testnet ==> 14bd8a196d6eeadb5af78b508632ee7a5c485f0a

Compatibility test results for testnet ==> 14bd8a196d6eeadb5af78b508632ee7a5c485f0a (PR)
1. Check liveness of validators at old version: testnet
compatibility::simple-validator-upgrade::liveness-check : 6934 TPS, 3866 ms latency, 6300 ms p99 latency,no expired txns
2. Upgrading first Validator to new version: 14bd8a196d6eeadb5af78b508632ee7a5c485f0a
compatibility::simple-validator-upgrade::single-validator-upgrade : 6084 TPS, 4591 ms latency, 6000 ms p99 latency,no expired txns
3. Upgrading rest of first batch to new version: 14bd8a196d6eeadb5af78b508632ee7a5c485f0a
compatibility::simple-validator-upgrade::half-validator-upgrade : 5341 TPS, 4910 ms latency, 7400 ms p99 latency,no expired txns
4. upgrading second batch to new version: 14bd8a196d6eeadb5af78b508632ee7a5c485f0a
compatibility::simple-validator-upgrade::rest-validator-upgrade : 5850 TPS, 4344 ms latency, 7700 ms p99 latency,no expired txns
5. check swarm health
Compatibility test for testnet ==> 14bd8a196d6eeadb5af78b508632ee7a5c485f0a passed
Test Ok

Forge is running suite compat on testnet ==> a44377a2a42d7561accc89e60ffa9504d41582e9

Forge is running suite land_blocking on a44377a2a42d7561accc89e60ffa9504d41582e9

:white_check_mark: Forge suite compat success on testnet ==> a44377a2a42d7561accc89e60ffa9504d41582e9

Compatibility test results for testnet ==> a44377a2a42d7561accc89e60ffa9504d41582e9 (PR)
1. Check liveness of validators at old version: testnet
compatibility::simple-validator-upgrade::liveness-check : 7099 TPS, 3867 ms latency, 6000 ms p99 latency,no expired txns
2. Upgrading first Validator to new version: a44377a2a42d7561accc89e60ffa9504d41582e9
compatibility::simple-validator-upgrade::single-validator-upgrade : 5056 TPS, 4983 ms latency, 8700 ms p99 latency,no expired txns
3. Upgrading rest of first batch to new version: a44377a2a42d7561accc89e60ffa9504d41582e9
compatibility::simple-validator-upgrade::half-validator-upgrade : 4987 TPS, 5438 ms latency, 7700 ms p99 latency,no expired txns
4. upgrading second batch to new version: a44377a2a42d7561accc89e60ffa9504d41582e9
compatibility::simple-validator-upgrade::rest-validator-upgrade : 6598 TPS, 3917 ms latency, 6900 ms p99 latency,no expired txns
5. check swarm health
Compatibility test for testnet ==> a44377a2a42d7561accc89e60ffa9504d41582e9 passed
Test Ok

:x: Forge suite land_blocking failure on a44377a2a42d7561accc89e60ffa9504d41582e9

performance benchmark with full nodes : 7612 TPS, 3893 ms latency, 5400 ms p99 latency,no expired txns
Test Failed: Failed chain progress check. Max round gap was 1 [limit 4] at version 1262002. Max no progress secs was 1.87556 [limit 10] at version 60584.
Trailing Log Lines:
{"level":"INFO","source":{"package":"testcases","file":"testsuite/testcases/src/lib.rs:292"},"thread_name":"main","hostname":"forge-e2e-pr-4144-1663051540-a44377a2a42d7561accc89e60ffa9504d4","timestamp":"2022-09-13T07:00:21.844149Z","message":"Cooldown stats: submitted: 7458 txn/s, committed: 7458 txn/s, expired: 0 txn/s, failed submission: 0 tnx/s, latency: 4185 ms, (p50: 4200 ms, p90: 5400 ms, p99: 6000 ms), latency samples: 179000"}
{"level":"INFO","source":{"package":"forge","file":"testsuite/forge/src/interface/swarm.rs:310"},"thread_name":"main","hostname":"forge-e2e-pr-4144-1663051540-a44377a2a42d7561accc89e60ffa9504d4","timestamp":"2022-09-13T07:00:22.581730Z","message":"All nodes caught up successfully in 0s"}
Fetching 0 to 2724 sequence number, wanting epochs [2, 5), last version: 3656335 and epoch: 4
::error::Failed chain progress check. Max round gap was 1 [limit 4] at version 1262002. Max no progress secs was 1.87556 [limit 10] at version 60584.
test performance benchmark with full nodes ... FAILED
Error: Failed chain progress check. Max round gap was 1 [limit 4] at version 1262002. Max no progress secs was 1.87556 [limit 10] at version 60584.
Test Statistics: 
performance benchmark with full nodes : 7612 TPS, 3893 ms latency, 5400 ms p99 latency,no expired txns
Test Failed: Failed chain progress check. Max round gap was 1 [limit 4] at version 1262002. Max no progress secs was 1.87556 [limit 10] at version 60584.


Swarm logs can be found here: See fgi output for more information.
{"level":"INFO","source":{"package":"forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:260"},"thread_name":"main","hostname":"forge-e2e-pr-4144-1663051540-a44377a2a42d7561accc89e60ffa9504d4","timestamp":"2022-09-13T07:00:22.874831Z","message":"Deleting namespace forge-e2e-pr-4144: Some(NamespaceStatus { phase: Some(\"Terminating\") })"}
{"level":"INFO","source":{"package":"forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:368"},"thread_name":"main","hostname":"forge-e2e-pr-4144-1663051540-a44377a2a42d7561accc89e60ffa9504d4","timestamp":"2022-09-13T07:00:22.874861Z","message":"aptos-node resources for Forge removed in namespace: forge-e2e-pr-4144"}

failures:
    performance benchmark with full nodes

Failed to run tests:
Tests Failed
test result: FAILED. 0 passed; 1 failed; 0 filtered out

Error: Tests Failed
Debugging output:

Forge is running suite compat on testnet ==> 38cfecc71cab0bc349171d593d219249ba89d86b

Forge is running suite land_blocking on 38cfecc71cab0bc349171d593d219249ba89d86b

:white_check_mark: Forge suite land_blocking success on 38cfecc71cab0bc349171d593d219249ba89d86b

performance benchmark with full nodes : 7715 TPS, 3853 ms latency, 5700 ms p99 latency,no expired txns
Test Ok

:white_check_mark: Forge suite compat success on testnet ==> 38cfecc71cab0bc349171d593d219249ba89d86b

Compatibility test results for testnet ==> 38cfecc71cab0bc349171d593d219249ba89d86b (PR)
1. Check liveness of validators at old version: testnet
compatibility::simple-validator-upgrade::liveness-check : 6894 TPS, 3892 ms latency, 6500 ms p99 latency,no expired txns
2. Upgrading first Validator to new version: 38cfecc71cab0bc349171d593d219249ba89d86b
compatibility::simple-validator-upgrade::single-validator-upgrade : 5566 TPS, 4773 ms latency, 6800 ms p99 latency,no expired txns
3. Upgrading rest of first batch to new version: 38cfecc71cab0bc349171d593d219249ba89d86b
compatibility::simple-validator-upgrade::half-validator-upgrade : 5733 TPS, 4517 ms latency, 6700 ms p99 latency,no expired txns
4. upgrading second batch to new version: 38cfecc71cab0bc349171d593d219249ba89d86b
compatibility::simple-validator-upgrade::rest-validator-upgrade : 6757 TPS, 4145 ms latency, 7400 ms p99 latency,no expired txns
5. check swarm health
Compatibility test for testnet ==> 38cfecc71cab0bc349171d593d219249ba89d86b passed
Test Ok

github-actions[bot] avatar Sep 13 '22 07:09 github-actions[bot]

Changed catchup to wait for the next version, to make sure the network is no longer making progress.

Minor: moved the new code into a function.

igor-aptos avatar Sep 16 '22 20:09 igor-aptos

Forge is running suite land_blocking on de0e80ad3d89be225208d58997834c7839a27d3a

github-actions[bot] avatar Sep 19 '22 17:09 github-actions[bot]

:white_check_mark: Forge suite land_blocking success on de0e80ad3d89be225208d58997834c7839a27d3a

performance benchmark with full nodes : 7685 TPS, 5171 ms latency, 8700 ms p99 latency,no expired txns
Test Ok

github-actions[bot] avatar Sep 19 '22 18:09 github-actions[bot]

Forge is running suite land_blocking on 8a12341e15020ed8d5a120ea3676d47d74ef529a

github-actions[bot] avatar Sep 19 '22 21:09 github-actions[bot]

Forge is running suite compat on 843b204dce971d98449b82624f4f684c7a18b991 ==> 8a12341e15020ed8d5a120ea3676d47d74ef529a

github-actions[bot] avatar Sep 19 '22 21:09 github-actions[bot]

:white_check_mark: Forge suite land_blocking success on 8a12341e15020ed8d5a120ea3676d47d74ef529a

performance benchmark with full nodes : 7571 TPS, 5229 ms latency, 7800 ms p99 latency,(!) expired 260 out of 3271300 txns
Test Ok

github-actions[bot] avatar Sep 19 '22 21:09 github-actions[bot]

:white_check_mark: Forge suite compat success on 843b204dce971d98449b82624f4f684c7a18b991 ==> 8a12341e15020ed8d5a120ea3676d47d74ef529a

Compatibility test results for 843b204dce971d98449b82624f4f684c7a18b991 ==> 8a12341e15020ed8d5a120ea3676d47d74ef529a (PR)
1. Check liveness of validators at old version: 843b204dce971d98449b82624f4f684c7a18b991
compatibility::simple-validator-upgrade::liveness-check : 7509 TPS, 4911 ms latency, 7300 ms p99 latency,no expired txns
2. Upgrading first Validator to new version: 8a12341e15020ed8d5a120ea3676d47d74ef529a
compatibility::simple-validator-upgrade::single-validator-upgrade : 6218 TPS, 6033 ms latency, 7300 ms p99 latency,no expired txns
3. Upgrading rest of first batch to new version: 8a12341e15020ed8d5a120ea3676d47d74ef529a
compatibility::simple-validator-upgrade::half-validator-upgrade : 4251 TPS, 8439 ms latency, 16600 ms p99 latency,no expired txns
4. upgrading second batch to new version: 8a12341e15020ed8d5a120ea3676d47d74ef529a
compatibility::simple-validator-upgrade::rest-validator-upgrade : 7658 TPS, 4721 ms latency, 8000 ms p99 latency,no expired txns
5. check swarm health
Compatibility test for 843b204dce971d98449b82624f4f684c7a18b991 ==> 8a12341e15020ed8d5a120ea3676d47d74ef529a passed
Test Ok

github-actions[bot] avatar Sep 19 '22 21:09 github-actions[bot]