
[DRAFT] quorum-store-forge-onchain

Open bchocho opened this issue 2 years ago • 4 comments

  • QuorumStore all squash
  • unused import
  • caching data in data manager
  • Semantically fix rebase after #2049
  • BatchAggregator refactor, logic fixes, Serialized txn in fragments
  • [smoke-tests] fix flakiness of a couple tests
  • debug messages
  • shutdown and some fixes (#2325)
  • Unit tests (#2576)
  • QuorumStore all squash
  • unused import
  • caching data in data manager
  • Semantically fix rebase after #2049
  • BatchAggregator refactor, logic fixes, Serialized txn in fragments
  • debug messages
  • [smoke-tests] fix flakiness of a couple tests
  • shutdown and some fixes (#2325)
  • Unit tests (#2576)
  • Compiles
  • QuorumStore all squash
  • unused import
  • caching data in data manager
  • Semantically fix rebase after #2049
  • BatchAggregator refactor, logic fixes, Serialized txn in fragments
  • debug messages
  • [smoke-tests] fix flakiness of a couple tests
  • shutdown and some fixes (#2325)
  • Unit tests (#2576)
  • quorumStore all squash
  • unused import
  • caching data in data manager
  • Semantically fix rebase after #2049
  • BatchAggregator refactor, logic fixes, Serialized txn in fragments
  • [smoke-tests] fix flakiness of a couple tests
  • shutdown and some fixes (#2325)
  • Unit tests (#2576)
  • Compiles
  • fix compilation after rebase
  • fmt
  • bug fix
  • fix bug proof builder race
  • multi_sig
  • debug msg
  • fmt
  • bug order msg fix
  • compiler errors
  • remove huge debug message
  • compiler error after merge
  • Add consensus fault tolerance test
  • debugging
  • cargo.lock
  • re-ordering bug fix
  • timer parm
  • fmt plus compiler after merge
  • fix build error (#3358)
  • Data manager no remote
  • minor debug
  • debug
  • add basic qs counters
  • fix
  • add latency counters
  • fix
  • nit
  • all validators receiving txns
  • try fix txn expiration
  • counters
  • tune parameters
  • fix counters
  • fix build
  • fix counters
  • nit
  • try with different parameters
  • ensure batch_store shutdown before proof_builder
  • just try larger channel size
  • fix counter
  • more counters
  • change parameters
  • spawn_named some quorum stuff
  • Adding monitoring to stdout output every 5 sec
  • a monitoring bug fix
  • DO NOT MERGE THIS
  • re-enabling console port
  • fix build
  • multiple network worker
  • bug fix
  • fix warning and change parameters
  • exp with 1 worker
  • counter change
  • add shutdown to network_listener, add epoch number to profiling info
  • add epoch numbers for tokio-console info
  • print txn status, use 7k txn input rate
  • counter
  • counters
  • fix
  • batch counters fix
  • change back to close loop forge, comment something from QS DB
  • try different network workers for different QS message types
  • add end_batch counter
  • comment out expiration round check just to see performance
  • test 1 network worker
  • back to multiple network workers, bring back QC DB
  • try parameters
  • parameters
  • disable tokio-console
  • parameter
  • parameter
  • change run time
  • nit
  • parameter
  • order PoS for consensus
  • fmt
  • try parameter
  • run forge with different tps
  • tracking num bytes in PoS plus security bug fix
  • try standard forge
  • run limited bw test
  • bw test
  • 3 region test
  • 3 region test with 100 nodes
  • 3 region with 100 nodes and 100mbps BW
  • 0 memory quota test
  • 3 region 100nodes test with 0 memory quota
  • Defer verifying batch response to batch requester
  • 3 region 200 nodes
  • 300 nodes
  • debug print for batch out of order
  • debug prints
  • try with larger timeouts
  • try parameters
  • no epoch change
  • debug
  • update logicaltime for quorum store when state sync
  • fix
  • temp fix just to see performance
  • quick fix
  • update logical time differently
  • test fix
  • revert change for testing
  • debug
  • 3 region 100 nodes
  • debug info
  • normal forge
  • epoch change may contain nonincreasing rounds
  • fix
  • parameter
  • revert
  • fix batch_reader race between expiry extension and clear, add test
  • counter
  • add back notify commit during state sync
  • nit
  • revert for debug
  • fix
  • 100 nodes
  • low load test
  • nit
  • 100nodes low tps test
  • run high load test
  • add 2 sec to expiration limit when pull from mempool, for high load test
  • add 10 sec to mempool expiration
  • add back pressure to quorum store
  • fix
  • parameter
  • parameter
  • test normal load
  • revert mempool
  • close loop
  • revert backpressure for testing
  • parameter with backpressure
  • end batch when QS is back pressured
  • parameter
  • 3 region 100 nodes with backpressure
  • QS backpressure only when consensus also backpressure
  • test with larger block size
  • parameter and counters
  • block size
  • 200 nodes test
  • counter
  • larger block size
  • close loop
  • open loop 8k
  • nit
  • close loop mempool 60k
  • change backpressure
  • change PoS queue
  • close loop
  • larger parameters
  • test 20 nodes
  • separate config file
  • 100nodes
  • hacky way to check if streaming is needed
  • 100 nodes test
  • parameters
  • smaller block to remove expiration
  • add batch counts, fix unit test, clean up
  • merge DataManagers
  • fix
  • fix potential panic
  • parameter and revert pos queue
  • test perf without fragment
  • use fragments, 8k open loop
  • fix build
  • proper fix
  • fix forge
  • fix a warning
  • fix test
  • small fix for non-qs
  • Fix Cargo.lock from main
  • remove unnecessary changes in lib.rs
  • Fix unit test build
  • quick fix for twins tests with quorum store off - need a better strategy for when quorum store is enabled
  • Make quorum store shutdown faster on epoch change
  • cleanup temp test files
  • revert changes in testsuite (forge-related changes)
  • [DONOTLAND] hard-code quorum store
  • Fix CleanRequest?
  • submitting txn to all nodes
  • 3 region test
  • fix
  • Revert mempool expiration change (that fails tests)
  • Turn QS off to get baseline
  • revert to QS
  • fmt
  • Create Batch Generator
  • Update changes to compile
  • batch_coordinator and quorum_store_coordinator
  • Add quorum_store_builder
  • cleanup some build warnings
  • checkpoint - mostly done?
  • passes most smoke tests
  • fix unit test compile
  • remove unnecessary changes as prep for merge
  • some cleanup
  • Change tokio::spawn to spawn_named
  • Some cleanup, no more warnings in cargo build
  • resolve panics during shutdown
  • add simple counter for QS backpressure, and change QS backpressure to be dependent on local unexpired remaining proofs
  • more counters and simple counter fix
  • push backpressure from proof_manager to batch_generator
  • adding latest consensus back pressure when creating QS batch
  • update QS backpressure upon every commit
  • backpressure optimization and add counters
  • small fix
  • counter
  • counters and parameters
  • counter
  • Create multiple batch_coordinators and a single network listener
  • some cleanup for batch coordinator creation
  • more cleanup
  • fix some lints
  • fix some lint issues
  • cargo +nightly fmt
  • fix cargo sort
  • some more cleanup
  • wrap in a box
  • Reset quorum store enabled to false in ConsensusConfigV1 for landing on main.
  • More cleanup
  • tiny cleanup
  • More small cleanup
  • cargo fmt
  • Revert forge changes and more cleanup and formatting
  • Move quorum_store shutdown to the end, to avoid any new incoming requests
  • Turn quorum store back on for sanity check of cleanup
  • proof_manager tests
  • batch_generator tests
  • remove unneeded config
  • some more cleanup
  • remove back pressure updates on proposal
  • logic fix for backpressure proof cleaning
  • more cleanup: remove all println
  • linter
  • unit test: check proof_manager_rx
  • add a fail_point test
  • Remove unused Result in init_proof
  • Add smoke test that flips quorum store to enabled
  • [DRAFT]

Description

Test Plan

bchocho, Jan 25 '23 14:01

Forge is running suite compat on testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 381ad554dda6869d7f17dc179ef5ed12e81bb2cd

github-actions[bot], Jan 25 '23 14:01

Forge is running suite land_blocking on 381ad554dda6869d7f17dc179ef5ed12e81bb2cd

github-actions[bot], Jan 25 '23 14:01

:x: Forge suite land_blocking failure on 381ad554dda6869d7f17dc179ef5ed12e81bb2cd

Forge test runner terminated:
Trailing Log Lines:
{"level":"INFO","source":{"package":"aptos_transaction_emitter_lib","file":"crates/transaction-emitter-lib/src/emitter/account_minter.rs:227"},"thread_name":"main","hostname":"forge-e2e-pr-6321-1674658698-381ad554dda6869d7f17dc179ef5ed12e8","timestamp":"2023-01-25T15:08:22.225928Z","message":"Successfully completed creating accounts, had to retry 0 transactions"}
{"level":"INFO","source":{"package":"aptos_transaction_emitter_lib","file":"crates/transaction-emitter-lib/src/emitter/mod.rs:485"},"thread_name":"main","hostname":"forge-e2e-pr-6321-1674658698-381ad554dda6869d7f17dc179ef5ed12e8","timestamp":"2023-01-25T15:08:22.226134Z","message":"Checking account sequence and counting latency for 96 out of 480 total_workers"}
{"level":"INFO","source":{"package":"aptos_transaction_emitter_lib","file":"crates/transaction-emitter-lib/src/emitter/mod.rs:516"},"thread_name":"main","hostname":"forge-e2e-pr-6321-1674658698-381ad554dda6869d7f17dc179ef5ed12e8","timestamp":"2023-01-25T15:08:22.237814Z","message":"Tx emitter workers started"}
{"level":"INFO","source":{"package":"aptos_testcases","file":"testsuite/testcases/src/lib.rs:222"},"thread_name":"main","hostname":"forge-e2e-pr-6321-1674658698-381ad554dda6869d7f17dc179ef5ed12e8","timestamp":"2023-01-25T15:08:22.237838Z","message":"Starting emitting txns for 480s"}
{"level":"INFO","source":{"package":"aptos_testcases","file":"testsuite/testcases/src/lib.rs:225"},"thread_name":"main","hostname":"forge-e2e-pr-6321-1674658698-381ad554dda6869d7f17dc179ef5ed12e8","timestamp":"2023-01-25T15:08:55.837911Z","message":"33s warmup finished"}
Compiling, may take a little while to download git dependencies...
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: UnexpectedError("Unable to resolve packages for package 'RunScript'")', testsuite/testcases/src/enable_quorum_store_test.rs:97:18
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:280"},"thread_name":"main","hostname":"forge-e2e-pr-6321-1674658698-381ad554dda6869d7f17dc179ef5ed12e8","timestamp":"2023-01-25T15:08:55.884919Z","message":"Deleting namespace forge-e2e-pr-6321: Some(NamespaceStatus { phase: Some(\"Terminating\") })"}
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:388"},"thread_name":"main","hostname":"forge-e2e-pr-6321-1674658698-381ad554dda6869d7f17dc179ef5ed12e8","timestamp":"2023-01-25T15:08:55.884955Z","message":"aptos-node resources for Forge removed in namespace: forge-e2e-pr-6321"}
Debugging output:
NAME                                    READY   STATUS      RESTARTS   AGE
aptos-node-0-validator-0                1/1     Running     0          5m10s
aptos-node-1-validator-0                1/1     Running     0          5m10s
aptos-node-2-validator-0                1/1     Running     0          5m10s
aptos-node-3-validator-0                1/1     Running     0          5m10s
genesis-aptos-genesis-eforge200-ftl2b   0/1     Completed   0          8m3s

github-actions[bot], Jan 25 '23 15:01
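
The trailing log lines in the land_blocking failure above mix structured JSON records with plain-text lines, and the actual cause (the `Result::unwrap()` panic in enable_quorum_store_test.rs) is one of the plain-text lines. A minimal Python sketch of how such output could be triaged follows. The helper names `split_log_lines` and `find_panics` are hypothetical and not part of Forge, and the sample JSON record is abbreviated from the log above (the hostname field is dropped).

```python
import json


def split_log_lines(lines):
    """Separate JSON log records from plain-text lines (panics, notes)."""
    records, plain = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            plain.append(line)
            continue
        # Only JSON objects count as structured records.
        if isinstance(parsed, dict):
            records.append(parsed)
        else:
            plain.append(line)
    return records, plain


def find_panics(plain_lines):
    """Return the plain-text lines that report a Rust panic."""
    return [line for line in plain_lines if "panicked at" in line]


# Sample input, abbreviated from the trailing log lines quoted above.
sample = [
    '{"level":"INFO","source":{"package":"aptos_testcases",'
    '"file":"testsuite/testcases/src/lib.rs:225"},"thread_name":"main",'
    '"timestamp":"2023-01-25T15:08:55.837911Z",'
    '"message":"33s warmup finished"}',
    "Compiling, may take a little while to download git dependencies...",
    "thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: "
    "UnexpectedError(\"Unable to resolve packages for package 'RunScript'\")', "
    "testsuite/testcases/src/enable_quorum_store_test.rs:97:18",
]

records, plain = split_log_lines(sample)
panics = find_panics(plain)
for line in panics:
    print(line)
```

Filtering this way points straight at the `unwrap()` call site reported in the panic, rather than at the namespace cleanup messages that follow it in the log.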

:x: Forge suite compat failure on testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 381ad554dda6869d7f17dc179ef5ed12e81bb2cd

Compatibility test results for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 381ad554dda6869d7f17dc179ef5ed12e81bb2cd (PR)
1. Check liveness of validators at old version: testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b
compatibility::simple-validator-upgrade::liveness-check : 7515 TPS, 5148 ms latency, 7000 ms p99 latency,no expired txns
2. Upgrading first Validator to new version: 381ad554dda6869d7f17dc179ef5ed12e81bb2cd
Test Failed: Failed to wait for transactions: Unknown(Transaction expired. It is guaranteed it will not be committed on chain.)
Trailing Log Lines:
::error::Failed to wait for transactions: Unknown(Transaction expired. It is guaranteed it will not be committed on chain.)
test compatibility::simple-validator-upgrade ... FAILED
Error: Failed to wait for transactions: Unknown(Transaction expired. It is guaranteed it will not be committed on chain.)
Test Statistics: 
Compatibility test results for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 381ad554dda6869d7f17dc179ef5ed12e81bb2cd (PR)
1. Check liveness of validators at old version: testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b
compatibility::simple-validator-upgrade::liveness-check : 7515 TPS, 5148 ms latency, 7000 ms p99 latency,no expired txns
2. Upgrading first Validator to new version: 381ad554dda6869d7f17dc179ef5ed12e81bb2cd
Test Failed: Failed to wait for transactions: Unknown(Transaction expired. It is guaranteed it will not be committed on chain.)


Swarm logs can be found here: See fgi output for more information.
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:280"},"thread_name":"main","hostname":"forge-compat-pr-6321-1674658696-testnet-2d8b1b57553d869190f61df","timestamp":"2023-01-25T15:09:55.250517Z","message":"Deleting namespace forge-compat-pr-6321: Some(NamespaceStatus { phase: Some(\"Terminating\") })"}
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:388"},"thread_name":"main","hostname":"forge-compat-pr-6321-1674658696-testnet-2d8b1b57553d869190f61df","timestamp":"2023-01-25T15:09:55.250548Z","message":"aptos-node resources for Forge removed in namespace: forge-compat-pr-6321"}

failures:
    compatibility::simple-validator-upgrade

test result: FAILED. 0 passed; 1 failed; 0 filtered out

Failed to run tests:
Tests Failed
Error: Tests Failed
Debugging output:
NAME                                    READY   STATUS             RESTARTS      AGE
aptos-node-0-validator-0                1/1     Running            0             9m3s
aptos-node-1-validator-0                0/1     CrashLoopBackOff   4 (34s ago)   4m35s
aptos-node-2-validator-0                1/1     Running            0             9m3s
aptos-node-3-validator-0                1/1     Running            0             9m3s
aptos-node-4-validator-0                1/1     Running            0             9m3s
genesis-aptos-genesis-eforge217-wf2d8   0/1     Completed          0             9m15s

github-actions[bot], Jan 25 '23 15:01
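
In the compat failure above, the "Debugging output" pod table shows that aptos-node-1-validator-0 (the pod whose age is about 4.5 minutes less than the others) is in CrashLoopBackOff, which lines up with the transaction expiry reported right after the first validator was upgraded. A rough Python sketch for flagging such pods from a pasted table follows. The function names are hypothetical, and the parser assumes columns are separated by two or more spaces, as in the output above, so that a RESTARTS value such as "4 (34s ago)" stays in one cell.

```python
import re


def parse_pod_table(text):
    """Parse a whitespace-aligned pod table into a list of dicts.

    Assumes columns are separated by two or more spaces.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = re.split(r"\s{2,}", lines[0].strip())
    pods = []
    for line in lines[1:]:
        cells = re.split(r"\s{2,}", line.strip())
        pods.append(dict(zip(header, cells)))
    return pods


def unhealthy(pods, ok_statuses=("Running", "Completed")):
    """Return pods whose STATUS is not one of the accepted values."""
    return [pod for pod in pods if pod["STATUS"] not in ok_statuses]


# Table copied from the debugging output of the compat failure above.
table = """\
NAME                                    READY   STATUS             RESTARTS      AGE
aptos-node-0-validator-0                1/1     Running            0             9m3s
aptos-node-1-validator-0                0/1     CrashLoopBackOff   4 (34s ago)   4m35s
aptos-node-2-validator-0                1/1     Running            0             9m3s
aptos-node-3-validator-0                1/1     Running            0             9m3s
aptos-node-4-validator-0                1/1     Running            0             9m3s
genesis-aptos-genesis-eforge217-wf2d8   0/1     Completed          0             9m15s
"""

pods = parse_pod_table(table)
bad = unhealthy(pods)
for pod in bad:
    print(pod["NAME"], pod["STATUS"], pod["RESTARTS"])
```

Treating "Completed" as healthy is a choice made here so that the finished genesis job is not flagged; a stricter check could also compare the READY counts.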

Forge is running suite compat on testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 121916de56a2d4c0dbdaef462ac1a2d869a9b4df

github-actions[bot], Jan 27 '23 01:01

Forge is running suite land_blocking on 121916de56a2d4c0dbdaef462ac1a2d869a9b4df

github-actions[bot], Jan 27 '23 01:01

:x: Forge suite land_blocking failure on 121916de56a2d4c0dbdaef462ac1a2d869a9b4df

Forge test runner terminated:
Trailing Log Lines:
{"level":"INFO","source":{"package":"aptos_transaction_emitter_lib","file":"crates/transaction-emitter-lib/src/emitter/mod.rs:485"},"thread_name":"main","hostname":"forge-e2e-pr-6321-1674781599-121916de56a2d4c0dbdaef462ac1a2d869","timestamp":"2023-01-27T01:12:34.872479Z","message":"Checking account sequence and counting latency for 96 out of 480 total_workers"}
{"level":"INFO","source":{"package":"aptos_transaction_emitter_lib","file":"crates/transaction-emitter-lib/src/emitter/mod.rs:516"},"thread_name":"main","hostname":"forge-e2e-pr-6321-1674781599-121916de56a2d4c0dbdaef462ac1a2d869","timestamp":"2023-01-27T01:12:34.884084Z","message":"Tx emitter workers started"}
{"level":"INFO","source":{"package":"aptos_testcases","file":"testsuite/testcases/src/lib.rs:222"},"thread_name":"main","hostname":"forge-e2e-pr-6321-1674781599-121916de56a2d4c0dbdaef462ac1a2d869","timestamp":"2023-01-27T01:12:34.884110Z","message":"Starting emitting txns for 480s"}
{"level":"INFO","source":{"package":"aptos_testcases","file":"testsuite/testcases/src/lib.rs:225"},"thread_name":"main","hostname":"forge-e2e-pr-6321-1674781599-121916de56a2d4c0dbdaef462ac1a2d869","timestamp":"2023-01-27T01:13:08.484185Z","message":"33s warmup finished"}
{"level":"WARN","source":{"package":"aptos","file":"crates/aptos/src/test/mod.rs:922"},"thread_name":"main","hostname":"forge-e2e-pr-6321-1674781599-121916de56a2d4c0dbdaef462ac1a2d869","timestamp":"2023-01-27T01:13:08.495713Z","message":"BCHO aptos_framework_dir: \"/aptos/crates/aptos/../../aptos-move/framework/aptos-framework\""}
Compiling, may take a little while to download git dependencies...
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: UnexpectedError("Unable to resolve packages for package 'RunScript'")', testsuite/testcases/src/enable_quorum_store_test.rs:96:18
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:280"},"thread_name":"main","hostname":"forge-e2e-pr-6321-1674781599-121916de56a2d4c0dbdaef462ac1a2d869","timestamp":"2023-01-27T01:13:08.522738Z","message":"Deleting namespace forge-e2e-pr-6321: Some(NamespaceStatus { phase: Some(\"Terminating\") })"}
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:388"},"thread_name":"main","hostname":"forge-e2e-pr-6321-1674781599-121916de56a2d4c0dbdaef462ac1a2d869","timestamp":"2023-01-27T01:13:08.522765Z","message":"aptos-node resources for Forge removed in namespace: forge-e2e-pr-6321"}
Debugging output:

github-actions[bot], Jan 27 '23 01:01

:x: Forge suite compat failure on testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 121916de56a2d4c0dbdaef462ac1a2d869a9b4df

Compatibility test results for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 121916de56a2d4c0dbdaef462ac1a2d869a9b4df (PR)
1. Check liveness of validators at old version: testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b
compatibility::simple-validator-upgrade::liveness-check : 7160 TPS, 5509 ms latency, 8200 ms p99 latency,no expired txns
2. Upgrading first Validator to new version: 121916de56a2d4c0dbdaef462ac1a2d869a9b4df
Test Failed: Failed to wait for transactions: Unknown(Transaction expired. It is guaranteed it will not be committed on chain.)
Trailing Log Lines:
::error::Failed to wait for transactions: Unknown(Transaction expired. It is guaranteed it will not be committed on chain.)
test compatibility::simple-validator-upgrade ... FAILED
Error: Failed to wait for transactions: Unknown(Transaction expired. It is guaranteed it will not be committed on chain.)
Test Statistics: 
Compatibility test results for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 121916de56a2d4c0dbdaef462ac1a2d869a9b4df (PR)
1. Check liveness of validators at old version: testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b
compatibility::simple-validator-upgrade::liveness-check : 7160 TPS, 5509 ms latency, 8200 ms p99 latency,no expired txns
2. Upgrading first Validator to new version: 121916de56a2d4c0dbdaef462ac1a2d869a9b4df
Test Failed: Failed to wait for transactions: Unknown(Transaction expired. It is guaranteed it will not be committed on chain.)


Swarm logs can be found here: See fgi output for more information.
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:280"},"thread_name":"main","hostname":"forge-compat-pr-6321-1674781590-testnet-2d8b1b57553d869190f61df","timestamp":"2023-01-27T01:16:50.261093Z","message":"Deleting namespace forge-compat-pr-6321: Some(NamespaceStatus { phase: Some(\"Terminating\") })"}
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:388"},"thread_name":"main","hostname":"forge-compat-pr-6321-1674781590-testnet-2d8b1b57553d869190f61df","timestamp":"2023-01-27T01:16:50.261125Z","message":"aptos-node resources for Forge removed in namespace: forge-compat-pr-6321"}

failures:
    compatibility::simple-validator-upgrade

test result: FAILED. 0 passed; 1 failed; 0 filtered out

Failed to run tests:
Tests Failed
Error: Tests Failed
Debugging output:
NAME                                   READY   STATUS             RESTARTS      AGE
aptos-node-0-validator-0               1/1     Running            0             9m16s
aptos-node-1-validator-0               0/1     CrashLoopBackOff   3 (36s ago)   4m9s
aptos-node-2-validator-0               1/1     Running            0             9m16s
aptos-node-3-validator-0               1/1     Running            0             9m16s
aptos-node-4-validator-0               1/1     Running            0             9m16s
genesis-aptos-genesis-eforge89-l2zr6   0/1     Completed          0             9m27s

github-actions[bot], Jan 27 '23 01:01
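
Both compat runs above report a liveness-check line for the old testnet version before the upgrade step fails. As a sketch only, the Python below parses that line and computes the difference between the two runs. The regular expression is written against the exact wording of the two lines quoted above and is an assumption about the format, not a documented Forge output schema; `parse_stats` and `delta` are hypothetical helper names.

```python
import re

# Pattern matched against the liveness-check lines quoted in the bot comments.
STATS_RE = re.compile(
    r"(?P<check>\S+) : (?P<tps>\d+) TPS, (?P<latency_ms>\d+) ms latency, "
    r"(?P<p99_ms>\d+) ms p99 latency,\s*(?P<expiry>.+)"
)


def parse_stats(line):
    """Extract TPS and latency figures from one liveness-check line."""
    match = STATS_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized stats line: {line!r}")
    fields = match.groupdict()
    return {
        "check": fields["check"],
        "tps": int(fields["tps"]),
        "latency_ms": int(fields["latency_ms"]),
        "p99_ms": int(fields["p99_ms"]),
        "expiry": fields["expiry"].strip(),
    }


def delta(before, after):
    """Numeric change from one run to the next."""
    return {key: after[key] - before[key] for key in ("tps", "latency_ms", "p99_ms")}


# Lines copied from the Jan 25 and Jan 27 compat results above.
run_jan25 = parse_stats(
    "compatibility::simple-validator-upgrade::liveness-check : "
    "7515 TPS, 5148 ms latency, 7000 ms p99 latency,no expired txns"
)
run_jan27 = parse_stats(
    "compatibility::simple-validator-upgrade::liveness-check : "
    "7160 TPS, 5509 ms latency, 8200 ms p99 latency,no expired txns"
)

change = delta(run_jan25, run_jan27)
print(change)
```

In both runs these figures describe the old testnet build, so the difference between them reflects run-to-run variation of the baseline rather than the effect of the PR.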