[move] Fix struct depth computation concurrency bug
Description
Recently, VM became shared per-thread (#13091) for validation. As a result, an invariant violation can be triggered on struct depth checks in loader. This PR fixes that by removing the race condition when writing the newly cached depth formula.
Note: removing the paranoid check for recursive types is safe because it is already checked by the bytecode verifier.
Type of Change
- [ ] New feature
- [x] Bug fix
- [ ] Breaking change
- [ ] Performance improvement
- [ ] Refactoring
- [ ] Dependency update
- [ ] Documentation update
- [ ] Tests
Which Components or Systems Does This Change Impact?
- [ ] Validator Node
- [ ] Full Node (API, Indexer, etc.)
- [x] Move/Aptos Virtual Machine
- [ ] Aptos Framework
- [ ] Aptos CLI/SDK
- [ ] Developer Infrastructure
- [ ] Other (specify)
How Has This Been Tested?
Key Areas to Review
Checklist
- [x] I have read and followed the CONTRIBUTING doc
- [x] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [x] I identified and added all stakeholders and component owners affected by this change as reviewers
- [ ] I tested both happy and unhappy path of the functionality
- [ ] I have made corresponding changes to the documentation
⏱️ 3h 49m total CI duration on this PR
| Job | Cumulative Duration | Recent Runs |
|---|---|---|
| execution-performance / single-node-performance | 1h 12m | 🟩 🟩 🟩 |
| rust-smoke-tests | 26m | 🟥 |
| rust-targeted-unit-tests | 23m | 🟩 |
| rust-move-unit-coverage | 19m | 🟩 |
| rust-images / rust-all | 17m | 🟩 |
| forge-e2e-test / forge | 14m | 🟩 |
| forge-compat-test / forge | 11m | 🟥 |
| rust-move-tests | 11m | 🟩 |
| cli-e2e-tests / run-cli-tests | 7m | 🟩 |
| rust-lints | 6m | 🟩 |
| check | 4m | 🟩 |
| run-tests-main-branch | 4m | 🟩 |
| rust-build-cached-packages | 4m | 🟩 |
| check-dynamic-deps | 3m | 🟩 🟩 🟩 |
| general-lints | 2m | 🟩 |
| indexer-grpc-e2e-tests / test-indexer-grpc-docker-compose | 2m | 🟥 |
| semgrep/ci | 1m | 🟩 🟩 🟩 |
| node-api-compatibility-tests / node-api-compatibility-tests | 49s | 🟩 |
| execution-performance / file_change_determinator | 32s | 🟩 🟩 🟩 |
| file_change_determinator | 12s | 🟩 |
| file_change_determinator | 10s | 🟩 |
| permission-check | 9s | 🟩 |
| file_change_determinator | 9s | 🟩 |
| permission-check | 8s | 🟩 🟩 🟩 |
| permission-check | 7s | 🟩 🟩 🟩 |
| determine-docker-build-metadata | 4s | 🟩 |
| permission-check | 3s | 🟩 |
| permission-check | 3s | 🟩 |
🚨 3 jobs on the last run were significantly faster/slower than expected
| Job | Duration | vs 7d avg | Delta |
|---|---|---|---|
| rust-targeted-unit-tests | 23m | 17m | |
| rust-images / rust-all | 17m | 13m | |
| rust-build-cached-packages | 4m | 5m |
Forge is running suite realistic_env_max_load on dde9c9f5bf92e4ee8008954ce117b518764ac83b
- Grafana dashboard (auto-refresh)
- Humio Logs
- Axiom Logs
- Validator CPU Profile
- Fullnode CPU Profile
- Test runner output
- Test run is land-blocking
Forge is running suite compat on 01b24e7e3548382dd25440b39a0438a993387f12 ==> dde9c9f5bf92e4ee8008954ce117b518764ac83b
- Grafana dashboard (auto-refresh)
- Humio Logs
- Axiom Logs
- Validator CPU Profile
- Fullnode CPU Profile
- Test runner output
- Test run is land-blocking
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 62.2%. Comparing base (
4ba801d) to head (dde9c9f). Report is 772 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #13097 +/- ##
=========================================
- Coverage 62.4% 62.2% -0.2%
=========================================
Files 830 827 -3
Lines 186498 182950 -3548
=========================================
- Hits 116389 113831 -2558
+ Misses 70109 69119 -990
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:x: Forge suite compat failure on 01b24e7e3548382dd25440b39a0438a993387f12 ==> dde9c9f5bf92e4ee8008954ce117b518764ac83b
Compatibility test results for 01b24e7e3548382dd25440b39a0438a993387f12 ==> dde9c9f5bf92e4ee8008954ce117b518764ac83b (PR)
1. Check liveness of validators at old version: 01b24e7e3548382dd25440b39a0438a993387f12
compatibility::simple-validator-upgrade::liveness-check : committed: 6300 txn/s, latency: 5067 ms, (p50: 4800 ms, p90: 9200 ms, p99: 10500 ms), latency samples: 239400
2. Upgrading first Validator to new version: dde9c9f5bf92e4ee8008954ce117b518764ac83b
Test Failed: Tried executing 10 txns, request counters: "success 0, failed submit [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10], failed wait [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10], by client: [(0, 180, 180): http://aptos-node-1-validator.forge-compat-pr-13097.svc:8080/v1/]"
Caused by:
Unknown error Transaction expired, without being seen in mempool. It is guaranteed it will not be committed on chain.
Stack backtrace:
0: anyhow::error::<impl core::convert::From<E> for anyhow::Error>::from
at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.79/src/error.rs:565:25
1: <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/result.rs:1963:27
2: aptos_transaction_emitter_lib::emitter::transaction_executor::RestApiReliableTransactionSubmitter::submit_check_and_retry::{{closure}}
at ./crates/transaction-emitter-lib/src/emitter/transaction_executor.rs:127:28
3: <futures_util::future::maybe_done::MaybeDone<Fut> as core::future::future::Future>::poll
at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-util-0.3.30/src/future/maybe_done.rs:95:38
4: <futures_util::future::join_all::JoinAll<F> as core::future::future::Future>::poll
at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-util-0.3.30/src/future/join_all.rs:143:24
5: <aptos_transaction_emitter_lib::emitter::transaction_executor::RestApiReliableTransactionSubmitter as aptos_transaction_generator_lib::ReliableTransactionSubmitter>::execute_transactions_with_counter::{{closure}}
at ./crates/transaction-emitter-lib/src/emitter/transaction_executor.rs:309:10
6: <core::pin::Pin<P> as core::future::future::Future>::poll
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/future/future.rs:125:9
7: aptos_transaction_emitter_lib::emitter::account_minter::AccountMinter::create_and_fund_seed_accounts::{{closure}}
at ./crates/transaction-emitter-lib/src/emitter/account_minter.rs:434:18
8: aptos_transaction_emitter_lib::emitter::account_minter::AccountMinter::create_and_fund_accounts::{{closure}}
at ./crates/transaction-emitter-lib/src/emitter/account_minter.rs:326:14
9: aptos_transaction_emitter_lib::emitter::create_accounts::{{closure}}
at ./crates/transaction-emitter-lib/src/emitter/mod.rs:1195:14
10: aptos_transaction_emitter_lib::emitter::TxnEmitter::start_job::{{closure}}
at ./crates/transaction-emitter-lib/src/emitter/mod.rs:717:10
11: aptos_transaction_emitter_lib::emitter::TxnEmitter::emit_txn_for_impl::{{closure}}
at ./crates/transaction-emitter-lib/src/emitter/mod.rs:827:14
12: aptos_transaction_emitter_lib::emitter::TxnEmitter::emit_txn_for::{{closure}}
at ./crates/transaction-emitter-lib/src/emitter/mod.rs:859:14
13: tokio::runtime::park::CachedParkThread::block_on::{{closure}}
at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/park.rs:282:63
14: tokio::runtime::coop::with_budget
at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/coop.rs:107:5
15: tokio::runtime::coop::budget
at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/coop.rs:73:5
16: tokio::runtime::park::CachedParkThread::block_on
at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/park.rs:282:31
17: tokio::runtime::context::blocking::BlockingRegionGuard::block_on
at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context/blocking.rs:66:9
18: tokio::runtime::scheduler::multi_thread::MultiThread::block_on::{{closure}}
at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/mod.rs:87:13
19: tokio::runtime::context::runtime::enter_runtime
at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context/runtime.rs:65:16
20: tokio::runtime::scheduler::multi_thread::MultiThread::block_on
at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/mod.rs:86:9
21: tokio::runtime::runtime::Runtime::block_on
at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/runtime.rs:350:50
22: aptos_testcases::generate_traffic
at ./testsuite/testcases/src/lib.rs:105:17
23: <aptos_testcases::compatibility_test::SimpleValidatorUpgrade as aptos_forge::interface::network::NetworkTest>::run
at ./testsuite/testcases/src/compatibility_test.rs:83:24
24: aptos_forge::runner::Forge<F>::run::{{closure}}
at ./testsuite/forge/src/runner.rs:598:42
25: aptos_forge::runner::run_test
at ./testsuite/forge/src/runner.rs:666:11
26: aptos_forge::runner::Forge<F>::run
at ./testsuite/forge/src/runner.rs:598:30
27: forge::run_forge
at ./testsuite/forge-cli/src/main.rs:427:11
28: forge::main
at ./testsuite/forge-cli/src/main.rs:353:21
29: core::ops::function::FnOnce::call_once
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs:250:5
30: std::sys_common::backtrace::__rust_begin_short_backtrace
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:154:18
31: std::rt::lang_start::{{closure}}
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs:167:18
32: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs:284:13
33: std::panicking::try::do_call
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:552:40
34: std::panicking::try
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:516:19
35: std::panic::catch_unwind
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panic.rs:142:14
36: std::rt::lang_start_internal::{{closure}}
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs:148:48
37: std::panicking::try::do_call
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:552:40
38: std::panicking::try
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:516:19
39: std::panic::catch_unwind
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panic.rs:142:14
40: std::rt::lang_start_internal
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs:148:20
41: std::rt::lang_start
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs:166:17
42: __libc_start_main
43: _start
Trailing Log Lines:
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:516:19
39: std::panic::catch_unwind
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panic.rs:142:14
40: std::rt::lang_start_internal
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs:148:20
41: std::rt::lang_start
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs:166:17
42: __libc_start_main
43: _start
Swarm logs can be found here: See fgi output for more information.
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:292"},"thread_name":"main","hostname":"forge-compat-pr-13097-1714384034-01b24e7e3548382dd25440b39a0438","timestamp":"2024-04-29T09:56:43.349378Z","message":"Deleting namespace forge-compat-pr-13097: Some(NamespaceStatus { conditions: None, phase: Some(\"Terminating\") })"}
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:400"},"thread_name":"main","hostname":"forge-compat-pr-13097-1714384034-01b24e7e3548382dd25440b39a0438","timestamp":"2024-04-29T09:56:43.349406Z","message":"aptos-node resources for Forge removed in namespace: forge-compat-pr-13097"}
failures:
compatibility::simple-validator-upgrade
test result: FAILED. 0 passed; 1 failed; 0 filtered out
Failed to run tests:
Tests Failed
Error: Tests Failed
Stack backtrace:
0: anyhow::error::<impl anyhow::Error>::msg
at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.79/src/error.rs:83:36
1: aptos_forge::runner::Forge<F>::run
at ./testsuite/forge/src/runner.rs:618:13
2: forge::run_forge
at ./testsuite/forge-cli/src/main.rs:427:11
3: forge::main
at ./testsuite/forge-cli/src/main.rs:353:21
4: core::ops::function::FnOnce::call_once
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs:250:5
5: std::sys_common::backtrace::__rust_begin_short_backtrace
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:154:18
6: std::rt::lang_start::{{closure}}
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs:167:18
7: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs:284:13
8: std::panicking::try::do_call
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:552:40
9: std::panicking::try
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:516:19
10: std::panic::catch_unwind
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panic.rs:142:14
11: std::rt::lang_start_internal::{{closure}}
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs:148:48
12: std::panicking::try::do_call
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:552:40
13: std::panicking::try
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:516:19
14: std::panic::catch_unwind
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panic.rs:142:14
15: std::rt::lang_start_internal
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs:148:20
16: std::rt::lang_start
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs:166:17
17: __libc_start_main
18: _start
Debugging output:
NAME READY STATUS RESTARTS AGE
aptos-node-0-validator-0 1/1 Running 0 6m46s
aptos-node-1-validator-0 1/1 Running 0 3m40s
aptos-node-2-validator-0 1/1 Running 0 6m46s
aptos-node-3-validator-0 1/1 Running 0 6m46s
genesis-aptos-genesis-eforge103-q482v 0/1 Completed 0 7m16s
- Grafana dashboard
- Humio Logs
- Axiom Logs
- Validator CPU Profile
- Fullnode CPU Profile
- Test runner output
- Test run is land-blocking
:white_check_mark: Forge suite realistic_env_max_load success on dde9c9f5bf92e4ee8008954ce117b518764ac83b
two traffics test: inner traffic : committed: 8368 txn/s, latency: 4691 ms, (p50: 4500 ms, p90: 5400 ms, p99: 9900 ms), latency samples: 3606880
two traffics test : committed: 100 txn/s, latency: 1890 ms, (p50: 1800 ms, p90: 2100 ms, p99: 6100 ms), latency samples: 1740
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.208, avg: 0.202", "QsPosToProposal: max: 0.235, avg: 0.217", "ConsensusProposalToOrdered: max: 0.427, avg: 0.404", "ConsensusOrderedToCommit: max: 0.386, avg: 0.370", "ConsensusProposalToCommit: max: 0.785, avg: 0.773"]
Max round gap was 1 [limit 4] at version 1773779. Max no progress secs was 4.863372 [limit 15] at version 1773779.
Test Ok
- Grafana dashboard
- Humio Logs
- Axiom Logs
- Validator CPU Profile
- Fullnode CPU Profile
- Test runner output
- Test run is land-blocking
This issue is stale because it has been open 45 days with no activity. Remove the stale label, comment or push a commit - otherwise this will be closed in 15 days.
This issue is stale because it has been open 45 days with no activity. Remove the stale label, comment or push a commit - otherwise this will be closed in 15 days.
This issue is stale because it has been open 45 days with no activity. Remove the stale label, comment or push a commit - otherwise this will be closed in 15 days.