aptos-core
aptos-core copied to clipboard
[compiler-v2] Fix evaluation order to be consistent in language version >= 2
Description
Major changes in this PR:
- we showcase that the evaluation order used by compiler v1 in the presence of sequences intermixed with binary operators is hard to explain and understand
- thus, we make the decision in compiler v2 to:
- report a compiler error if language version is below 2.0 and there could be divergence in semantics of the program (trying to simulate the exact v1 behavior is error-prone due to a variety of reasons); this should have no false negatives, but could have false positives which in some cases can be addressed with additional complexity
- diverge the semantics (breaking change) if the language version is >= 2.0, by having consistent left-to-right evaluation order throughout, for operators and functions alike
- fixes a problem with non-binary operators and evaluation order, such as for vector pack instructions (eg, this example is now fixed: https://github.com/aptos-labs/aptos-core/issues/11447#issuecomment-2008278801)
Minor changes in this PR:
- transactional tests now have include/exclude options in test configs
- binary operator to string conversion is canonicalized and code elsewhere is updated to use this, causing minor changes in test outputs
- code duplication reducing refactors
Evaluation order in compiler v1
Consider the following examples to see some samples of evaluation order used by compiler v1 in the presence of sequences within the context of binary operations. They are meant to showcase how concisely describing the v1 ordering is hard (as opposed to, a left-to-right evaluation ordering everywhere).
These are also present at the module documentation for third_party/move/move-compiler-v2/src/seqs_in_binop_checker.rs. I am re-presenting them for an overview to bolster my argument about decisions made in this PR.
We number the sub-expressions in their order of their evaluation. Some (sub-)expressions are left un-numbered if they are irrelevant to the understanding of the evaluation order.
case 1: add is a user-defined function.
let x = 1;
add({x = x - 1; x + 8}, {x = x + 3; x - 3}) + {x = x * 2; x * 2}
^^^^^^^^^ ^^^^^ ^^^^^^^^^ ^^^^^ ^^^^^^^^^ ^^^^^
| | | | | |
| | | | | |
1 | | | | |
2 | | | |
3 | | |
| 4 |
5 |
6
case 2:
fun aborter(x: u64): u64 {
abort x
}
public fun test(): u64 {
let x = 1;
aborter(x) + {x = x + 1; aborter(x + 100); x} + x
^^^^^^^^^^ ^^^^^^^^^ ^^^^^^^^^^^^^^^^
| | |
| 1 |
| 2
never evaluated
}
case 3:
(abort 0) + {(abort 14); 0} + 0
^^^^^^^ ^^^^^^^^
| |
1 |
never evaluated
case 4:
{250u8 + 50u8} + {abort 55; 5u8}
^^^^^^^^^^^^ ^^^^^^^^
| |
| 1
never evaluated
case 5:
let x = 1;
x + {x = x + 1; x} + {x = x + 1; x}
^ ^^^^^^^^^ ^ ^^^^^^^^^ ^
| | | | |
| 1 | | |
| | 2 |
3 3 3
Type of Change
- [x] Bug fix
- [x] Breaking change
Which Components or Systems Does This Change Impact?
- [x] Move/Aptos Virtual Machine
How Has This Been Tested?
- added around 52 new tests that expose subtle semantic behavior and cover previously untested parts of the language
- in each case, I have manually verified that whenever there is a divergence in semantics (test outputs differ) between v1 and v2 (with lang v2), we report an error when using "compiler v2 + lang v1"
- previous tests pass, one affected test has been moved to a new folder, many baselines have been updated but verified
⏱️ 6h 33m total CI duration on this PR
| Job | Cumulative Duration | Recent Runs |
|---|---|---|
| rust-move-tests | 1h 44m | 🟥 🟥 🟥 🟥 🟥 (+4 more) |
| rust-targeted-unit-tests | 1h 39m | 🟥 ⬜ 🟥 🟩 🟩 (+1 more) |
| rust-move-unit-coverage | 1h 36m | 🟩 ⬜ 🟩 🟩 🟩 (+1 more) |
| rust-lints | 43m | 🟥 🟥 🟩 🟩 🟩 (+1 more) |
| run-tests-main-branch | 25m | 🟩 🟩 🟩 🟩 🟩 (+1 more) |
| check-dynamic-deps | 11m | 🟩 🟩 🟩 🟩 🟩 (+1 more) |
| general-lints | 9m | 🟩 🟩 🟩 🟩 🟩 (+1 more) |
| semgrep/ci | 2m | 🟩 🟩 🟩 🟩 🟩 (+1 more) |
| file_change_determinator | 1m | 🟩 🟩 🟩 🟩 🟩 (+1 more) |
| file_change_determinator | 1m | 🟩 🟩 🟩 🟩 🟩 (+1 more) |
| permission-check | 18s | 🟩 🟩 🟩 🟩 🟩 |
| permission-check | 13s | 🟩 🟩 🟩 🟩 🟩 |
| permission-check | 13s | 🟩 🟩 🟩 🟩 🟩 |
| permission-check | 13s | 🟩 🟩 🟩 🟩 🟩 |
🚨 1 job on the last run was significantly faster/slower than expected
| Job | Duration | vs 7d avg | Delta |
|---|---|---|---|
| rust-lints | 5m | 7m |
~~Marking this as draft to handle some false positives in "Aptos Move Test" tests in CI.~~
This PR is now ready for review.
Codecov Report
Attention: Patch coverage is 94.04762% with 10 lines in your changes are missing coverage. Please review.
Project coverage is 57.6%. Comparing base (
1af48e9) to head (12ed7ed). Report is 3 commits behind head on main.
| Files | Patch % | Lines |
|---|---|---|
| third_party/move/move-model/src/ast.rs | 75.6% | 9 Missing :warning: |
| ...piler-v2/src/env_pipeline/seqs_in_binop_checker.rs | 99.0% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #12936 +/- ##
========================================
Coverage 57.5% 57.6%
========================================
Files 833 834 +1
Lines 198121 198257 +136
========================================
+ Hits 113999 114212 +213
+ Misses 84122 84045 -77
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
@brmataptos @wrwg PTAL, I believe I have addressed your comments.
Forge is running suite realistic_env_max_load on 12ed7ed454935a00902d8a31d795e779997f6ebb
- Grafana dashboard (auto-refresh)
- Humio Logs
- Axiom Logs
- Validator CPU Profile
- Fullnode CPU Profile
- Test runner output
- Test run is land-blocking
Forge is running suite compat on 01b24e7e3548382dd25440b39a0438a993387f12 ==> 12ed7ed454935a00902d8a31d795e779997f6ebb
- Grafana dashboard (auto-refresh)
- Humio Logs
- Axiom Logs
- Validator CPU Profile
- Fullnode CPU Profile
- Test runner output
- Test run is land-blocking
Forge is running suite framework_upgrade on 01b24e7e3548382dd25440b39a0438a993387f12 ==> 12ed7ed454935a00902d8a31d795e779997f6ebb
- Grafana dashboard (auto-refresh)
- Humio Logs
- Axiom Logs
- Validator CPU Profile
- Fullnode CPU Profile
- Test runner output
- Test run is land-blocking
:white_check_mark: Forge suite compat success on 01b24e7e3548382dd25440b39a0438a993387f12 ==> 12ed7ed454935a00902d8a31d795e779997f6ebb
Compatibility test results for 01b24e7e3548382dd25440b39a0438a993387f12 ==> 12ed7ed454935a00902d8a31d795e779997f6ebb (PR)
1. Check liveness of validators at old version: 01b24e7e3548382dd25440b39a0438a993387f12
compatibility::simple-validator-upgrade::liveness-check : committed: 6197 txn/s, latency: 5166 ms, (p50: 4800 ms, p90: 9600 ms, p99: 13900 ms), latency samples: 229300
2. Upgrading first Validator to new version: 12ed7ed454935a00902d8a31d795e779997f6ebb
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 1359 txn/s, latency: 21508 ms, (p50: 22300 ms, p90: 25900 ms, p99: 28800 ms), latency samples: 72040
3. Upgrading rest of first batch to new version: 12ed7ed454935a00902d8a31d795e779997f6ebb
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 1663 txn/s, latency: 16639 ms, (p50: 19200 ms, p90: 21800 ms, p99: 22600 ms), latency samples: 86480
4. upgrading second batch to new version: 12ed7ed454935a00902d8a31d795e779997f6ebb
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 3423 txn/s, latency: 9144 ms, (p50: 9900 ms, p90: 12300 ms, p99: 12600 ms), latency samples: 136920
5. check swarm health
Compatibility test for 01b24e7e3548382dd25440b39a0438a993387f12 ==> 12ed7ed454935a00902d8a31d795e779997f6ebb passed
Test Ok
- Grafana dashboard
- Humio Logs
- Axiom Logs
- Validator CPU Profile
- Fullnode CPU Profile
- Test runner output
- Test run is land-blocking
:white_check_mark: Forge suite realistic_env_max_load success on 12ed7ed454935a00902d8a31d795e779997f6ebb
two traffics test: inner traffic : committed: 7984 txn/s, latency: 4919 ms, (p50: 4800 ms, p90: 5700 ms, p99: 10500 ms), latency samples: 3449480
two traffics test : committed: 100 txn/s, latency: 1827 ms, (p50: 1800 ms, p90: 2100 ms, p99: 2300 ms), latency samples: 1780
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.206, avg: 0.202", "QsPosToProposal: max: 0.240, avg: 0.226", "ConsensusProposalToOrdered: max: 0.451, avg: 0.408", "ConsensusOrderedToCommit: max: 0.380, avg: 0.364", "ConsensusProposalToCommit: max: 0.801, avg: 0.772"]
Max round gap was 1 [limit 4] at version 1178492. Max no progress secs was 4.559624 [limit 15] at version 1178492.
Test Ok
- Grafana dashboard
- Humio Logs
- Axiom Logs
- Validator CPU Profile
- Fullnode CPU Profile
- Test runner output
- Test run is land-blocking
:white_check_mark: Forge suite framework_upgrade success on 01b24e7e3548382dd25440b39a0438a993387f12 ==> 12ed7ed454935a00902d8a31d795e779997f6ebb
Compatibility test results for 01b24e7e3548382dd25440b39a0438a993387f12 ==> 12ed7ed454935a00902d8a31d795e779997f6ebb (PR)
Upgrade the nodes to version: 12ed7ed454935a00902d8a31d795e779997f6ebb
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1297 txn/s, submitted: 1300 txn/s, failed submission: 2 txn/s, expired: 2 txn/s, latency: 2431 ms, (p50: 2100 ms, p90: 4200 ms, p99: 5700 ms), latency samples: 110280
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1183 txn/s, submitted: 1185 txn/s, failed submission: 2 txn/s, expired: 2 txn/s, latency: 2592 ms, (p50: 2100 ms, p90: 4200 ms, p99: 9300 ms), latency samples: 106540
5. check swarm health
Compatibility test for 01b24e7e3548382dd25440b39a0438a993387f12 ==> 12ed7ed454935a00902d8a31d795e779997f6ebb passed
Upgrade the remaining nodes to version: 12ed7ed454935a00902d8a31d795e779997f6ebb
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1170 txn/s, submitted: 1174 txn/s, failed submission: 3 txn/s, expired: 3 txn/s, latency: 2548 ms, (p50: 2100 ms, p90: 4500 ms, p99: 6000 ms), latency samples: 105360
Test Ok
- Grafana dashboard
- Humio Logs
- Axiom Logs
- Validator CPU Profile
- Fullnode CPU Profile
- Test runner output
- Test run is land-blocking