[cargo] Enable LTO on release profile

Open perryjrandall opened this issue 1 year ago • 23 comments

The goal here is to make the default release profile identical to the performance profile, so that anyone building with the release profile gets the fully optimized binary.

We expect build time to regress a little as a tradeoff for consistency and performance. If the increase is too large, we may have to abandon this and split the build pipeline, but we'd really love to avoid doing so.

Test Plan:

The performance profile has already been run in devnet and testnet, so it has been tested. This change will need to go into v1.12 or another release to be fully tested in testnet / mainnet.

perryjrandall avatar Apr 12 '24 17:04 perryjrandall

⏱️ 14h 46m total CI duration on this PR
| Job | Cumulative Duration | Recent Runs |
| --- | --- | --- |
| rust-smoke-tests | 3h 10m | 🟥🟥🟥🟥 (+1 more) |
| execution-performance / single-node-performance | 2h 6m | 🟩🟩🟥🟩🟩 (+1 more) |
| rust-targeted-unit-tests | 2h 3m | 🟩🟥🟩🟩 (+1 more) |
| rust-images / rust-all | 1h 41m | 🟩🟩🟥🟩🟩 (+1 more) |
| rust-move-tests | 56m | 🟩🟩🟥🟩🟩 (+1 more) |
| forge-e2e-test / forge | 55m | 🟩🟩🟩🟩 |
| cli-e2e-tests / run-cli-tests | 46m | 🟩🟥🟩🟥 |
| forge-compat-test / forge | 46m | 🟩🟩🟩🟩 |
| rust-lints | 27m | 🟩🟩🟥🟩🟩 (+1 more) |
| run-tests-main-branch | 26m | 🟩🟩🟩🟩🟩 (+1 more) |
| check | 21m | 🟩🟩🟩🟩🟩 (+1 more) |
| rust-build-cached-packages | 19m | 🟩🟩🟥🟩🟩 (+1 more) |
| check-dynamic-deps | 13m | 🟩🟩🟩🟩🟩 (+2 more) |
| general-lints | 10m | 🟩🟩🟩🟩🟩 (+1 more) |
| semgrep/ci | 10m | 🟩🟩🟩🟩🟩 (+2 more) |
| indexer-grpc-e2e-tests / test-indexer-grpc-docker-compose | 6m | 🟩🟩🟩🟥 |
| node-api-compatibility-tests / node-api-compatibility-tests | 4m | 🟩🟩🟩🟩 |
| file_change_determinator | 1m | 🟩🟩🟩🟩🟩 (+2 more) |
| file_change_determinator | 1m | 🟩🟩🟩🟩🟩 (+2 more) |
| file_change_determinator | 1m | 🟩🟩🟩🟩🟩 (+1 more) |
| execution-performance / file_change_determinator | 1m | 🟩🟩🟩🟩🟩 (+1 more) |
| file_change_determinator | 42s | 🟩🟩🟩 |
| permission-check | 23s | 🟩🟩🟩🟩🟩 (+2 more) |
| permission-check | 20s | 🟩🟩🟩🟩🟩 (+2 more) |
| permission-check | 19s | 🟩🟩🟩🟩🟩 (+2 more) |
| permission-check | 19s | 🟩🟩🟩🟩🟩 (+2 more) |
| permission-check | 17s | 🟩🟩🟩🟩🟩 (+1 more) |
| determine-docker-build-metadata | 16s | 🟩🟩🟩🟩🟩 (+1 more) |

🚨 4 jobs on the last run were significantly faster/slower than expected

| Job | Duration | vs 7d avg | Delta |
| --- | --- | --- | --- |
| semgrep/ci | 7m | 27s | +1511% |
| rust-images / rust-all | 33m | 15m | +117% |
| rust-move-tests | 14m | 11m | +27% |
| forge-compat-test / forge | 10m | 13m | -23% |


trunk-io[bot] avatar Apr 12 '24 17:04 trunk-io[bot]

Forge is running suite compat on aptos-node-v1.10.1 ==> 9b5fc4ed35311bfb3161e1e6acb14a183e38f41f

github-actions[bot] avatar Apr 12 '24 17:04 github-actions[bot]

Forge is running suite realistic_env_max_load on 9b5fc4ed35311bfb3161e1e6acb14a183e38f41f

github-actions[bot] avatar Apr 12 '24 17:04 github-actions[bot]

:white_check_mark: Forge suite compat success on aptos-node-v1.10.1 ==> 9b5fc4ed35311bfb3161e1e6acb14a183e38f41f

Compatibility test results for aptos-node-v1.10.1 ==> 9b5fc4ed35311bfb3161e1e6acb14a183e38f41f (PR)
1. Check liveness of validators at old version: aptos-node-v1.10.1
compatibility::simple-validator-upgrade::liveness-check : committed: 6917 txn/s, latency: 4805 ms, (p50: 4800 ms, p90: 7600 ms, p99: 8400 ms), latency samples: 242100
2. Upgrading first Validator to new version: 9b5fc4ed35311bfb3161e1e6acb14a183e38f41f
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 1776 txn/s, latency: 16188 ms, (p50: 19200 ms, p90: 22300 ms, p99: 22800 ms), latency samples: 92400
3. Upgrading rest of first batch to new version: 9b5fc4ed35311bfb3161e1e6acb14a183e38f41f
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 1871 txn/s, latency: 15577 ms, (p50: 19500 ms, p90: 21900 ms, p99: 22200 ms), latency samples: 91680
4. upgrading second batch to new version: 9b5fc4ed35311bfb3161e1e6acb14a183e38f41f
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 3469 txn/s, latency: 9305 ms, (p50: 9700 ms, p90: 12700 ms, p99: 13000 ms), latency samples: 145720
5. check swarm health
Compatibility test for aptos-node-v1.10.1 ==> 9b5fc4ed35311bfb3161e1e6acb14a183e38f41f passed
Test Ok

github-actions[bot] avatar Apr 12 '24 18:04 github-actions[bot]

:white_check_mark: Forge suite realistic_env_max_load success on 9b5fc4ed35311bfb3161e1e6acb14a183e38f41f

two traffics test: inner traffic : committed: 8566 txn/s, latency: 4583 ms, (p50: 4500 ms, p90: 5400 ms, p99: 9600 ms), latency samples: 3692300
two traffics test : committed: 100 txn/s, latency: 1792 ms, (p50: 1800 ms, p90: 2000 ms, p99: 2400 ms), latency samples: 1860
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.208, avg: 0.204", "QsPosToProposal: max: 0.236, avg: 0.224", "ConsensusProposalToOrdered: max: 0.437, avg: 0.392", "ConsensusOrderedToCommit: max: 0.342, avg: 0.329", "ConsensusProposalToCommit: max: 0.731, avg: 0.721"]
Max round gap was 1 [limit 4] at version 1737754. Max no progress secs was 4.830913 [limit 15] at version 1737754.
Test Ok

github-actions[bot] avatar Apr 12 '24 18:04 github-actions[bot]

This is probably fine, and a good thing for build speed improvements, but I'm hesitant to actually release something "a little less optimized" because we're working so hard to optimize every part of the system to eke out a few last TPS.

brianolson avatar Apr 15 '24 15:04 brianolson

Forge is running suite compat on aptos-node-v1.10.1 ==> 974fdd6523d2b78d354a58adfe42389cb08c936c

github-actions[bot] avatar Apr 16 '24 21:04 github-actions[bot]

Forge is running suite realistic_env_max_load on 974fdd6523d2b78d354a58adfe42389cb08c936c

github-actions[bot] avatar Apr 16 '24 21:04 github-actions[bot]

:white_check_mark: Forge suite compat success on aptos-node-v1.10.1 ==> 974fdd6523d2b78d354a58adfe42389cb08c936c

Compatibility test results for aptos-node-v1.10.1 ==> 974fdd6523d2b78d354a58adfe42389cb08c936c (PR)
1. Check liveness of validators at old version: aptos-node-v1.10.1
compatibility::simple-validator-upgrade::liveness-check : committed: 6681 txn/s, latency: 4990 ms, (p50: 4800 ms, p90: 8700 ms, p99: 10100 ms), latency samples: 233860
2. Upgrading first Validator to new version: 974fdd6523d2b78d354a58adfe42389cb08c936c
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 1758 txn/s, latency: 16016 ms, (p50: 19300 ms, p90: 22300 ms, p99: 22800 ms), latency samples: 91420
3. Upgrading rest of first batch to new version: 974fdd6523d2b78d354a58adfe42389cb08c936c
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 1701 txn/s, latency: 16252 ms, (p50: 18700 ms, p90: 23500 ms, p99: 24600 ms), latency samples: 91900
4. upgrading second batch to new version: 974fdd6523d2b78d354a58adfe42389cb08c936c
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 3544 txn/s, latency: 8887 ms, (p50: 9600 ms, p90: 12600 ms, p99: 12700 ms), latency samples: 145340
5. check swarm health
Compatibility test for aptos-node-v1.10.1 ==> 974fdd6523d2b78d354a58adfe42389cb08c936c passed
Test Ok

github-actions[bot] avatar Apr 16 '24 22:04 github-actions[bot]

:white_check_mark: Forge suite realistic_env_max_load success on 974fdd6523d2b78d354a58adfe42389cb08c936c

two traffics test: inner traffic : committed: 7610 txn/s, latency: 5145 ms, (p50: 4900 ms, p90: 6000 ms, p99: 10700 ms), latency samples: 3287780
two traffics test : committed: 100 txn/s, latency: 1837 ms, (p50: 1800 ms, p90: 2100 ms, p99: 2200 ms), latency samples: 1780
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.203, avg: 0.200", "QsPosToProposal: max: 0.362, avg: 0.250", "ConsensusProposalToOrdered: max: 0.452, avg: 0.426", "ConsensusOrderedToCommit: max: 0.381, avg: 0.363", "ConsensusProposalToCommit: max: 0.807, avg: 0.790"]
Max round gap was 1 [limit 4] at version 45949. Max no progress secs was 4.563667 [limit 15] at version 1562852.
Test Ok

github-actions[bot] avatar Apr 16 '24 22:04 github-actions[bot]

> This is probably fine, and a good thing for build speed improvements, but I'm hesitant to actually release something "a little less optimized" because we're working so hard to optimize every part of the system to eke out a few last TPS.

Brian, this releases thin-LTO as the default; our current release profile does not have LTO. Importantly, this means that whenever people build from source (as many of our operators do), they will get LTO by default, which they are most likely not using currently.

So the idea here is to make the release profile the same as the performance profile and then deprecate the performance profile.

This should be a strictly better-performing binary at the cost of increased build time. The test plan is basically saying that we'll incur some build-time increase across the board in order to ship LTO in a more sustainable way.

I've updated the description to better reflect this.
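
For illustration, the change described above might look roughly like the following in the workspace `Cargo.toml`. This is a sketch rather than the actual diff: only thin LTO is stated in this thread, and the other values (`opt-level`, `codegen-units`, `debug`) are assumptions about what such a profile could contain.

```toml
# Hypothetical sketch; every value other than lto = "thin" is an assumption.
[profile.release]
opt-level = 3        # Cargo's default for release builds, shown for clarity
lto = "thin"         # thin LTO, as discussed in this thread
codegen-units = 1    # assumed: fewer codegen units trade build time for runtime speed
debug = false        # assumed

# The performance profile could then become a plain alias of release
# until it is deprecated and removed.
[profile.performance]
inherits = "release"
```

With profiles set up this way, `cargo build --release` and `cargo build --profile performance` would use the same optimization settings, so operators building from source get LTO without choosing a non-default profile.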

perryjrandall avatar Apr 16 '24 22:04 perryjrandall

Forge is running suite compat on aptos-node-v1.10.1 ==> f734772f843eab4becd2475eb9bebcd847556926

github-actions[bot] avatar Apr 16 '24 23:04 github-actions[bot]

Forge is running suite realistic_env_max_load on f734772f843eab4becd2475eb9bebcd847556926

github-actions[bot] avatar Apr 16 '24 23:04 github-actions[bot]

:white_check_mark: Forge suite compat success on aptos-node-v1.10.1 ==> f734772f843eab4becd2475eb9bebcd847556926

Compatibility test results for aptos-node-v1.10.1 ==> f734772f843eab4becd2475eb9bebcd847556926 (PR)
1. Check liveness of validators at old version: aptos-node-v1.10.1
compatibility::simple-validator-upgrade::liveness-check : committed: 6932 txn/s, latency: 4804 ms, (p50: 4800 ms, p90: 7800 ms, p99: 8400 ms), latency samples: 242620
2. Upgrading first Validator to new version: f734772f843eab4becd2475eb9bebcd847556926
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 1769 txn/s, latency: 15943 ms, (p50: 19300 ms, p90: 22000 ms, p99: 22800 ms), latency samples: 92020
3. Upgrading rest of first batch to new version: f734772f843eab4becd2475eb9bebcd847556926
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 1781 txn/s, latency: 16495 ms, (p50: 18400 ms, p90: 24300 ms, p99: 24600 ms), latency samples: 89080
4. upgrading second batch to new version: f734772f843eab4becd2475eb9bebcd847556926
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 3608 txn/s, latency: 8799 ms, (p50: 9600 ms, p90: 12600 ms, p99: 12800 ms), latency samples: 144340
5. check swarm health
Compatibility test for aptos-node-v1.10.1 ==> f734772f843eab4becd2475eb9bebcd847556926 passed
Test Ok

github-actions[bot] avatar Apr 16 '24 23:04 github-actions[bot]

:white_check_mark: Forge suite realistic_env_max_load success on f734772f843eab4becd2475eb9bebcd847556926

two traffics test: inner traffic : committed: 8027 txn/s, latency: 4888 ms, (p50: 4800 ms, p90: 5700 ms, p99: 10000 ms), latency samples: 3459940
two traffics test : committed: 100 txn/s, latency: 1779 ms, (p50: 1700 ms, p90: 2000 ms, p99: 2500 ms), latency samples: 1820
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.202, avg: 0.200", "QsPosToProposal: max: 0.253, avg: 0.242", "ConsensusProposalToOrdered: max: 0.425, avg: 0.385", "ConsensusOrderedToCommit: max: 0.385, avg: 0.364", "ConsensusProposalToCommit: max: 0.766, avg: 0.749"]
Max round gap was 1 [limit 4] at version 1194533. Max no progress secs was 4.395426 [limit 15] at version 1194533.
Test Ok

github-actions[bot] avatar Apr 16 '24 23:04 github-actions[bot]

Forge is running suite compat on aptos-node-v1.10.1 ==> 5a679029eb48af6ee518b4b9de041359fabe093d

github-actions[bot] avatar Apr 17 '24 00:04 github-actions[bot]

Forge is running suite realistic_env_max_load on 5a679029eb48af6ee518b4b9de041359fabe093d

github-actions[bot] avatar Apr 17 '24 00:04 github-actions[bot]

:white_check_mark: Forge suite compat success on aptos-node-v1.10.1 ==> 5a679029eb48af6ee518b4b9de041359fabe093d

Compatibility test results for aptos-node-v1.10.1 ==> 5a679029eb48af6ee518b4b9de041359fabe093d (PR)
1. Check liveness of validators at old version: aptos-node-v1.10.1
compatibility::simple-validator-upgrade::liveness-check : committed: 6599 txn/s, latency: 4858 ms, (p50: 4800 ms, p90: 5400 ms, p99: 8400 ms), latency samples: 250780
2. Upgrading first Validator to new version: 5a679029eb48af6ee518b4b9de041359fabe093d
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 1734 txn/s, latency: 16773 ms, (p50: 19100 ms, p90: 22200 ms, p99: 22500 ms), latency samples: 90180
3. Upgrading rest of first batch to new version: 5a679029eb48af6ee518b4b9de041359fabe093d
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 1686 txn/s, latency: 16092 ms, (p50: 19000 ms, p90: 22200 ms, p99: 24700 ms), latency samples: 91060
4. upgrading second batch to new version: 5a679029eb48af6ee518b4b9de041359fabe093d
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 2978 txn/s, latency: 10725 ms, (p50: 11100 ms, p90: 13800 ms, p99: 15400 ms), latency samples: 116160
5. check swarm health
Compatibility test for aptos-node-v1.10.1 ==> 5a679029eb48af6ee518b4b9de041359fabe093d passed
Test Ok

github-actions[bot] avatar Apr 17 '24 00:04 github-actions[bot]

:white_check_mark: Forge suite realistic_env_max_load success on 5a679029eb48af6ee518b4b9de041359fabe093d

two traffics test: inner traffic : committed: 8132 txn/s, latency: 4811 ms, (p50: 4500 ms, p90: 5700 ms, p99: 13200 ms), latency samples: 3521320
two traffics test : committed: 100 txn/s, latency: 1905 ms, (p50: 1800 ms, p90: 2100 ms, p99: 6000 ms), latency samples: 1840
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.205, avg: 0.201", "QsPosToProposal: max: 0.333, avg: 0.240", "ConsensusProposalToOrdered: max: 0.463, avg: 0.416", "ConsensusOrderedToCommit: max: 0.386, avg: 0.368", "ConsensusProposalToCommit: max: 0.796, avg: 0.785"]
Max round gap was 2 [limit 4] at version 1774823. Max no progress secs was 4.883022 [limit 15] at version 1774719.
Test Ok

github-actions[bot] avatar Apr 17 '24 01:04 github-actions[bot]

Forge is running suite compat on aptos-node-v1.10.1 ==> ea882639933d8b9db3b80eac0729977ff50d47b4

github-actions[bot] avatar Apr 24 '24 17:04 github-actions[bot]

Forge is running suite realistic_env_max_load on ea882639933d8b9db3b80eac0729977ff50d47b4

github-actions[bot] avatar Apr 24 '24 17:04 github-actions[bot]

:white_check_mark: Forge suite compat success on aptos-node-v1.10.1 ==> ea882639933d8b9db3b80eac0729977ff50d47b4

Compatibility test results for aptos-node-v1.10.1 ==> ea882639933d8b9db3b80eac0729977ff50d47b4 (PR)
1. Check liveness of validators at old version: aptos-node-v1.10.1
compatibility::simple-validator-upgrade::liveness-check : committed: 4175 txn/s, latency: 7444 ms, (p50: 7900 ms, p90: 10200 ms, p99: 14100 ms), latency samples: 175380
2. Upgrading first Validator to new version: ea882639933d8b9db3b80eac0729977ff50d47b4
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 1691 txn/s, latency: 16101 ms, (p50: 19300 ms, p90: 23700 ms, p99: 25000 ms), latency samples: 93040
3. Upgrading rest of first batch to new version: ea882639933d8b9db3b80eac0729977ff50d47b4
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 1763 txn/s, latency: 16477 ms, (p50: 18900 ms, p90: 22300 ms, p99: 22600 ms), latency samples: 91720
4. upgrading second batch to new version: ea882639933d8b9db3b80eac0729977ff50d47b4
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 3613 txn/s, latency: 8859 ms, (p50: 9600 ms, p90: 12500 ms, p99: 12700 ms), latency samples: 144540
5. check swarm health
Compatibility test for aptos-node-v1.10.1 ==> ea882639933d8b9db3b80eac0729977ff50d47b4 passed
Test Ok

github-actions[bot] avatar Apr 24 '24 18:04 github-actions[bot]

:white_check_mark: Forge suite realistic_env_max_load success on ea882639933d8b9db3b80eac0729977ff50d47b4

two traffics test: inner traffic : committed: 8120 txn/s, latency: 4840 ms, (p50: 4500 ms, p90: 5700 ms, p99: 13200 ms), latency samples: 3499780
two traffics test : committed: 100 txn/s, latency: 1858 ms, (p50: 1800 ms, p90: 2100 ms, p99: 2300 ms), latency samples: 1840
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.207, avg: 0.205", "QsPosToProposal: max: 0.270, avg: 0.212", "ConsensusProposalToOrdered: max: 0.442, avg: 0.418", "ConsensusOrderedToCommit: max: 0.398, avg: 0.383", "ConsensusProposalToCommit: max: 0.824, avg: 0.801"]
Max round gap was 1 [limit 4] at version 1200561. Max no progress secs was 4.945444 [limit 15] at version 1200561.
Test Ok

github-actions[bot] avatar Apr 24 '24 18:04 github-actions[bot]

This issue is stale because it has been open 45 days with no activity. Remove the stale label, comment or push a commit - otherwise this will be closed in 15 days.

github-actions[bot] avatar Jun 10 '24 01:06 github-actions[bot]