[experiment] Use jemalloc for Clippy

Open Kobzol opened this issue 5 months ago • 5 comments
The tool macros are annoying, we should IMO just get rid of them, create separate steps for each tool and (re)use some builders in them to share the build code.

r? @ghost

Kobzol avatar Jun 10 '25 09:06 Kobzol

Some changes occurred in src/tools/opt-dist

cc @kobzol

Some changes occurred in src/tools/clippy

cc @rust-lang/clippy

rustbot avatar Jun 10 '25 09:06 rustbot

@bors2 try

(Creating artifacts for local benchmarks)

Kobzol avatar Jun 10 '25 09:06 Kobzol

:hourglass: Trying commit 0c6bbfbd652a0aeacd054bc42286c2c6f95eb585 with merge df45868f04767f218ec5cc3d611eab803eaf32ec…

To cancel the try build, run the command @bors2 try cancel.

rust-bors[bot] avatar Jun 10 '25 09:06 rust-bors[bot]

:sunny: Try build successful (CI) Build commit: df45868f04767f218ec5cc3d611eab803eaf32ec (df45868f04767f218ec5cc3d611eab803eaf32ec)

rust-bors[bot] avatar Jun 10 '25 11:06 rust-bors[bot]

Confirmed that previously, Clippy used glibc malloc, but with this PR, it uses jemalloc, using rustc-perf:

cargo run --bin collector profile_local cachegrind `rustup +df45868f04767f218ec5cc3d611eab803eaf32ec which rustc` --profiles Clippy --scenarios Full --exact-match helloworld

In terms of performance, it looks pretty good! [image]

Kobzol avatar Jun 10 '25 12:06 Kobzol

r? @flip1995

Kobzol avatar Jun 12 '25 05:06 Kobzol

I'm finding some differences in my results (although I'm checking them directly with the cachegrind annotations, not through the UI). I'll perform some more benchmarks but I have some questions. In my outputs cargo is 8.6% better, but for example syn is considerably worse (12.16b -> 13.01b), and serde is too (24b -> 28b).

Did you use RUSTFLAGS="-Wclippy::all"? With the command you provided, it seems that the benchmarks fail because of some deny-level lints. Also, did you use multithreading with -Zthreads=XX?

I'll check today whether this is just background noise; for now I'm only sharing my current results. Hopefully I can figure it out and confirm that the differences are noise.

NOTE: Seems that although the benchmark fails, cgann- files are still created. So it doesn't really matter if the benchmark fails in this case (?)

blyxyas avatar Jun 17 '25 13:06 blyxyas

I just used the default --profiles=Clippy in rustc-perf, which was recently fixed to actually use Clippy instead of doing a debug build :laughing: No multi-threading was used, and I only ran the benchmarks presented above. I haven't actually tried Clippy on the whole benchmark suite; if it fails on some benchmarks, then we should definitely pass RUSTFLAGS="-Wclippy::all"! Would you like to send a PR to rustc-perf that adds this flag?

Kobzol avatar Jun 17 '25 14:06 Kobzol

Finally got to benchmark it on a separate server. On that server, with -Wclippy::all (I'll open a PR to change this in rustc-perf), I found improvements of between 3.6% and 4.3% when reviewing the results manually. (Notably, this was single-threaded, but I don't think that multithreading will affect this significantly.)

RUSTFLAGS="-Wclippy::all" cargo run -r --bin collector profile_local cachegrind `rustup +clippy-jemalloc which rustc` --profiles Clippy --scenarios Full --id clippy_jemalloc

blyxyas avatar Jun 18 '25 14:06 blyxyas

Thank you for the benchmark!

@bors r=flip1995,blyxyas rollup=never

Kobzol avatar Jun 18 '25 14:06 Kobzol

:pushpin: Commit 723dae84c18f512bfc4bbc65c51465d2dea14c03 has been approved by flip1995,blyxyas

It is now in the queue for this repository.

bors avatar Jun 18 '25 14:06 bors

So nostalgic using bors again

blyxyas avatar Jun 18 '25 14:06 blyxyas

:hourglass: Testing commit 723dae84c18f512bfc4bbc65c51465d2dea14c03 with merge c4c766b7def6e58221fd88c94a15047de27bb2ea...

bors avatar Jun 20 '25 00:06 bors

The job dist-x86_64-msvc failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
[2025-06-20T01:21:42Z DEBUG collector::compile::benchmark] Benchmark iteration 1/1
[2025-06-20T01:21:42Z INFO  collector::compile::execute] run_rustc with incremental=false, profile=Opt, scenario=Some(Full), patch=None, backend=Llvm, target=X86_64UnknownLinuxGnu, phase=benchmark
[2025-06-20T01:21:42Z DEBUG collector::compile::execute] "\\\\?\\C:\\a\\rust\\rust\\build\\x86_64-pc-windows-msvc\\stage0\\bin\\cargo.exe" "rustc" "--manifest-path" "Cargo.toml" "-p" "path+file:///C:/Users/RUNNER~1/AppData/Local/Temp/.tmpkOmtjQ#[email protected]" "--release" "--" "--wrap-rustc-with" "Eprintln"
[2025-06-20T01:21:43Z INFO  collector::compile::execute] run_rustc with incremental=true, profile=Opt, scenario=Some(IncrFull), patch=None, backend=Llvm, target=X86_64UnknownLinuxGnu, phase=benchmark
[2025-06-20T01:21:43Z DEBUG collector::compile::execute] "\\\\?\\C:\\a\\rust\\rust\\build\\x86_64-pc-windows-msvc\\stage0\\bin\\cargo.exe" "rustc" "--manifest-path" "Cargo.toml" "-p" "path+file:///C:/Users/RUNNER~1/AppData/Local/Temp/.tmpkOmtjQ#[email protected]" "--release" "--" "--wrap-rustc-with" "Eprintln" "-C" "incremental=C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\.tmpkOmtjQ\\incremental-state"
[2025-06-20T01:21:45Z INFO  collector::compile::execute] run_rustc with incremental=true, profile=Opt, scenario=Some(IncrUnchanged), patch=None, backend=Llvm, target=X86_64UnknownLinuxGnu, phase=benchmark
[2025-06-20T01:21:45Z DEBUG collector::compile::execute] "\\\\?\\C:\\a\\rust\\rust\\build\\x86_64-pc-windows-msvc\\stage0\\bin\\cargo.exe" "rustc" "--manifest-path" "Cargo.toml" "-p" "path+file:///C:/Users/RUNNER~1/AppData/Local/Temp/.tmpkOmtjQ#[email protected]" "--release" "--" "--wrap-rustc-with" "Eprintln" "-C" "incremental=C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\.tmpkOmtjQ\\incremental-state"
Finished benchmark match-stress (6/9)
Executing benchmark serde-1.0.219-new-solver (7/9)
Preparing serde-1.0.219-new-solver
[2025-06-20T01:21:46Z INFO  collector::compile::execute] run_rustc with incremental=false, profile=Check, scenario=None, patch=None, backend=Llvm, target=X86_64UnknownLinuxGnu, phase=dependencies
[2025-06-20T01:21:46Z INFO  collector::compile::execute] run_rustc with incremental=false, profile=Debug, scenario=None, patch=None, backend=Llvm, target=X86_64UnknownLinuxGnu, phase=dependencies

rust-log-analyzer avatar Jun 20 '25 02:06 rust-log-analyzer

:broken_heart: Test failed - checks-actions

bors avatar Jun 20 '25 02:06 bors

Looks like some spurious Windows file path error.

@bors retry

Kobzol avatar Jun 20 '25 05:06 Kobzol

:hourglass: Testing commit 723dae84c18f512bfc4bbc65c51465d2dea14c03 with merge 18491d5be00eb3ed2f1ccee2ac5b792694f2a7a0...

bors avatar Jun 20 '25 06:06 bors

:sunny: Test successful - checks-actions
Approved by: flip1995,blyxyas
Pushing 18491d5be00eb3ed2f1ccee2ac5b792694f2a7a0 to master...

bors avatar Jun 20 '25 09:06 bors

What is this?

This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing 5b74275f89b6041bf2e9dc2abcf332e206d4cfca (parent) -> 18491d5be00eb3ed2f1ccee2ac5b792694f2a7a0 (this PR)

Test differences

No test diffs found

Test dashboard

Run

cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard 18491d5be00eb3ed2f1ccee2ac5b792694f2a7a0 --output-dir test-dashboard

And then open test-dashboard/index.html in your browser to see an overview of all executed tests.

Job duration changes

  1. i686-msvc-2: 7297.0s -> 11171.8s (53.1%)
  2. dist-i686-msvc: 6854.1s -> 10368.1s (51.3%)
  3. dist-x86_64-apple: 7868.7s -> 11661.7s (48.2%)
  4. x86_64-apple-1: 6732.2s -> 8569.5s (27.3%)
  5. x86_64-apple-2: 4546.3s -> 5596.5s (23.1%)
  6. dist-aarch64-apple: 6379.9s -> 5483.3s (-14.1%)
  7. aarch64-apple: 4932.8s -> 4363.8s (-11.5%)
  8. x86_64-rust-for-linux: 2819.3s -> 2543.9s (-9.8%)
  9. dist-apple-various: 6970.2s -> 6401.2s (-8.2%)
  10. dist-riscv64-linux: 4962.6s -> 4605.6s (-7.2%)

How to interpret the job duration changes?

Job durations can vary a lot, based on the actual runner instance that executed the job, system noise, invalidated caches, etc. The table above is provided mostly for t-infra members, for simpler debugging of potential CI slow-downs.
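
The percentages in the job duration list appear to be plain relative changes, `(new - old) / old`. A minimal sketch that recomputes three of the entries follows; the helper name `percent_change` is hypothetical and not part of citool.

```rust
/// Relative change from `old` to `new`, in percent.
/// Hypothetical helper for illustration; not taken from citool.
fn percent_change(old: f64, new: f64) -> f64 {
    (new - old) / old * 100.0
}

fn main() {
    // (job name, old duration in seconds, new duration in seconds),
    // copied from the job duration list above.
    let jobs = [
        ("i686-msvc-2", 7297.0, 11171.8),
        ("dist-i686-msvc", 6854.1, 10368.1),
        ("dist-aarch64-apple", 6379.9, 5483.3),
    ];

    for (name, old, new) in jobs {
        // Print each change rounded to one decimal place, like the report does.
        println!(
            "{name}: {old:.1}s -> {new:.1}s ({:.1}%)",
            percent_change(old, new)
        );
    }
}
```

A positive value means the job became slower. As the note above says, a single comparison like this says little about the PR itself, because runner noise can dominate.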

github-actions[bot] avatar Jun 20 '25 10:06 github-actions[bot]

Finished benchmarking commit (18491d5be00eb3ed2f1ccee2ac5b792694f2a7a0): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (primary 1.6%, secondary 2.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.6% | [1.1%, 2.6%] | 4 |
| Regressions ❌ (secondary) | 2.7% | [1.2%, 7.1%] | 17 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -5.3% | [-5.3%, -5.3%] | 1 |
| All ❌✅ (primary) | 1.6% | [1.1%, 2.6%] | 4 |
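
The headline "secondary 2.3%" for Max RSS is consistent with a count-weighted mean of the secondary regression and improvement rows. The sketch below checks that arithmetic. It assumes the summary is computed this way, which the report itself does not state, and it works from the rounded means shown above, so the result is only approximate. The function name `combined_mean` is hypothetical.

```rust
/// Count-weighted mean of per-group means.
/// Each entry is (group mean in percent, number of results in the group).
/// Returns `None` when there are no results at all.
/// Hypothetical helper; not taken from rustc-perf.
fn combined_mean(groups: &[(f64, u32)]) -> Option<f64> {
    let total: u32 = groups.iter().map(|&(_, n)| n).sum();
    if total == 0 {
        return None;
    }
    let weighted: f64 = groups
        .iter()
        .map(|&(mean, n)| mean * f64::from(n))
        .sum();
    Some(weighted / f64::from(total))
}

fn main() {
    // Max RSS, secondary benchmarks: 17 regressions averaging 2.7%
    // and 1 improvement of -5.3% (rounded values from the report).
    let secondary: [(f64, u32); 2] = [(2.7, 17), (-5.3, 1)];

    match combined_mean(&secondary) {
        Some(mean) => println!("secondary: {mean:.1}%"),
        None => println!("secondary: -"),
    }
}
```

The same calculation applied to the secondary rows of the Cycles section (one regression of 3.8% and five improvements averaging -4.6%) gives roughly -3.2%, which matches that section's headline.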

Cycles

Results (secondary -3.2%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 3.8% | [3.8%, 3.8%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -4.6% | [-9.0%, -2.3%] | 5 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 692.705s -> 691.482s (-0.18%)
Artifact size: 372.00 MiB -> 371.94 MiB (-0.01%)

rust-timer avatar Jun 20 '25 12:06 rust-timer