rust icon indicating copy to clipboard operation
rust copied to clipboard

Introduce `-Zmacro-stats`

Open nnethercote opened this issue 7 months ago • 11 comments

Introduce -Zmacro-stats.

It collects data about macro expansions and prints them in a table after expansion finishes. It's very useful for detecting macro bloat, especially for proc macros.

r? @petrochenkov

nnethercote avatar Jun 05 '25 09:06 nnethercote

I used this recently on a non-open-source codebase with ~100,000 lines of Rust code. cargo expand produced ~325,000 lines, due to heavy proc macro usage. Avoiding some unnecessary proc macro usage got this down to ~205,000 lines, reducing compile times by ~20%.

nnethercote avatar Jun 05 '25 09:06 nnethercote

-Zself-profile has the expand_proc_macro activity, didn't that work?

lqd avatar Jun 05 '25 09:06 lqd

This sounds awesome!

@lqd recently let me know about the fact that -Zself-profile can actually be used to record how many times was a specific proc macro invoked (https://github.com/rust-lang/rust/pull/95739), and I wanted to use that in a cargo command for profiling builds, but the thing that was missing was how many lines of code did the macro generate. With -Zmacro-stats that information would be available, which is awesome (although for programmatic usage, I'd probably want a JSON based output, but that can be added later if needed).

Kobzol avatar Jun 05 '25 09:06 Kobzol

-Zself-profile has the expand_proc_macro activity, didn't that work?

I hadn't heard about that...

nnethercote avatar Jun 05 '25 09:06 nnethercote

FWIW I tried to use it, and it gives me the durations of proc macro executions for individual macros (e.g. #[derive(Serialize, Deserialize)] would give me time for expanding Serialize and time for expanding Deserialize. However, it only seemed to work for actual proc macros, not things like #[derive(Hash)], and it doesn't contain information about how much code was generated. It also requires using -Zself-profile, which can make builds way slower than they normally are.

Kobzol avatar Jun 05 '25 09:06 Kobzol

builtin derives are not proc-macros, but you're more than welcome to add their expansion to the self-profiler :)

code-size is an interesting metric but it's not always correlated with compile times which you're interested in optimizing: you still want expansion times (example: cranelift-codegen has two huge functions of the same size, one of which takes 8ms in dataflow, the other 2.5s)

lqd avatar Jun 05 '25 09:06 lqd

The job mingw-check-2 failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
    Checking rustc_expand v0.0.0 (/checkout/compiler/rustc_expand)
error: calling `push_str()` using a single-character string literal
   --> compiler/rustc_expand/src/expand.rs:192:13
    |
192 |             s.push_str("\n");
    |             ^^^^^^^^^^^^^^^^ help: consider using `push` with a character literal: `s.push('\n')`
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#single_char_add_str
    = note: requested on the command line with `-D clippy::single-char-add-str`

[RUSTC-TIMING] rustc_expand test:false 2.551

rust-log-analyzer avatar Jun 05 '25 09:06 rust-log-analyzer

The job mingw-check-2 failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

rust-log-analyzer avatar Jun 05 '25 10:06 rust-log-analyzer

builtin derives are not proc-macros, but you're more than welcome to add their expansion to the self-profiler :)

Right, that was my first question on this PR, if we can just add it to -Zself-profile without a new flag :) But I think that the richness of the potentially useful metadata that we could produce here might not be a good fit for the quite limited data model of the self-profile.

code-size is an interesting metric but it's not always correlated with compile times which you're interested in optimizing: you still want expansion times (example: cranelift-codegen has two huge functions of the same size, one of which takes 8ms in dataflow, the other 2.5s)

It's a very rough approximation, of course. Ideally, I would like to see "this macro generated code that took the compiler 2.5s to typecheck", but I don't suppose that's really feasible easily today, so I'll take what I can get :)

Kobzol avatar Jun 05 '25 10:06 Kobzol

AFAIK there's nothing inherent to the data model that would prevent adding information to correlate between events of different queries, if that's what was needed for nick's project or your query (e.g. if compilation time was not in expansion but elsewhere). But this is getting offtopic so I'll stop here.

lqd avatar Jun 05 '25 12:06 lqd

@rustbot ready

nnethercote avatar Jun 10 '25 02:06 nnethercote

I have addressed the comments. I moved the code to the stats module in a new commit. We can merge it with the previous commit before merging.

@rustbot ready

nnethercote avatar Jun 12 '25 08:06 nnethercote

r=me after the commit squash https://github.com/rust-lang/rust/pull/142069#issuecomment-2965687086 @rustbot author

petrochenkov avatar Jun 12 '25 09:06 petrochenkov

Reminder, once the PR becomes ready for a review, use @rustbot ready.

rustbot avatar Jun 12 '25 09:06 rustbot

I squashed.

@bors r=petrochenkov

nnethercote avatar Jun 12 '25 11:06 nnethercote

:pushpin: Commit 376cbc3787d2312b6b3b5db84dd1734fed1ebda6 has been approved by petrochenkov

It is now in the queue for this repository.

bors avatar Jun 12 '25 11:06 bors

@rust-timer build d601de62fe2839638632419cd4fe1562eee9732c

(For https://github.com/rust-lang/rust/pull/142442)

Kobzol avatar Jun 17 '25 13:06 Kobzol

Queued d601de62fe2839638632419cd4fe1562eee9732c with parent 015c7770ec0ffdba9ff03f1861144a827497f8ca, future comparison URL. There are currently 2 preceding artifacts in the queue. It will probably take at least ~4.1 hours until the benchmark run finishes.

rust-timer avatar Jun 17 '25 13:06 rust-timer

Finished benchmarking commit (d601de62fe2839638632419cd4fe1562eee9732c): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never @rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-1.1% [-1.2%, -1.0%] 4
Improvements ✅
(secondary)
-0.3% [-0.6%, -0.1%] 3
All ❌✅ (primary) -1.1% [-1.2%, -1.0%] 4

Max RSS (memory usage)

Results (primary 0.8%, secondary 0.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
0.8% [0.8%, 0.8%] 1
Regressions ❌
(secondary)
2.7% [2.7%, 2.7%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-2.2% [-2.2%, -2.2%] 1
All ❌✅ (primary) 0.8% [0.8%, 0.8%] 1

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 754.993s -> 692.243s (-8.31%) Artifact size: 372.27 MiB -> 372.32 MiB (0.02%)

rust-timer avatar Jun 17 '25 18:06 rust-timer