rustc-perf
Add `helloworld-tiny` benchmark
Now that we finally have size artifact metrics in the DB, I think that we should also add some benchmarks specifically focused on artifact sizes. For starters, it would be nice to measure the size of a small binary (either a `helloworld` or a completely empty one) and of some reasonably common Rust binary (like `ripgrep`, since we already have it).
There are several approaches to adding these size-focused benchmarks:
- Just create a new benchmark that uses size-optimizing compile flags (LTO=fat, CGU=1, opt-level=z, strip=true, panic=abort) in its `Cargo.toml`. This is what this PR currently does (see the `Cargo.toml` sketch after this list). The good thing is that we don't have to build any new infrastructure; we just add new benchmarks. The downside is that the benchmark will be unnecessarily executed in modes that are mostly not interesting for the final binary size (doc, check, incremental builds). It might also not be immediately clear from the benchmark that we should care about its `size:object_file` metric. We might name the benchmarks `<benchmark>-tiny` or `<benchmark>-size`, but that might not be visible enough. What's maybe worse is that these benchmarks will use compilation flags that we don't normally benchmark (LTO, CGU, etc.). When some regression happens only under these flags, it might be difficult to realize from the benchmark description that it actually uses them, since the benchmarks will use the normal scenarios and profiles.
- Create a new profile or scenario for size-related benchmarks. This would make it more visible that the benchmark is special in some way, and we could use it to filter the size benchmarks (so e.g. `helloworld-tiny` would only be executed with `profile=debug/opt` and `scenario=min-size`).
- Create a new category for `size` benchmarks. Then we could also treat them specially in the UI and e.g. display them in another section of the `compare` page. Arguably, this could also be done if these benchmarks used a specific `scenario` or `profile`, so maybe another category is redundant.
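Spelled out, the size-optimizing flags from the first approach would look roughly like this in the benchmark's `Cargo.toml` (a sketch; the exact profile section in the PR may differ):

```toml
# Size-optimizing release profile, mirroring the flags listed above.
[profile.release]
lto = "fat"        # whole-program LTO
codegen-units = 1  # a single codegen unit
opt-level = "z"    # optimize aggressively for size
strip = true       # strip symbols from the final binary
panic = "abort"    # drop the unwinding machinery
```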
I'm in favor of this change. Having a different scenario feels like the most appropriate option, given that we already have variable scenarios depending on which benchmark is being tested.
How should it work in terms of selecting which benchmarks should run? If we just add a hard-coded `MinSize` scenario and pass it in `<command> --scenarios Full,IncrFull,...,MinSize`, then it would be executed for all benchmarks by default, which is probably not what we want.
We could add something like `additional-scenarios` to the `perf-config.json` of the existing `helloworld` benchmark:
```json
{
    "category": "primary",
    "additional-scenarios": [{
        "profile": "Opt",
        "env": {
            "CARGO_PROFILE_RELEASE_STRIP": "true",
            ...
        }
    }]
}
```
And execute these scenarios for the selected profile unconditionally?
Or we could just hardcode the selected benchmarks that will be executed with the `MinSize` scenario in `collector/collect.sh`:
```sh
target/release/collector bench_next $SITE_URL --self-profile --bench-rustc --db $DATABASE --scenarios Full,IncrFull,IncrPatched,IncrUnchanged
target/release/collector bench_next $SITE_URL --self-profile --bench-rustc --db $DATABASE --benchmarks helloworld,ripgrep --scenarios MinSize --profiles Opt
```
> Just create a new benchmark that uses size-optimizing compile flags (LTO=fat, CGU=1, opt-level=z, strip=true, panic=abort) in its `Cargo.toml`. This is what this PR currently does.
I think this is the right thing to do for now. We can try it out for a while with one or two benchmarks and see how it goes. That's a lot easier than building extra machinery (a new profile/scenario/category) that (a) is only needed for a couple of benchmarks, and (b) might turn out to be a poor choice.
> The good thing is that we don't have to build any new infrastructure; we just add new benchmarks.
:+1:
> The downside is that the benchmark will be unnecessarily executed in modes that are mostly not interesting for the final binary size (doc, check, incremental builds). It might also not be immediately clear from the benchmark that we should care about its `size:object_file` metric. We might name the benchmarks `<benchmark>-tiny` or `<benchmark>-size`, but that might not be visible enough.
So are the only cases we're interested in `debug full` and `opt full`? For a start, get rid of the `0-println.patch` file, and that will avoid the `incr-patched` measurements :smiley:
We could also add some way to exclude measurements in the `perf-config.json`, e.g. `exclude-profile` and `exclude-scenario` options. (Or maybe `include-profile`/`include-scenario` instead, with everything included by default.)
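A minimal sketch of what such an exclusion could look like in a size-focused benchmark's `perf-config.json` (the `exclude-profile`/`exclude-scenario` keys and their values are hypothetical at this point; the actual option names would be settled in the follow-up PR):

```json
{
    "category": "primary",
    "exclude-profile": ["Check", "Doc"],
    "exclude-scenario": ["IncrFull", "IncrUnchanged", "IncrPatched"]
}
```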
> What's maybe worse is that these benchmarks will use compilation flags that we don't normally benchmark (LTO, CGU, etc.). When some regression happens only under these flags, it might be difficult to realize from the benchmark description that it actually uses them, since the benchmarks will use the normal scenarios and profiles.
I'm not too worried about that. rustc-perf contributors and triagers will learn quickly, and we're accustomed to normal compiler devs not understanding the perf numbers as well. And we have descriptions of the benchmarks in the documentation.
Btw, I thought that the `size:object_file` metric contains the size of the final generated executable, but apparently it doesn't. These are the sizes that I get when I compile a release version of `helloworld` and `helloworld-tiny`:
```
-rwxrwxr-x 2 kobzol kobzol 3843064 srp 19 15:55 target/release/helloworld
-rwxrwxr-x 2 kobzol kobzol  276880 srp 19 15:55 target/release/helloworld-tiny
```
But the `size:object_file` metric for the `Opt`/`Full` build returns `8240` for `helloworld` and `2783352` for `helloworld-tiny`.
It seems that the proper metric name for that is `linked_artifact`. I suggest that we add this one to the compare page quick links.
I think that excluding scenarios is not needed here, since there is no patch file? But `IncrFull` will probably still be executed, hmm, I didn't think of that.
I'm not sure if binary size is that interesting for the `Debug` profile (maybe for embedded?).
Removing the patch file eliminates the `IncrPatched` scenario, but the `IncrFull` and `IncrUnchanged` ones will still run, as well as the non-incremental `Full` one.
Thanks, I missed that. I will send another PR with scenario exclusion.
Now both uninteresting profiles and scenarios are filtered out.