rustc-perf
Add `helloworld-tiny` benchmark
Now that we finally have size artifact metrics in the DB, I think that we should also add some benchmarks specifically focused on artifact sizes. For starters, it would be nice to measure the size of a small binary (either a `helloworld` or a completely empty one) and of some reasonably common Rust binary (like `ripgrep`, since we already have it).
There are several approaches to adding these size-focused benchmarks:
- Just create a new benchmark that uses size-optimizing compile flags (LTO=fat, CGU=1, opt-level=z, strip=true, panic=abort) in its `Cargo.toml`. This is what this PR currently does (see the `Cargo.toml` sketch after this list). The good thing is that we don't have to build any new infrastructure; we just add new benchmarks. The downside is that the benchmark will be unnecessarily executed in modes that are mostly not interesting for the final binary size (doc, check, incremental builds). It might also not be immediately clear from the benchmark that we should care about its `size:object_file` metric. We might name the benchmarks `<benchmark>-tiny` or `<benchmark>-size`, but that might not be visible enough. What's maybe worse is that these benchmarks will use compilation flags that we don't normally benchmark (LTO, CGU, etc.). When some regression happens only under these flags, it might be difficult to realize from the benchmark description that it actually uses them, since the benchmarks will use the normal scenarios and profiles.
- Create a new profile or scenario for size-related benchmarks. This would make it more visible that the benchmark is special in some way, and we could use it to filter the size benchmarks (so e.g. `helloworld-tiny` would only be executed with `profile=debug/opt` and `scenario=min-size`).
- Create a new category for `size` benchmarks. Then we could also treat them specially in the UI and e.g. display them in another section of the `compare` page. Arguably, this could also be done if these benchmarks used a specific `scenario` or `profile`, so maybe another category is redundant.
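Spelled out, the size-optimizing flags from the first approach would look roughly like this in the benchmark's `Cargo.toml` (a sketch; the exact profile section in the PR may differ):

```toml
# Size-optimizing release profile, mirroring the flags listed above.
[profile.release]
lto = "fat"        # whole-program LTO
codegen-units = 1  # a single codegen unit
opt-level = "z"    # optimize aggressively for size
strip = true       # strip symbols from the final binary
panic = "abort"    # drop the unwinding machinery
```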
I'm in favor of this change. Having a different scenario feels like the most appropriate option, given that we already have variable scenarios depending on which benchmark is being tested.
How should it work in terms of selecting which benchmarks should run? If we just add a hard-coded `MinSize` scenario and pass it in `<command> --scenarios Full,IncrFull,...,MinSize`, then it would be executed for all benchmarks by default, which is probably not what we want.
We could add something like `additional-scenarios` to the `perf-config.json` of the existing `helloworld` benchmark:
```json
{
    "category": "primary",
    "additional-scenarios": [{
        "profile": "Opt",
        "env": {
            "CARGO_PROFILE_RELEASE_STRIP": "true",
            ...
        }
    }]
}
```
And execute these scenarios for the selected profile unconditionally?
Or we could just hardcode the selected benchmarks that will be executed with the `MinSize` scenario in `collector/collect.sh`:
```sh
target/release/collector bench_next $SITE_URL --self-profile --bench-rustc --db $DATABASE --scenarios Full,IncrFull,IncrPatched,IncrUnchanged
target/release/collector bench_next $SITE_URL --self-profile --bench-rustc --db $DATABASE --benchmarks helloworld,ripgrep --scenarios MinSize --profiles Opt
```
> Just create a new benchmark that uses size-optimizing compile flags (LTO=fat, CGU=1, opt-level=z, strip=true, panic=abort) in its `Cargo.toml`. This is what this PR currently does.
I think this is the right thing to do for now. We can try it out for a while with one or two benchmarks and see how it goes. That's a lot easier than building extra machinery (a new profile/scenario/category) that (a) is only needed for a couple of benchmarks, and (b) might turn out to be a poor choice.
> The good thing is that we don't have to build any new infrastructure; we just add new benchmarks.
:+1:
> The downside is that the benchmark will be unnecessarily executed in modes that are mostly not interesting for the final binary size (doc, check, incremental builds). It might also not be immediately clear from the benchmark that we should care about its `size:object_file` metric. We might name the benchmarks `<benchmark>-tiny` or `<benchmark>-size`, but that might not be visible enough.
So are the only cases we're interested in `debug full` and `opt full`? For a start, get rid of the `0-println.patch` file, and that will avoid the `incr-patched` measurements :smiley:
We could also add some way to exclude measurements in the `perf-config.json`, e.g. `exclude-profile` and `exclude-scenario` options. (Or maybe `include-profile`/`include-scenario` instead, with everything included by default.)
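A minimal sketch of what such an exclusion could look like in a size-focused benchmark's `perf-config.json` (the `exclude-profile`/`exclude-scenario` keys and their values are hypothetical at this point; the actual option names would be settled in the follow-up PR):

```json
{
    "category": "primary",
    "exclude-profile": ["Check", "Doc"],
    "exclude-scenario": ["IncrFull", "IncrUnchanged", "IncrPatched"]
}
```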
> What's maybe worse is that these benchmarks will use compilation flags that we don't normally benchmark (LTO, CGU, etc.). When some regression happens only under these flags, it might be difficult to realize from the benchmark description that it actually uses them, since the benchmarks will use the normal scenarios and profiles.
I'm not too worried about that. rustc-perf contributors and triagers will learn quickly, and we're accustomed to normal compiler devs not understanding the perf numbers as well. And we have descriptions of the benchmarks in the documentation.
Btw, I thought that the `size:object_file` metric contains the size of the final generated executable, but apparently it doesn't. These are the sizes that I get when I compile a release version of `helloworld` and `helloworld-tiny`:
```
-rwxrwxr-x 2 kobzol kobzol 3843064 srp 19 15:55 target/release/helloworld
-rwxrwxr-x 2 kobzol kobzol  276880 srp 19 15:55 target/release/helloworld-tiny
```
But the `size:object_file` metric for the `Opt`/`Full` build returns `8240` for `helloworld` and `2783352` for `helloworld-tiny`.
It seems that the proper metric name for that is `linked_artifact`. I suggest that we add this one to the compare page quick links.
I think that excluding scenarios is not needed here, since there is no patch file? But `IncrFull` will probably still be executed, hmm, I didn't think of that.
I'm not sure if binary size is that interesting for the `Debug` profile (maybe for embedded?).
Removing the patch file eliminates the `IncrPatched` scenario, but the `IncrFull` and `IncrUnchanged` ones will still run, as well as the non-incremental `Full` one.
Thanks, I missed that. I will send another PR with scenario exclusion.
Now both uninteresting profiles and scenarios are filtered out.