Allow `summarize` to aggregate multiple benchmarks into one score
When measuring more than one benchmark, it would be useful to aggregate the results into a single score. One common way to do this is to take the geometric mean of the set of results. This issue proposes adding an `--aggregate-benchmarks` flag to do exactly this. When enabled, `sightglass-cli summarize --aggregate-benchmarks` would emit a single, "aggregated by geomean" result for each of the phases, with a placeholder such as `<all benchmarks>` in the benchmark column.
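To make the proposal concrete, here is a minimal sketch of the aggregation in Python. It is not sightglass code: the row shape `(benchmark, phase, value)`, the helper names, the phase names, and the sample numbers are all assumptions made up for illustration.

```python
import math
from collections import defaultdict

# Placeholder shown in the benchmark column of an aggregated row.
ALL_BENCHMARKS = "<all benchmarks>"


def geometric_mean(values):
    """Return the geometric mean of a non-empty list of positive numbers."""
    if not values:
        raise ValueError("cannot aggregate an empty set of results")
    if any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs strictly positive values")
    # Average the logs instead of multiplying, to avoid overflow on large counts.
    return math.exp(sum(math.log(v) for v in values) / len(values))


def aggregate_by_phase(rows):
    """Collapse (benchmark, phase, value) rows into one geomean row per phase."""
    by_phase = defaultdict(list)
    for _benchmark, phase, value in rows:
        by_phase[phase].append(value)
    return [
        (ALL_BENCHMARKS, phase, geometric_mean(values))
        for phase, values in by_phase.items()
    ]


if __name__ == "__main__":
    # Hypothetical measurements; names and numbers are invented.
    rows = [
        ("a.wasm", "compilation", 2.0),
        ("b.wasm", "compilation", 8.0),
        ("a.wasm", "execution", 10.0),
        ("b.wasm", "execution", 1000.0),
    ]
    for row in aggregate_by_phase(rows):
        print(row)
```

Unlike an arithmetic mean, the geometric mean is not dominated by the benchmark with the largest absolute measurements, which is why it is the usual choice for combining results of very different magnitudes.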
This was intentionally left out of the original RFC because it seemed like its main use would be to say "wasm engine X scores 95, and so it is better than wasm engine Y, which scores 89", and that is not the intended use of this benchmark suite.
It also treats all benchmark programs in the corpus as equals, when that isn't really true (I doubt we would accept a 2% regression on spidermonkey.wasm in order to get 3% speedups in a few of the shootout programs).
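The equal-weighting concern can be illustrated with a small Python sketch. The numbers are hypothetical: "a few" is assumed to mean three shootout programs, and each value is a new-time / old-time ratio, so values below 1.0 are speedups.

```python
from statistics import geometric_mean

# Hypothetical per-benchmark ratios (new time / old time); lower is better.
ratios = {
    "spidermonkey.wasm": 1.02,  # 2% regression
    "shootout-1.wasm": 0.97,    # 3% speedup (invented name)
    "shootout-2.wasm": 0.97,    # 3% speedup (invented name)
    "shootout-3.wasm": 0.97,    # 3% speedup (invented name)
}

# An unweighted geomean gives every benchmark the same influence.
score = geometric_mean(ratios.values())

# The aggregate reads as a net improvement even though the benchmark
# we may care about most got slower.
print(f"aggregate ratio: {score:.4f}")
print("looks like an improvement" if score < 1.0 else "looks like a regression")
```

Under these assumed numbers the aggregate ratio comes out below 1.0, so the single score hides the spidermonkey.wasm regression.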
How do you intend to use this single-number score?
Hi @fitzgen, we should discuss this offline (to show the runner results), but this will be critical for reporting a summary of the performance impact of a patch. As far as I can see, we currently have a way to summarize the comparison of one benchmark at a time, but no way to summarize a run across multiple benchmarks. As I read it, what @abrown describes is just an extension of comparing one benchmark at a time: it enables a summary when running against benchmark-next//.wasm.