Factor engine flags into an engine's name for uniqueness
Today I was curious to test some Wasmtime flags against each other to see their effect on performance, so I passed the same *.so via the -e flag to two collection runs with different --engine-flags values. When I then ran effect-size, however, it failed with:
Error: Can only test significance between exactly two different engines. Found 1 different engines.
I then gave the same *.so two different file names and the comparison came out ok, so this might be a case where factoring the flags into the uniqueness calculation and/or the later printing would help.
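A rough sketch of what "factoring in the flags" could look like, to make the idea concrete. Everything here is hypothetical: the `Measurement` struct is a cut-down stand-in rather than the real type, and `engine_key`, `distinct_engines`, and `display_name` are names made up for illustration. The point is only that the engine's identity becomes the pair (engine path, flags) instead of the path alone.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified measurement record (not the project's real struct).
#[derive(Debug, Clone)]
pub struct Measurement {
    pub engine: String,
    /// Assumption: the flags string given at collection time is stored here.
    pub engine_flags: Option<String>,
    pub count: u64,
}

/// Identity of an engine configuration: the engine path plus its flags.
/// Missing flags and an empty flags string are treated as the same thing.
pub fn engine_key(m: &Measurement) -> (String, String) {
    (m.engine.clone(), m.engine_flags.clone().unwrap_or_default())
}

/// The set of distinct engine configurations present in the measurements.
pub fn distinct_engines(measurements: &[Measurement]) -> BTreeSet<(String, String)> {
    measurements.iter().map(engine_key).collect()
}

/// Name used when printing results; the flags are appended only when present.
pub fn display_name(m: &Measurement) -> String {
    match m.engine_flags.as_deref() {
        Some(flags) if !flags.trim().is_empty() => format!("{} ({})", m.engine, flags),
        _ => m.engine.clone(),
    }
}
```

With a key like this, two runs of the same *.so with different flags would count as two engines for the "exactly two different engines" check, and the printed names would stay distinguishable.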
I tried to implement this today and got lost in a maze of different structs with nearly identical fields (arch, engine, wasm, phase, event). So I'm not sure whether I'm up for following through, but wanted to at least note that I want this too.
My plan for the user experience was to let the benchmark subcommand accept multiple --engine-flags options and run every engine with every flags string. Then if you want to compare two configurations of the same engine, you specify --engine once and --engine-flags twice.
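The "every flags string combined with every engine" part is just a cross product. Here is a minimal sketch of it, not the code I actually wrote: `EngineConfig` and `engine_configs` are hypothetical names, and the fallback of running each engine once with no flags when no --engine-flags is given is my assumption about the sensible default.

```rust
/// Hypothetical pairing of one engine with one flags string.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EngineConfig {
    pub engine: String,
    pub flags: Option<String>,
}

/// Build every (engine, flags) combination to benchmark.
/// Assumption: with no flags given, each engine runs once with `flags: None`.
pub fn engine_configs(engines: &[String], flag_sets: &[String]) -> Vec<EngineConfig> {
    if flag_sets.is_empty() {
        return engines
            .iter()
            .map(|e| EngineConfig { engine: e.clone(), flags: None })
            .collect();
    }
    engines
        .iter()
        .flat_map(|e| {
            // Pair this engine with each flags string, in the order given.
            flag_sets.iter().map(move |f| EngineConfig {
                engine: e.clone(),
                flags: Some(f.clone()),
            })
        })
        .collect()
}
```

One engine and two flags strings yield two configurations, which is exactly the "same engine, two configurations" comparison described above.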
That part was pretty easy to implement. I just don't have the energy to push it through all the effect-size and summarize bits.