A tool to check a benchmark is within a threshold up or down, OR a tool for no change
See this comment for details on the idea of a tool that checks the result stays within a threshold up or down.
I was curious whether it would be possible to add a tool for "no change"?
The use case is that I want to ensure this benchmark is unchanged in either direction; if there is ANY change at all, up or down, I want an alert and a failure.
Basically: if the Ratio isn't 1, then alert/fail.
customNoChange? customEqualIsBetter?
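To make it concrete, the check I have in mind would be something along these lines (purely an illustrative sketch; the `BenchmarkEntry` shape and the `noChangeAlerts` name are made up and not part of the existing code):

```ts
// Hypothetical "no change" check: flag ANY difference between the previous
// and current value of a benchmark (i.e. Ratio !== 1).
interface BenchmarkEntry {
  name: string;
  value: number;
  unit: string;
}

function noChangeAlerts(prev: BenchmarkEntry[], curr: BenchmarkEntry[]): string[] {
  const prevByName = new Map<string, BenchmarkEntry>();
  for (const b of prev) prevByName.set(b.name, b);

  const alerts: string[] = [];
  for (const c of curr) {
    const p = prevByName.get(c.name);
    if (p === undefined) continue; // no previous run to compare against
    const ratio = c.value / p.value;
    if (ratio !== 1) {
      alerts.push(`${c.name}: ${p.value} -> ${c.value} ${c.unit} (ratio ${ratio.toFixed(3)})`);
    }
  }
  return alerts;
}
```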
This sounds more like a test than a benchmark, surely?
I mean... sure, it is a "test", but isn't that the point of having the "fail on warning" feature?
But fair enough, maybe "No Change" is unreasonable; I just thought it might be easier to implement given how the code is structured right now. Something more useful from a benchmarking perspective might be a tool that warns if the result exceeds a threshold in either direction, up or down. A change of a certain size in either direction can indicate some kind of abnormality that should at least be surfaced.
Use case: we are benchmarking the number of chunks in our JS bundle over time. Smaller is indeed better, but if it drops from, say, 54 to 2, that probably indicates something is wrong and should at least be addressed.
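Rough idea of that kind of symmetric check (again just a sketch; the function name and expressing the threshold as a relative change are my own assumptions):

```ts
// Alert if the current value differs from the previous one by more than
// `threshold` (a relative change) in EITHER direction.
function exceedsSymmetricThreshold(prevValue: number, currValue: number, threshold: number): boolean {
  return Math.abs(currValue - prevValue) / prevValue > threshold;
}

// exceedsSymmetricThreshold(54, 2, 0.5)  === true   (chunk count dropped ~96%)
// exceedsSymmetricThreshold(54, 56, 0.5) === false  (within 50% either way)
```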
hey @kealjones-wk
I don't understand why you would need a new tool for that. Wouldn't it be sufficient to set `alert-threshold` to 100?
If I understand correctly, this issue actually points out a couple of limitations in the current architecture: for a given tool, you lock in "higher is better" or "lower is better" for all tests. However, a single test could actually output several metrics, such as throughput and latency. (YCSB does this, to name one example.)
The solution here would be to extend `BenchmarkResult` with optional `metric` and `direction` fields:
```json
{
  "name": "benchmarkname",
  "metric": "throughput",
  "direction": "higher_is_better",
  "unit": "trx/s",
  ...
}
```
If the above were supported, it would not be necessary to add a new tool just because you want a different direction (`lower_is_better`), or different alerting semantics.
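To illustrate, the comparison could then read the direction per result instead of per tool. A rough sketch under the proposed schema; none of this is the current implementation, and the field handling is just an assumption:

```ts
type Direction = "higher_is_better" | "lower_is_better";

// Proposed extension of the result shape: metric and direction are optional,
// with direction falling back to the tool-wide default when omitted.
interface BenchmarkResult {
  name: string;
  metric?: string;       // e.g. "throughput", "latency"
  direction?: Direction;
  unit: string;
  value: number;
}

function isRegression(prev: BenchmarkResult, curr: BenchmarkResult, toolDefault: Direction): boolean {
  const direction = curr.direction ?? toolDefault;
  return direction === "higher_is_better"
    ? curr.value < prev.value
    : curr.value > prev.value;
}
```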
Finally, yes, I too have found that alerting on both regressions and improvements is best practice. In the best case, the improvement is real and intentional; if so, it is good to verify that it is actually there. More often than not, though, an unexpected improvement is a bug, and the reason the benchmark got faster is that it isn't doing anything.