github-action-benchmark

A tool to check a benchmark is within a threshold up or down OR a tool for no change

Open kealjones-wk opened this issue 10 months ago • 4 comments

See this comment for details on the idea of a tool that checks whether a benchmark is within a threshold, up or down.

I was curious: would it be possible to add a tool for "no change"?

The use case is that I want to ensure that this benchmark is unchanged, up or down, and if there is ANY change at all in either direction I want an alert and a failure.

Basically: if the Ratio isn't 1, then alert/fail.

`customNoChange`? `customEqualIsBetter`?

kealjones-wk avatar Feb 04 '25 16:02 kealjones-wk

This sounds like more of a test than a benchmark, surely?

TomAFrench avatar Feb 05 '25 00:02 TomAFrench

I mean... sure, it is a "test", but isn't that the point of the "fail on warning" feature?

Fair enough, maybe "no change" is unreasonable; I just thought it might be easier to implement given how the action is coded right now. Maybe something more useful from a benchmark perspective would be a tool that warns when a result exceeds a threshold in either direction, up or down. To me, a change of a certain size in either direction can indicate some kind of abnormality that should be brought up.

Use case: we are benchmarking the number of chunks in our JS bundle over time. Smaller is indeed better, but if it drops from, say, 54 to 2, that probably indicates something is wrong and should at least be looked into.

kealjones-wk avatar Feb 13 '25 17:02 kealjones-wk

hey @kealjones-wk

I don't understand why you would need a new tool for that. Wouldn't it be sufficient to set `alert-threshold` to `100%`?
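For example, something like this in the workflow (a sketch only; the tool and file path are placeholders for whatever you use):

```yaml
- name: Compare benchmark result
  uses: benchmark-action/github-action-benchmark@v1
  with:
    tool: 'customSmallerIsBetter'     # placeholder
    output-file-path: output.json     # placeholder
    github-token: ${{ secrets.GITHUB_TOKEN }}
    # The default is 200%; 100% should alert on any change for the worse.
    alert-threshold: '100%'
    fail-on-alert: true
```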

ktrz avatar Mar 29 '25 08:03 ktrz

If I understand correctly, this issue actually points out a couple of limitations in the current architecture: for a given tool, you lock in "higher is better" or "lower is better" for all tests. However, a single test could actually output several metrics, such as throughput and latency (YCSB does this, to name one example).

The solution here would be to extend `BenchmarkResult` with optional `metric` and `direction` fields:

```
{
  name: "benchmarkname",
  metric: "throughput",
  direction: "higher_is_better",
  unit: "trx/s",
  ...
}
```

If the above were supported, it would not be necessary to add a new tool just because you want the opposite direction (`lower_is_better`) or different alerting semantics.
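To sketch what I mean (illustrative TypeScript, not the actual github-action-benchmark code; all names here are made up):

```ts
type Direction = "higher_is_better" | "lower_is_better";

// A per-result direction decides which way the ratio is computed, so one
// tool could mix throughput- and latency-style metrics. In this
// convention, a ratio above 1 always means "got worse".
function worsenessRatio(prev: number, curr: number, direction: Direction): number {
  return direction === "higher_is_better" ? prev / curr : curr / prev;
}

// Example: throughput dropping 120 -> 100 trx/s gives 1.2 ("20% worse"),
// and latency rising 100 -> 120 ms also gives 1.2.
function isRegression(prev: number, curr: number, direction: Direction,
                      threshold = 2.0): boolean {
  return worsenessRatio(prev, curr, direction) > threshold;
}
```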

Finally, yes, I too have found that alerting on both regressions and improvements is best practice. In the best case, the improvement is real and was intentional; even then, it is good to verify that the improvement is actually there. More often than not, though, improvements are really bugs: the benchmark got faster because it isn't doing anything anymore.
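As a sketch, the symmetric check could be as simple as this (again illustrative, not existing code):

```ts
// Alert when the "worseness" ratio deviates from 1.0 beyond the
// threshold in EITHER direction.
function shouldAlert(ratio: number, threshold: number): boolean {
  // E.g. threshold = 0.5 flags both a 1.6x regression (|1.6 - 1| = 0.6)
  // and the 54 -> 2 chunk drop above (ratio = 2/54 ≈ 0.037, |0.037 - 1| ≈ 0.96).
  return Math.abs(ratio - 1) > threshold;
}
```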

henrikingo avatar Jul 22 '25 00:07 henrikingo