Use past average latency for comparison on pull request benchmarks
Right now we compare against the latency of the last landed commit when running benchmarks on pull requests. Relying on a single data point makes the comparison noisy and causes improvements/regressions to be flagged incorrectly. We should expose an API in dana to query the average latency for a benchmark series and use that for comparison on pull requests. (dana already has this information calculated; we just need to do the plumbing to expose it.)
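To illustrate the difference, here is a minimal sketch of comparing a pull request's latency against the mean of past latencies rather than the latest one. It does not use dana's actual API: the `classify` function, the 5% threshold, and the sample numbers are all hypothetical and only show the comparison logic.

```python
from statistics import mean


def classify(pr_latency, past_latencies, threshold=0.05):
    """Label a PR latency relative to the mean of past latencies.

    `threshold` is a hypothetical relative tolerance (5% here); the real
    value would depend on how noisy the benchmark series is.
    """
    if not past_latencies:
        return "no-baseline"
    baseline = mean(past_latencies)
    change = (pr_latency - baseline) / baseline
    if change > threshold:
        return "regression"
    if change < -threshold:
        return "improvement"
    return "neutral"


# Made-up latencies (ms) of landed commits, oldest first.
history = [100.0, 104.0, 101.0, 99.0, 96.0]

# Averaged baseline: the PR stays within the tolerance.
print(classify(102.0, history))
# Last commit only: the same PR gets flagged because the last sample ran fast.
print(classify(102.0, history[-1:]))
```

With these sample numbers, the single-point baseline flags a regression that the averaged baseline does not, which is the kind of false flagging described above.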
I filed the same issue earlier as #13377. This one is more complete, so I closed the previous one.
Another idea is to use checksums to filter out noise and potentially aggregate the latencies measured on identical artifacts to get more accurate results (reposting the idea from #12152).
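A rough sketch of that idea, assuming each benchmark sample can be paired with a checksum of the artifact it ran on. The function names and the `(checksum, latency)` input shape are hypothetical, not something the benchmark tooling currently provides.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_checksum(samples):
    """Average the latencies of samples that ran on the same artifact.

    `samples` is an iterable of (artifact_checksum, latency) pairs.
    """
    grouped = defaultdict(list)
    for checksum, latency in samples:
        grouped[checksum].append(latency)
    return {checksum: mean(values) for checksum, values in grouped.items()}


def is_noise_only(pr_checksum, base_checksum):
    """If the artifact is byte-identical, any latency delta is noise."""
    return pr_checksum == base_checksum


# Made-up samples: two runs on artifact "abc", one run on artifact "def".
samples = [("abc", 10.0), ("abc", 12.0), ("def", 20.0)]
print(aggregate_by_checksum(samples))
```

A pull request whose artifact checksum matches the baseline could then skip flagging entirely, and its measurement could be folded into the baseline's samples.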
Unassigned myself as I'm not working on this currently
obsolete