Distributed benchmarking CI
Hey folks,
As most of you know, we don't run the benchmarking CI on every release because running all the suites takes days. That's mostly because we rely on a null-hypothesis test to guarantee the statistical confidence of our benchmark results (I briefly wrote about it). Optimizing that approach would be complex and probably not worth it, so I wonder if we could instead distribute the benchmarks across several machines (ideally each machine running 2 namespaces), so we can start running them regularly on releases and measure regressions.
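To make the idea concrete, here's a rough sketch (not a proposal for the actual implementation) of how a job could pick up its share of the namespaces. The `SHARD_INDEX`/`SHARD_TOTAL` variables, the hard-coded namespace list, and the `./node-baseline`/`./node-candidate` paths are all assumptions for illustration; the only real piece is `benchmark/compare.js`, which we already use for A/B runs.

```js
// Hypothetical sharding sketch: split benchmark namespaces across machines,
// two namespaces per machine. SHARD_INDEX/SHARD_TOTAL are assumed to be
// provided by whatever CI system dispatches the jobs.
'use strict';
const { execFileSync } = require('node:child_process');

// Assumed example list; in practice this would be the directories under benchmark/.
const namespaces = [
  'buffers', 'crypto', 'fs', 'http', 'streams', 'timers', 'url', 'util',
];

const shardIndex = Number(process.env.SHARD_INDEX || 0);
const shardTotal = Number(process.env.SHARD_TOTAL || 1);

// Bucket namespaces in pairs, then assign buckets round-robin to shards.
const pairs = [];
for (let i = 0; i < namespaces.length; i += 2)
  pairs.push(namespaces.slice(i, i + 2));

const mine = pairs.filter((_, i) => i % shardTotal === shardIndex).flat();

for (const ns of mine) {
  console.log(`Running benchmark namespace: ${ns}`);
  execFileSync(process.execPath, [
    'benchmark/compare.js',
    '--old', './node-baseline',   // hypothetical baseline build
    '--new', './node-candidate',  // hypothetical candidate build
    ns,
  ], { stdio: 'inherit' });
}
```

Each shard's output could then be collected and fed into the existing statistical analysis as it is today, so the null-hypothesis testing itself wouldn't change; only where the runs happen would.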
I know we have a limited number of machines, and IIRC the benchmark machines are provided by Nearform. I'm opening this issue to discuss the challenges of improving this.
Maybe we can also identify a subset of the benchmarks that finish in a reasonable amount of time (or a config that makes them finish in a reasonable amount of time), and then gradually grow that list. My impression is that some benchmarks finish within a minute, some can't finish even in a day, and others have far too many configurations but could still finish in a reasonable amount of time if the combinations were cut down a bit. See the sketch below for what a reduced-config subset could look like.
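Something like the following, where a curated list pins down the expensive knobs. The specific namespaces, filters, `--set` values and run counts here are guesses rather than a vetted list; the point is just that `compare.js` already accepts `--filter`, `--set` and `--runs` to shrink the combination space.

```js
// Illustrative only: a hand-picked "fast subset" with cut-down configurations.
'use strict';
const { execFileSync } = require('node:child_process');

const fastSubset = [
  // namespace, plus extra args that reduce the configuration matrix (guessed values)
  { ns: 'url',     args: ['--set', 'n=1e5'] },
  { ns: 'buffers', args: ['--filter', 'buffer-creation', '--set', 'n=1e5'] },
  { ns: 'util',    args: ['--set', 'n=1e5'] },
];

for (const { ns, args } of fastSubset) {
  execFileSync(process.execPath, [
    'benchmark/compare.js',
    '--old', './node-baseline',   // hypothetical baseline build
    '--new', './node-candidate',  // hypothetical candidate build
    '--runs', '10',               // fewer runs than the default, trading time for statistical power
    ...args,
    ns,
  ], { stdio: 'inherit' });
}
```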