
Add a variant of runBenchmark which also takes iteration range.

phadej opened this issue 3 years ago · 2 comments

Currently there is:

runBenchmark :: Benchmarkable -> Double -> IO (Vector Measured, Double)

Run a single benchmark, and return measurements collected while executing it, along with the amount of time the measurement process took.

The Double argument is the lower bound on how long the benchmarking process should take. In practice, this time limit may be exceeded in order to generate enough data to perform meaningful statistical analyses.

I'd like to have a variant where we can specify optional lower and upper bounds for the iteration count (run at least 10 times, but no need to run more than 100 times).

A lower bound would help ensure more statistical significance; an upper bound would let us avoid running an individual benchmark for too long.

I think we'd then need both lower and upper bounds for time as well (the current Double argument is effectively a soft upper bound, but since there are no other end criteria, it also acts as a lower bound).

I think it would be best to put these four options into a record, to avoid "Double blindness" (several positional Double arguments that are easy to mix up). If no bounds are specified, we could either fail or fall back to a default time bound.
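As a rough sketch of what such a record could look like (the type name, field names, and the 5-second fallback mirroring criterion's default `--time-limit` are all hypothetical, not existing criterion API):

```haskell
-- Hypothetical record bundling the four proposed bounds, so callers
-- don't have to pass several bare Double/Int arguments positionally.
data BenchmarkBounds = BenchmarkBounds
  { minIterations :: Maybe Int     -- run at least this many iterations
  , maxIterations :: Maybe Int     -- no need to run more than this many
  , minTime       :: Maybe Double  -- run for at least this long (seconds)
  , maxTime       :: Maybe Double  -- may stop after this long (seconds)
  } deriving (Eq, Show)

-- When no bounds at all are given, fall back to a default time bound
-- rather than failing (5 seconds, matching criterion's default
-- --time-limit, as an illustrative choice).
defaultBounds :: BenchmarkBounds
defaultBounds = BenchmarkBounds Nothing Nothing Nothing (Just 5)
```

Using `Maybe` for every field keeps "unspecified" distinct from any concrete value, so the measurement loop can decide per field whether a bound applies.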

EDIT: summary:

  • the end criteria would be in N dimensions: you can stop only when all lower bounds are crossed and at least one upper bound is crossed.
  • if no lower bound is specified: treat it as zero.
  • if no upper bounds are specified: set the time bound to a default; any other unset upper bound is "infinity"/unreachable.
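The stopping rule above could be sketched as a pure predicate over the current iteration count and elapsed time (`shouldStop` is a hypothetical name; nothing like it exists in criterion today):

```haskell
-- Hypothetical stopping rule: stop only once every specified lower
-- bound has been met AND at least one specified upper bound has been
-- crossed. An unspecified lower bound is trivially met (zero); an
-- unspecified upper bound is never crossed (unreachable).
shouldStop
  :: Maybe Int    -- minimum iterations
  -> Maybe Int    -- maximum iterations
  -> Maybe Double -- minimum time, seconds
  -> Maybe Double -- maximum time, seconds
  -> Int          -- iterations completed so far
  -> Double       -- time elapsed so far, seconds
  -> Bool
shouldStop minIters maxIters minTime maxTime iters elapsed =
  lowerBoundsMet && anyUpperCrossed
  where
    lowerBoundsMet =
      maybe True (iters >=) minIters && maybe True (elapsed >=) minTime
    anyUpperCrossed =
      maybe False (iters >=) maxIters || maybe False (elapsed >=) maxTime
```

Note that with no upper bounds at all, `anyUpperCrossed` is always False and the loop would never terminate, which is exactly why the summary above falls back to a default time bound when no upper bound is given.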

EDIT2: tasty-bench does analysis and decides the end criteria on the fly. That's another option, and it can be used to implement the above; but I think the above is still conceptually simpler.

I offer to write a patch.

phadej avatar Jan 12 '22 12:01 phadej

I agree that we should offer more control over how this works for the benefit of power users. See also #210/#218, where users have requested the ability to reduce the number of iterations of very long-lived benchmarks. My only request is that we should very carefully document the pitfalls of changing the defaults that criterion uses, as it has the potential to make the data more susceptible to statistical noise.

If you submit a PR, I can review it more carefully. From a conceptual level, however, I'd support this direction of travel.

RyanGlScott avatar Jan 12 '22 14:01 RyanGlScott

Yes, I wouldn't change the current defaults, i.e. the behavior of the already existing functions.

phadej avatar Jan 12 '22 15:01 phadej