
Add seed parameter

Open gustaphe opened this issue 2 years ago • 14 comments

Partial solution for #201 . Since it needs to be serializable, I opted for just a seed number. seed! is not super well documented, so Int may not be the perfect type, but it's pretty handy. Negative seeds appear to be invalid, so I could use that as a stand-in for nothing, as Union{Int, Nothing} ruined serialization.

  • [x] PR
  • [x] Complete tests
  • [x] Documentation

gustaphe avatar Sep 21 '23 15:09 gustaphe

  • [x] Don't rely on 1.7 kwarg syntax

(I thought about this and forgot to fix it)

gustaphe avatar Sep 21 '23 16:09 gustaphe

There is a problem in that this only works if the benchmarks take the same number of samples. I don't know if this should be enforced or marked as a caveat: "The same seed guarantees the same sequence of random numbers, but the benchmarks may still pick an unequal number of random numbers".

In practice, consistency increases even if it's not guaranteed to be perfect.
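The caveat can be illustrated with plain `Random`, no BenchmarkTools needed: two runs from the same seed draw the same sequence, but a run that takes more samples simply consumes more of it.

```julia
using Random

# Two runs seeded identically draw from the same sequence; the longer
# run just keeps consuming numbers past where the short run stopped.
Random.seed!(1234)
short_run = [rand() for _ in 1:3]

Random.seed!(1234)
long_run = [rand() for _ in 1:5]

@assert short_run == long_run[1:3]  # identical prefix, then long_run keeps going
```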

gustaphe avatar Sep 23 '23 11:09 gustaphe

Probably a dumb question, but why not give a seed to each benchmarkable instead?

gdalle avatar Sep 23 '23 13:09 gdalle

I guess one could. But the seed needs to be set before an entire benchmark run, not before each sample. And if it's not for a group, you can just call seed! before running the benchmark.

An alternative is to add a hook to run, like run(group; setup=(seed!(1234))) - I don't have a strong opinion.

gustaphe avatar Sep 23 '23 15:09 gustaphe

Given this recent Discourse thread, I think a seed for each benchmarkable would also make sense:

https://discourse.julialang.org/t/making-benchmark-outputs-statistically-meaningful-and-actionable/104256/

What do you think @gustaphe ?

gdalle avatar Sep 26 '23 19:09 gdalle

I didn't think that would work, but it took me like a second to convince myself it would. If nobody's made a PR like that by Saturday, I will.

gustaphe avatar Sep 26 '23 19:09 gustaphe

Wait, isn't it already possible to set the seed in the setup code of each benchmarkable?

gdalle avatar Sep 29 '23 18:09 gdalle

Wait, isn't it already possible to set the seed in the setup code of each benchmarkable?

No. If you set the seed in the setup, it gets reset for every sample, not just at the start of the benchmark run. Consider

julia> b = @benchmarkable sleep(rand([0, 0.5])) setup=(Random.seed!(1234))
Benchmark(evals=1, seconds=5.0, samples=10000)

julia> r = run(b)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  1.531 μs … 48.249 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.739 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.836 μs ±  1.147 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

         ▃▆██▅▄▃▁
  ▂▂▂▃▃▅▇█████████▇▆▅▄▄▃▃▃▃▃▃▃▃▂▃▂▃▃▂▃▂▂▂▂▂▂▂▂▂▂▂▁▁▁▂▂▁▂▁▁▁▂ ▃
  1.53 μs        Histogram: frequency by time        2.57 μs <

 Memory estimate: 192 bytes, allocs estimate: 5.

vs

julia> b = @benchmarkable sleep(rand([0, 0.5])) setup=(Random.seed!(1235))
Benchmark(evals=1, seconds=5.0, samples=10000)

julia> r = run(b)
BenchmarkTools.Trial: 10 samples with 1 evaluation.
 Range (min … max):  501.700 ms … 502.117 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     501.726 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   501.802 ms ± 156.542 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █           ▃
  █▇▇▁▇▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▁▁▁▁▁▁▇ ▁
  502 ms           Histogram: frequency by time          502 ms <

 Memory estimate: 192 bytes, allocs estimate: 5.

Each of these runs only ever uses a single result from rand. What you want is a series of random numbers in a repeatable sequence.
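The difference can be sketched in plain Julia, without BenchmarkTools:

```julia
using Random

# Re-seeding in "setup" style, once per sample: every sample sees the
# same first draw, so the whole trial is built from a single value.
per_sample = [(Random.seed!(1234); rand([0, 0.5])) for _ in 1:5]
@assert all(==(per_sample[1]), per_sample)

# Seeding once before the run: a repeatable sequence of distinct draws.
Random.seed!(1234)
first_run = [rand([0, 0.5]) for _ in 1:5]
Random.seed!(1234)
second_run = [rand([0, 0.5]) for _ in 1:5]
@assert first_run == second_run
```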

gustaphe avatar Oct 01 '23 19:10 gustaphe

I would guess that the most common use of this parameter will be supplying it as run(::BenchmarkGroup; seed=something), but this really does look like the best implementation.

gustaphe avatar Oct 01 '23 20:10 gustaphe

I'm not clear on the details here, but instead of asking for a seed, why not copy the state of the global RNG at the beginning (rng_copy = copy(Random.default_rng())), and then before each benchmark run, restore this RNG state (copy!(Random.default_rng(), rng_copy))? This is way faster than seeding the RNG, and could even be done by default. If the user then wants to try a specific seed, seed!(seed) can be done manually at the beginning.

EDIT: this is what @testset does BTW.
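A minimal sketch of the copy/restore trick, assuming the default RNG supports copy and copy! (which it does on recent Julia versions):

```julia
using Random

# Snapshot the global RNG state, burn some numbers, then restore:
# the subsequent draws replay exactly, without ever choosing a seed.
rng_copy = copy(Random.default_rng())
first_draws = rand(3)
copy!(Random.default_rng(), rng_copy)  # restore the snapshot
second_draws = rand(3)
@assert first_draws == second_draws
```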

rfourquet avatar Oct 10 '23 13:10 rfourquet

I'd go one step further and let the user provide the entire RNG object they want to use, defaulting to a state-resetting copy of the default RNG.

Seelengrab avatar Oct 10 '23 14:10 Seelengrab

@gustaphe what do you think of this proposal?

gdalle avatar Oct 30 '23 17:10 gdalle

I've been waiting for a good time to read up on this/try it out but haven't found it. The obstacles I see are:

  • Serialization - as is, benchmark objects are serialized by some method I don't quite understand, so using an integer worked out of the box, but the other approaches (possibly even the suggestion of using nothing as a sentinel) don't.
  • Is there a good way to supply an RNG object and have it reset every time? Because it's no use if this is just an RNG that is set once and then evolves throughout the experiment.

Like I said, I don't know if these are reasonable objections, I just haven't had time to think carefully about them.

gustaphe avatar Oct 31 '23 20:10 gustaphe

Serializing an e.g. Xoshiro should be perfectly possible, it just needs to serialize the internal state & then reconstruct the RNG object from that state. I'm kind of surprised this isn't already implemented in Base. Resetting it every time means just copying the existing RNG object before the benchmark, and using the copy for the benchmark. I'd definitely make that toggleable though.
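A sketch of the copy-based reset for a user-supplied Xoshiro (the serialization half would mean storing the state words themselves and reconstructing an RNG from them; the exact fields are an internal detail of the Random stdlib, so this only shows the copy/reset part):

```julia
using Random

# Take a copy of the user's RNG before the run and reset from it
# afterwards: the next run replays the exact same draws.
rng = Xoshiro(1234)
snapshot = copy(rng)
run1 = rand(rng, 5)   # the "benchmark" advances rng's state
copy!(rng, snapshot)  # reset from the copy
run2 = rand(rng, 5)
@assert run1 == run2
```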

Seelengrab avatar Nov 01 '23 14:11 Seelengrab