averaging over several random inputs
Hi! Say I have an algorithm that I expect to perform differently on a bunch of different inputs, and say I have a sufficiently good Gen instance ~ why would it be a bad idea to generate a bunch of inputs with Gen and then average the benchmarks over them, to get a good picture of how the algorithm behaves on different inputs?
I feel like this would be kind of an obvious thing to do, not necessarily via Gen but maybe with an interface like this:
benchRandom
  :: (NFData a, NFData b)
  => IO a       -- ^ obtain an input randomly and evaluate it to normal form
  -> (a -> b)   -- ^ a function to benchmark
  -> Benchmarkable
benchRandom = todo
and then maybe
benchArbitrary
  :: (NFData a, Arbitrary a, NFData b)
  => (a -> b)   -- ^ a function to generate inputs for
  -> Benchmarkable
benchArbitrary = benchRandom (generate arbitrary)
What do you think?
Benchmarkable doesn't really allow for this though atm.
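For context, the obstacle is the shape of Benchmarkable: roughly speaking, it wraps a batch runner that receives only an iteration count, with no hook for supplying a fresh input per iteration. This is a from-memory sketch of the type, not the authoritative definition ~ check the tasty-bench source for your version:

```haskell
import Data.Word (Word64)

-- Rough sketch of tasty-bench's core type (may differ between
-- versions): the payload receives the number of iterations and is
-- expected to run the whole batch itself.
newtype Benchmarkable = Benchmarkable
  { unBenchmarkable :: Word64 -> IO () }
```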
-- Needs BangPatterns, plus: Control.Exception (evaluate),
-- Data.Word (Word64), GHC.Exts (SPEC (..)), and the Benchmarkable
-- constructor from Test.Tasty.Bench (assuming it is exported).
funcToRandomBench
  :: forall a b c.
     (b -> c)   -- ^ How to evaluate results, usually 'id' or 'rnf'.
  -> (a -> b)   -- ^ What to benchmark
  -> IO a       -- ^ Generator of inputs
  -> Benchmarkable
funcToRandomBench frc = (Benchmarkable .) . funcToRandomBenchLoop SPEC
  where
    funcToRandomBenchLoop :: SPEC -> (a -> b) -> IO a -> Word64 -> IO ()
    funcToRandomBenchLoop !_ f mx n
      | n == 0 = pure ()
      | otherwise = do
          x <- mx
          _ <- evaluate (frc (f x))
          funcToRandomBenchLoop SPEC f mx (n - 1)
{-# INLINE funcToRandomBench #-}
Would this be enough? It's up to the supplied IO a whether to generate a fresh random input each time or to pregenerate a list of inputs and iterate over it repeatedly.
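For the pregenerated-list flavour, the IO a argument can simply round-robin over a finite list. A minimal sketch using an IORef (the helper name cycledInputs is made up here, not part of any library):

```haskell
import Data.IORef (newIORef, atomicModifyIORef')

-- Hypothetical helper: turn a non-empty finite list of pregenerated
-- inputs into an IO action that yields them round-robin, suitable as
-- the 'IO a' argument of funcToRandomBench.
cycledInputs :: [a] -> IO (IO a)
cycledInputs xs = do
  ref <- newIORef (cycle xs)  -- infinite repetition of the list
  pure $ atomicModifyIORef' ref (\(y : ys) -> (ys, y))
```

Each call to the returned action advances through the list and wraps around at the end, so every batch of n iterations covers the inputs roughly evenly.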
You would probably still benchmark a single random input multiple times to get good confidence, and then average over multiple random inputs. I tried to use perRunEnv in criterion with this random function and it gave me wild results. It might be similar here (though it might also be just fine?). I think the function you propose is pretty good for random inputs that are expected to have very similar runtimes.
> You would probably still benchmark a single random input multiple times to get good confidence and then average over multiple random inputs.
If you limit the IO a to iterate over a finite list of random inputs, this would happen automatically in funcToRandomBench, right?
I mean sure but I feel like this should ultimately be part of the interface, no?
Btw ~ really cool library! I'm currently only using it to output stats as CSV, but even that is much easier and quicker than with criterion ❤️
> I mean sure but I feel like this should ultimately be part of the interface, no?
Since funcToRandomBench can be implemented outside of tasty-bench, I'd like someone to try it out first and provide feedback on whether it works well enough to consider including it in the library.
Alright ~ I will try to make something useful out of it :)
@MangoIV any chance you had an opportunity to use funcToRandomBench in practice? I'm curious if it worked out.
I ended up using something external; it just didn’t fit my purpose well enough. That’s not a fault of the function or the implementation ~ I needed all of the information, so I made the random generation external…
I’ll let you know when I come back to this.
I realised that we don't need any additional helpers: nfIO is good enough, because one can execute System.Random.randomIO inside it. Closing.
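For the record, a minimal sketch of that approach ~ assuming only that Test.Tasty.Bench exports bench, defaultMain, and nfIO, and that the random package is available; the benchmark body itself is a made-up example:

```haskell
import System.Random (randomIO)
import Test.Tasty.Bench (bench, defaultMain, nfIO)

main :: IO ()
main = defaultMain
  [ bench "sum over a fresh pseudo-random list" $ nfIO $ do
      -- randomIO supplies fresh randomness on every invocation,
      -- so each measured run sees a different input.
      seed <- randomIO :: IO Int
      pure $ sum $ take 1000 $ iterate (\x -> x * 1103515245 + 12345) seed
  ]
```

Note that this measures input generation together with the function under test; if generation is expensive relative to the work being benchmarked, the pregenerated-list approach discussed above is still preferable.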