averaging over several random inputs
Hi! Say I have an algorithm that I expect to perform differently on a bunch of different inputs, and say I have a sufficiently good Gen instance ~ why would it be a bad idea to generate a bunch of inputs with Gen and then average the benchmarks over them, to get a good picture of how the algorithm behaves on different inputs?
I feel like this would be kind of an obvious thing to do, not necessarily via Gen but maybe with an interface like this:
benchRandom
  :: (NFData a, NFData b)
  => IO a       -- ^ obtain an input randomly and evaluate it to normal form
  -> (a -> b)   -- ^ a function to benchmark
  -> Benchmarkable
benchRandom = todo
and then maybe
benchArbitrary
  :: (NFData a, Arbitrary a, NFData b)
  => (a -> b)   -- ^ a function to generate inputs for
  -> Benchmarkable
benchArbitrary = benchRandom (generate arbitrary)
What do you think?
Benchmarkable doesn't really allow for this though atm.
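For context, the obstacle is the shape of Benchmarkable: roughly speaking, it wraps a batch runner that receives only an iteration count, with no hook for supplying a fresh input per iteration. This is a from-memory sketch of the type, not the authoritative definition ~ check the tasty-bench source for your version:

```haskell
import Data.Word (Word64)

-- Rough sketch of tasty-bench's core type (may differ between
-- versions): the payload receives the number of iterations and is
-- expected to run the whole batch itself.
newtype Benchmarkable = Benchmarkable
  { unBenchmarkable :: Word64 -> IO () }
```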
-- Needs BangPatterns, plus: Control.Exception (evaluate),
-- Data.Word (Word64), GHC.Exts (SPEC (..)), and the Benchmarkable
-- constructor from Test.Tasty.Bench (assuming it is exported).
funcToRandomBench
  :: forall a b c.
     (b -> c)   -- ^ How to evaluate results, usually 'id' or 'rnf'.
  -> (a -> b)   -- ^ What to benchmark
  -> IO a       -- ^ Generator of inputs
  -> Benchmarkable
funcToRandomBench frc = (Benchmarkable .) . funcToRandomBenchLoop SPEC
  where
    funcToRandomBenchLoop :: SPEC -> (a -> b) -> IO a -> Word64 -> IO ()
    funcToRandomBenchLoop !_ f mx n
      | n == 0 = pure ()
      | otherwise = do
          x <- mx
          _ <- evaluate (frc (f x))
          funcToRandomBenchLoop SPEC f mx (n - 1)
{-# INLINE funcToRandomBench #-}
Would this be enough? It's up to the supplied IO a whether to generate a fresh random input each time or to pregenerate a list of inputs and iterate over it repeatedly.
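For the pregenerated-list flavour, the IO a argument can simply round-robin over a finite list. A minimal sketch using an IORef (the helper name cycledInputs is made up here, not part of any library):

```haskell
import Data.IORef (newIORef, atomicModifyIORef')

-- Hypothetical helper: turn a non-empty finite list of pregenerated
-- inputs into an IO action that yields them round-robin, suitable as
-- the 'IO a' argument of funcToRandomBench.
cycledInputs :: [a] -> IO (IO a)
cycledInputs xs = do
  ref <- newIORef (cycle xs)  -- infinite repetition of the list
  pure $ atomicModifyIORef' ref (\(y : ys) -> (ys, y))
```

Each call to the returned action advances through the list and wraps around at the end, so every batch of n iterations covers the inputs roughly evenly.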
You would probably still benchmark a single random input multiple times to get good confidence, and then average over multiple random inputs. I tried to use perRunEnv in criterion with this random function and it gave me wild results. It might be similar here (though it might also be just fine?). I think the function you propose is pretty good for random inputs that are expected to have very similar runtimes.
> You would probably still benchmark a single random input multiple times to get good confidence and then average over multiple random inputs.
If you limit the IO a to iterate over a finite list of random inputs, this would happen automatically in funcToRandomBench, right?
I mean sure but I feel like this should ultimately be part of the interface, no?
Btw ~ really cool library! I'm currently only using it to output stats as CSV, but even that is much easier and quicker than with criterion ❤️
> I mean sure but I feel like this should ultimately be part of the interface, no?
Since funcToRandomBench can be implemented outside of tasty-bench, I'd like someone to try it out first and provide feedback on whether it works well enough to consider including it in the library.
Alright ~ I will try to make something useful out of it :)
@MangoIV any chance you had an opportunity to use funcToRandomBench in practice? I'm curious if it worked out.
I ended up using something external; it just didn’t fit my purpose well enough. That’s not a fault of the function or the implementation ~ I needed all of the information, so I made the random generation external…
I’ll let you know when I come back to this.
I realised that we don't need any additional helpers: nfIO is good enough, because one can execute System.Random.randomIO inside it. Closing.
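For the record, a minimal sketch of that approach ~ assuming only that Test.Tasty.Bench exports bench, defaultMain, and nfIO, and that the random package is available; the benchmark body itself is a made-up example:

```haskell
import System.Random (randomIO)
import Test.Tasty.Bench (bench, defaultMain, nfIO)

main :: IO ()
main = defaultMain
  [ bench "sum over a fresh pseudo-random list" $ nfIO $ do
      -- randomIO supplies fresh randomness on every invocation,
      -- so each measured run sees a different input.
      seed <- randomIO :: IO Int
      pure $ sum $ take 1000 $ iterate (\x -> x * 1103515245 + 12345) seed
  ]
```

Note that this measures input generation together with the function under test; if generation is expensive relative to the work being benchmarked, the pregenerated-list approach discussed above is still preferable.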