`Instance Arbitrary Double` uses lots of memory with high sizes
My code has been doing scale (^ 8) to generate Doubles in between ±10^16 instead of in between ±100. But at that size, memory usage is very high since QuickCheck-2.14.3.
$ ghci -package QuickCheck-2.14.2 +RTS -M100M -RTS
GHCi, version 9.6.6: https://www.haskell.org/ghc/ :? for help
ghci> import Test.QuickCheck
ghci> flip traverse [1..100] $ \_ -> length . show <$> generate (scale (const 100_000_000) $ arbitrary @Double)
[20,20,19,19,19,19,20,20,21,20,20,19,20,17,20,19,21,19,20,21,18,19,20,20,20,18,20,19,20,19,19,18,19,21,20,19,18,19,20,21,20,19,20,19,19,19,18,20,19,19,21,19,19,17,20,19,16,21,20,19,18,19,20,20,19,19,21,20,20,21,20,20,19,19,19,20,20,19,19,19,19,20,19,19,17,20,20,20,20,20,18,19,20,19,19,19,20,20,20,19]
$ ghci -package QuickCheck-2.14.3 +RTS -M100M -RTS
GHCi, version 9.6.6: https://www.haskell.org/ghc/ :? for help
ghci> import Test.QuickCheck
ghci> flip traverse [1..100] $ \_ -> length . show <$> generate (scale (const 100_000_000) $ arbitrary @Double)
[20,*** Exception: heap overflow
(But I've also seen the heap overflow with -M4G.)
I guess this is due to smallDenominators introduced in #297, which picks a number up to size and then uses O(n) time and memory to index into a tree of rationals.
Some possibilities that come to mind:
- Specify that
Arbitraryinstances might not expect sizes that large, and users need to use other generators if they want that kind of thing. I don't think it would cause me big problems, but I don't love the idea. ("I want bigger numbers -> I'll use a bigger size" is pretty natural, which is how I came across this problem; and if the numbers are embedded in some other data structure, specifying another generator might be awkward.) - In
smallDenominators, limit the size to some relatively small max. - It's possible to calculate the tree values without enumerating the tree. ~~I guess this would make it O(log n) time and memory (effectively O(1) because size is
IntnotInteger).~~ I expect this would be a lot faster and more memory efficient for largen. If it's too slow for smalln, then "build the first _ tree values, calculate directly from then on" could work.
I can only say that size argument in QC is very much ad-hoc and particular Arbitrary instance specific. Without "breaking" (i.e. significantly changing) existing code, you just need to know how particular instance is behaving. This is especially true for compositional instances. Try to generate [[[[Int]]]]], it will blow up memory even at small sizes. In particular you cannot hint "generate small lists with big ints" nor "generate small lists of big ints"; the size is not "budgeted", it's copied.
That said, the
In smallDenominators, limit the size to some relatively small max.
feels the best solution. I.e. fix the particular instance in ad-hoc way. It's ad-hoc instance (size-wise) anyway.
Nod. I've opened #419 for this.