quickcheck icon indicating copy to clipboard operation
quickcheck copied to clipboard

`Instance Arbitrary Double` uses lots of memory with high sizes

Open ChickenProp opened this issue 1 year ago • 2 comments

My code has been doing scale (^ 8) to generate Doubles in between ±10^16 instead of in between ±100. But at that size, memory usage is very high since QuickCheck-2.14.3.

$ ghci -package QuickCheck-2.14.2 +RTS -M100M -RTS
GHCi, version 9.6.6: https://www.haskell.org/ghc/  :? for help
ghci> import Test.QuickCheck
ghci> flip traverse [1..100] $ \_ -> length . show <$> generate (scale (const 100_000_000) $ arbitrary @Double)
[20,20,19,19,19,19,20,20,21,20,20,19,20,17,20,19,21,19,20,21,18,19,20,20,20,18,20,19,20,19,19,18,19,21,20,19,18,19,20,21,20,19,20,19,19,19,18,20,19,19,21,19,19,17,20,19,16,21,20,19,18,19,20,20,19,19,21,20,20,21,20,20,19,19,19,20,20,19,19,19,19,20,19,19,17,20,20,20,20,20,18,19,20,19,19,19,20,20,20,19]

$ ghci -package QuickCheck-2.14.3 +RTS -M100M -RTS
GHCi, version 9.6.6: https://www.haskell.org/ghc/  :? for help
ghci> import Test.QuickCheck
ghci> flip traverse [1..100] $ \_ -> length . show <$> generate (scale (const 100_000_000) $ arbitrary @Double)
[20,*** Exception: heap overflow

(But I've also seen the heap overflow with -M4G.)

I guess this is due to smallDenominators introduced in #297, which picks a number up to size and then uses O(n) time and memory to index into a tree of rationals.

Some possibilities that come to mind:

  • Specify that Arbitrary instances might not expect sizes that large, and users need to use other generators if they want that kind of thing. I don't think it would cause me big problems, but I don't love the idea. ("I want bigger numbers -> I'll use a bigger size" is pretty natural, which is how I came across this problem; and if the numbers are embedded in some other data structure, specifying another generator might be awkward.)
  • In smallDenominators, limit the size to some relatively small max.
  • It's possible to calculate the tree values without enumerating the tree. ~~I guess this would make it O(log n) time and memory (effectively O(1) because size is Int not Integer).~~ I expect this would be a lot faster and more memory efficient for large n. If it's too slow for small n, then "build the first _ tree values, calculate directly from then on" could work.

ChickenProp avatar Dec 31 '24 17:12 ChickenProp

I can only say that size argument in QC is very much ad-hoc and particular Arbitrary instance specific. Without "breaking" (i.e. significantly changing) existing code, you just need to know how particular instance is behaving. This is especially true for compositional instances. Try to generate [[[[Int]]]]], it will blow up memory even at small sizes. In particular you cannot hint "generate small lists with big ints" nor "generate small lists of big ints"; the size is not "budgeted", it's copied.

That said, the

In smallDenominators, limit the size to some relatively small max.

feels the best solution. I.e. fix the particular instance in ad-hoc way. It's ad-hoc instance (size-wise) anyway.

phadej avatar Jan 04 '25 07:01 phadej

Nod. I've opened #419 for this.

ChickenProp avatar Jan 06 '25 10:01 ChickenProp