haskell-hedgehog
Document the magic number 100
I'm puzzled by the behavior of linear:
"Construct a range which scales the second bound relative to the size parameter."
The first example in the doc is bounds 50 $ linear 0 10. I'd expect 10 to be scaled by 50, giving (0,500), but instead you get (0,5).
The other examples seem to confirm that there's an implicit factor of 1/100 multiplied in there somewhere.
And then when the size parameter is above 100, it stops scaling entirely and becomes a constant.
λ> Range.bounds 5000 $ Range.linear 0 10
(0,10)
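My best guess at a model that reproduces this (just a sketch with made-up names, not the actual Range implementation):

-- Sketch: scale the distance between the bounds by size/99 (or /100;
-- these examples don't distinguish the two), then clamp the result
-- back into the original bounds.
scaleLinearSketch :: Int -> Int -> Int -> Int
scaleLinearSketch size lo hi =
  let scaled = lo + ((hi - lo) * size) `quot` 99
  in  max lo (min hi scaled)  -- assumes lo <= hi

-- scaleLinearSketch 50   0 10 == 5    (matches bounds 50 $ linear 0 10)
-- scaleLinearSketch 5000 0 10 == 10   (clamped, so it stops growing)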
So what's the concept here? Is the size parameter in general supposed to be thought of as a percentage, and should it never go above 100?
I also just noticed that the size parameter used by sample is 30. Is there a particular reason for that?
This concept is inherited from QuickCheck and I haven't thought much about whether it could be made better. It does make sense to think of the size parameter as a percentage; it ranges from 0 to 99, so if you run 200 tests it will cycle back to 0, then ramp up to 99 again.
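Roughly, as a sketch (not the real test-runner code):

-- Sketch only: the size used for the n-th test when sizes cycle
-- through 0..99 and then wrap around.
sizeForTest :: Int -> Int
sizeForTest n = n `mod` 100

-- map sizeForTest [0, 99, 100, 199] == [0, 99, 0, 99]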
I think the general idea is that if the test fails with an input which is initially smaller, then shrinking it is easier, so we should start with inputs that are small. Also, smaller inputs tend to run faster, so only generating a few big inputs helps test performance.
I'm not sure any of these are great reasons for having size as a thing. If we had perfect shrinking every time then I think the concept of size could potentially be removed altogether.
Ah. That makes sense. I always thought size was an underdocumented concept in QuickCheck.
Does it make sense to make this explicit and have the Size type constrain its value to the valid range?
If it's a percentage, wouldn't it go to 100?
Is there a reason for it to be integral rather than fractional? A fractional value between 0 and 1 would make more sense to me, rather than hard-coding a maximum precision of "99 increments".
Does it make sense to make this explicit and have the Size type constrain its value to the valid range?
Yes, but it needs some thought to make sure that doesn't break anything.
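As a sketch of what that could look like (self-contained here; mkSize is a made-up name, not existing API):

-- Sketch: a smart constructor that clamps a requested size into the
-- valid 0..99 window, so out-of-range values never reach the scaling code.
newtype Size = Size Int
  deriving (Eq, Ord, Show)

mkSize :: Int -> Size
mkSize n = Size (max 0 (min 99 n))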
If it's a percentage, wouldn't it go to 100?
I guess 0-99 makes more sense as that is exactly 100 levels and the default number of tests is 100.
Is there a reason for it to be integral rather than fractional?
Only that it matches what QuickCheck does; maybe a fractional value between 0 and 1 would be better, I'm not sure.
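For comparison, a fractional size in the 0 to 1 range could look roughly like this (sketch only, made-up names):

-- Sketch: a fractional size factor clamped to [0, 1] and applied to a
-- linear range, with no 99/100 constant involved.
scaleLinearFrac :: Double -> Int -> Int -> Int
scaleLinearFrac f lo hi =
  let f' = max 0 (min 1 f)
  in  lo + round (fromIntegral (hi - lo) * f')

-- scaleLinearFrac 0.5 0 10 == 5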
This concept is inherited from QuickCheck and I haven't thought much about whether it could be made better.
Size can be confusing because there are some generators which totally ignore it. I'm wondering if the range combinators could help remove size entirely at some point.
I just noticed that int64 on an exponential range produces very bad results for sizes 100 and up.
λ> bounds 100 (exponentialBounded :: Range Int64)
(-9223372036854775808,-9223372036854775808)
λ> sample (resize 100 (int64 (exponential 0 maxBound)))
0
So there's another motivation to constrain Size to its expected range, I think.
Apart from the idea of constraining Size, which indeed looks interesting and useful, this looks weird:
λ> sample (resize 100 (int64 (exponential 0 maxBound)))
0
I think I would expect a number close to maxBound :: Int64, but not 0.
And so this also looks weird:
λ (\x -> bounds x $ exponential 0 (maxBound :: Int64)) <$> [97 .. 101]
[(0,3817338227552192512),(0,5933694520551410688),(0,0),(0,0),(0,0)]
as I would expect to see something like
[(0,3817338227552192512),(0,5933694520551410688),(0,9223372036854775807),(0,9223372036854775807),(0,9223372036854775807)]
because
λ (\x -> bounds x $ exponential 0 (maxBound :: Int8)) <$> [97 .. 101]
[(0,115),(0,121),(0,127),(0,127),(0,127)]
λ (\x -> bounds x $ exponential 0 (maxBound :: Int16)) <$> [97 .. 101]
[(0,26559),(0,29500),(0,32767),(0,32767),(0,32767)]
λ (\x -> bounds x $ exponential 0 (maxBound :: Int32)) <$> [97 .. 101]
[(0,1391252730),(0,1728494283),(0,2147483647),(0,2147483647),(0,2147483647)]
this looks weird
It is weird because when I wrote the code for exponential ranges, I didn't consider sizes above 99 at all. My understanding was that size 100 was an invalid input, and so its behavior could be undefined. I'm still not entirely sure whether this understanding was correct.
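A guess at the failure mode (a sketch, not the actual exponential-range code): if the scaling goes through Double and raises the range width to the power size/99, then for sizes up to 99 the exponent stays at or below 1 and the result stays within the bound, but at size 100 the intermediate value can exceed what the target integer type can represent, and converting it back is where things go off the rails.

-- Sketch: Double-based exponential scaling.  It happens to reproduce the
-- Int8 numbers above for sizes 97-99, but for sizes >= 100 the intermediate
-- Double can exceed maxBound of the target type.
scaleExpSketch :: Integral a => Int -> a -> a -> a
scaleExpSketch size lo hi =
  let power = fromIntegral size / 99     :: Double
      width = fromIntegral (hi - lo) + 1 :: Double
  in  lo + round (width ** power - 1)

-- For Int64 at size 100 the intermediate is roughly 1.4e19, well past
-- maxBound :: Int64, so the final conversion overflows.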
Another reason for size to be constrained: a negative size produces confusing results for anyone who has not figured out that it is only valid for 0-99. Documentation could help in the meantime.
Range.bounds (-2) (Range.linear (-10) 10)
(-10,-10)
is the same as:
Range.bounds (-1) (Range.linear (-10) 10)
(-10,-10)
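If the scaling works anything like the scaleLinearSketch model earlier in this thread, the collapse is easy to trace (sketch arithmetic only, not the actual implementation):

-- Using the scaleLinearSketch model from above:
--   scaleLinearSketch (-2) (-10) 10
--     = max (-10) (min 10 ((-10) + (20 * (-2)) `quot` 99))
--     = max (-10) (min 10 (-10))    -- (-40) `quot` 99 truncates to 0
--     = -10
-- so the upper bound never moves off -10, and sizes -1 and -2 both
-- give (-10,-10).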