incorrect distribution of randomR for floating-point numbers
Hello,
It appears that when the upper bound is very small, floating-point values generated are not correctly distributed.
For example, when the upper bound is 1.0e-45::Float which is the smallest subnormal number of single precision floating-point representation, randomR(0, (1.0e-45::Float)) does not produce 1.0e-45. Furthermore, if the upper bound is 4.0e-45::Float, 6.0e-45 is generated as follows,
filter (>4.0e-45) $ take 10 $ randomRs (0,(4.0e-45::Float)) $ mkStdGen 0
[6.0e-45,6.0e-45]
Yup. The current / usual naive approach to generating a unit interval doesn’t actually work well.
But: most / all unit interval algorithms are half open, they give the interval [0,1) and then things get rescaled , so you should never expect the 1 in your interval unless it’s from rounding...
- the over shoot is definitely looking like a bug, I’m not sure if it’s a bug in our understanding of floating point near the cut off of normalized floats one. What’s the size of an ULP at that scale? I’d love some help understanding what’s the cause of this bug.
Great job finding / reporting this stuff. Thank you for taking the time.
On Mon, Apr 1, 2019 at 7:02 PM Shaobo [email protected] wrote:
Hello,
It appears that when the upper bound is very small, floating-point values generated are not correctly distributed.
For example, when the upper bound is 1.0e-45::Float which is the smallest subnormal number of single precision floating-point representation, randomR(0, (1.0e-45::Float)) does not produce 1.0e-45. Furthermore, if the upper bound is 4.0e-45::Float, 6.0e-45 is generated as follows,
filter (>4.0e-45) $ take 10 $ randomRs (0,(4.0e-45::Float)) $ mkStdGen 0 [6.0e-45,6.0e-45]
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/haskell/random/issues/53, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAQwjf78XD4lTeVPJLdIdGldWAlV_Asks5vcpANgaJpZM4cWzgv .
I'm not sure how ULP is defined for subnormals but they are multiple of the minimum subnormal number. Let's call it MSN. 1.0e-45 is one MSN and 4.0e-45 is three MSNs. The number is generate using the following formula,
let (coef,g') = random g in
(2.0 * (0.5*l + coef * (0.5*h - 0.5*l)), g')
When the lower bound is 0, it becomes, 2.0*(coef*(0.5*h)). And when h is 3 MSNs, 0.5*h rounds to 2 MSNs. So when coef is greater than 0.75, coef*(0.5*h) rounds to 2 MSNs as well, making the entire expression evaluate to 4 MSNs, which is 4.0e-45.
There are also other values that could trigger values being greater than upper bound. I used sbv (https://hackage.haskell.org/package/sbv) to find counterexamples. It works pretty well.
In version 1.2.0 (released yesterday), we decided not to attempt to give stronger guarantees for floating point numbers. This decision was taken after a long discussion, see the summary here: https://github.com/idontgetoutmuch/random/issues/113#issuecomment-624041080.
What is new in 1.2.0 is that these issues are now documented: https://hackage.haskell.org/package/random-1.2.0/docs/System-Random-Stateful.html#g:14 -- I also used sbv to find these counterexamples btw :)
The decision not to tackle floating point guarantees was a pragmatic one: a lot of improvements had been made, and we didn't want to delay the release. Generating uniformly random floating point numbers is surprisingly tricky, and we may pick this up again in the future. In the meantime, I am exploring this topic in a little separate experiment.