arbitrary
Can Arbitrary be biased?
I have a question that the documentation might clarify. Is Arbitrary supposed to be exactly uniformly distributed? Or is it supposed to be quick? For example, I can generate a number between 0 and 10 by dividing the value of a u8 by 26. With extra computations, the distribution could be made uniform, but is that necessary or desirable?
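To make the trade-off in the question concrete, here is a minimal sketch using only the standard library. It assumes integer division, so the outputs are 0 through 9, and it counts how many of the 256 possible byte values land on each output. The function names `cheap_digit`, `histogram`, and `uniform_digit` are hypothetical and not part of the arbitrary crate; `uniform_digit` shows one possible form of the "extra computations" (rejection sampling), not necessarily the one the poster had in mind.

```rust
/// Cheap mapping from the question: one byte in, integer division by 26.
/// Assumption: integer division, so the result is in 0..=9.
fn cheap_digit(byte: u8) -> u8 {
    byte / 26
}

/// Counts how many of the 256 possible input bytes map to each output.
fn histogram() -> [u32; 10] {
    let mut counts = [0u32; 10];
    for byte in 0..=u8::MAX {
        counts[cheap_digit(byte) as usize] += 1;
    }
    counts
}

/// Uniform alternative via rejection sampling (hypothetical helper).
/// Bytes >= 250 are discarded so that each digit is backed by exactly
/// 25 byte values; this may consume more than one byte of input.
/// Returns None if the input runs out before a byte is accepted.
fn uniform_digit(bytes: &mut impl Iterator<Item = u8>) -> Option<u8> {
    bytes.find(|&b| b < 250).map(|b| b % 10)
}
```

With the cheap mapping, outputs 0 through 8 each receive 26 byte values and output 9 receives only 22, so the bias is small. The rejection-sampling version removes it at the cost of a branch and, occasionally, extra input bytes.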
Arbitrary is not an RNG, so I don't think it makes much sense to impose uniformity requirements on its implementors. Instead, most implementations focus, and will continue to focus, on generating data in a way that enables faster detection of faults in fuzzing and testing contexts.
At the end of the day, this is a library for fuzzing and related activities. That use case is the priority.
That said, we certainly want uniformity (or at least an approximation of it) for things like Arbitrary for u32, so long as it doesn't cost so much fuzzing throughput that overall fuzzing efficiency drops (which is vague and very specific to a particular fuzz target).
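As an illustration of a case where uniformity is essentially free, the sketch below builds a u32 directly from raw input bytes. If the four input bytes are uniformly distributed, every u32 value is equally likely, and no extra computation is needed. The helper name `u32_from_bytes` and the zero-padding behaviour for short inputs are assumptions made for this example; this is not presented as the crate's actual implementation.

```rust
/// Hypothetical helper: interprets up to four input bytes as a
/// little-endian u32, zero-padding when fewer than four are available.
fn u32_from_bytes(input: &[u8]) -> u32 {
    let mut buf = [0u8; 4];
    let n = input.len().min(4);
    // Copy only the bytes that exist; the rest of `buf` stays zero.
    buf[..n].copy_from_slice(&input[..n]);
    u32::from_le_bytes(buf)
}
```

Note that the zero-padding case is itself a small source of bias: when the input is nearly exhausted, small values become more likely.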
An example where we balance these things: https://github.com/rust-fuzz/arbitrary/blob/master/derive/src/lib.rs#L74-L76
> Is Arbitrary supposed to be exactly uniformly distributed?
Also note that it's hard to say what this means for dynamically sized collections like Vec, or for recursive data types. For the latter case, there's Boltzmann sampling, but that seems hard to implement.