
Can Arbitrary be biased?

Open ThomasdenH opened this issue 3 years ago • 2 comments

I have a question that the documentation could perhaps clarify. Is Arbitrary supposed to be exactly uniformly distributed, or is it supposed to be fast? For example, I can generate a number from 0 to 9 (i.e. below 10) by dividing the value of a u8 by 26. With extra computation the distribution could be made uniform, but is that necessary or desirable?
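The bias in the division approach can be checked exhaustively, since a u8 has only 256 values. The sketch below is plain Rust, independent of the arbitrary crate, and its function names are hypothetical. It counts how many input bytes land on each output for division by 26, and for a rejection-sampling alternative that discards bytes 250..=255 so the remaining 250 split evenly.

```rust
/// The approach from the question: cheap, but 255 / 26 == 9, so the value 9
/// only receives bytes 234..=255 (22 bytes) while 0..=8 receive 26 bytes each.
fn by_division(b: u8) -> u8 {
    b / 26
}

/// Uniform alternative via rejection: bytes 250..=255 are rejected so that
/// each of the ten outputs gets exactly 25 bytes. `None` means "rejected";
/// a caller would then have to consume another byte and try again.
fn by_rejection(b: u8) -> Option<u8> {
    if b < 250 { Some(b % 10) } else { None }
}

/// Count how many of the 256 possible bytes map to each output 0..=9.
fn histogram(f: impl Fn(u8) -> Option<u8>) -> [u32; 10] {
    let mut counts = [0u32; 10];
    for b in 0..=u8::MAX {
        if let Some(v) = f(b) {
            counts[v as usize] += 1;
        }
    }
    counts
}

fn main() {
    let division = histogram(|b| Some(by_division(b)));
    let rejection = histogram(by_rejection);
    println!("division:  {:?}", division); // [26, 26, 26, 26, 26, 26, 26, 26, 26, 22]
    println!("rejection: {:?}", rejection); // [25, 25, 25, 25, 25, 25, 25, 25, 25, 25]
}
```

Rejection makes all ten outputs equally likely, but it sometimes needs an extra byte of input, which is exactly the kind of extra cost the question is asking about.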

ThomasdenH, Dec 28 '20

Arbitrary is not an RNG, so I don't think it makes much sense to impose uniformity requirements on its implementors. Instead, most implementations do, and will continue to, focus on generating data in ways that help detect faults faster in fuzzing and testing contexts.
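To make the "not an RNG" point concrete: an Arbitrary implementation reads its values out of bytes supplied by the fuzzer, so its output is a deterministic function of that input, and the fuzzer (not the implementation) decides which bytes show up. The sketch below uses a hypothetical ByteCursor type, a simplified stand-in rather than the crate's real Unstructured API; its fall-back-to-zero behaviour on exhausted input is an assumption of this sketch only.

```rust
/// Hypothetical, simplified stand-in for a fuzzer-fed byte source.
/// It is NOT the real `arbitrary::Unstructured` API; it only illustrates that
/// every value it produces is a pure function of the input bytes.
struct ByteCursor<'a> {
    data: &'a [u8],
}

impl<'a> ByteCursor<'a> {
    fn new(data: &'a [u8]) -> Self {
        ByteCursor { data }
    }

    /// Consume one byte. Assumption of this sketch: yield 0 once input runs out.
    fn next_u8(&mut self) -> u8 {
        match self.data.split_first() {
            Some((&first, rest)) => {
                self.data = rest;
                first
            }
            None => 0,
        }
    }

    /// Consume four bytes and interpret them as a little-endian u32.
    fn next_u32(&mut self) -> u32 {
        let mut bytes = [0u8; 4];
        for byte in bytes.iter_mut() {
            *byte = self.next_u8();
        }
        u32::from_le_bytes(bytes)
    }
}

fn main() {
    let input: [u8; 5] = [0x01, 0x00, 0x00, 0x00, 0xff];

    // Two cursors over the same bytes produce the same values: nothing is random.
    let mut a = ByteCursor::new(&input);
    let mut b = ByteCursor::new(&input);
    assert_eq!(a.next_u32(), b.next_u32());
    assert_eq!(a.next_u8(), b.next_u8());

    // Whatever "distribution" the values have comes from the bytes the fuzzer picks.
    println!("{}", ByteCursor::new(&input).next_u32()); // prints 1
}
```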

nagisa, Dec 28 '20

At the end of the day, this is a library for fuzzing and related activities. That use case is the priority.

That said, we certainly want uniformity (or at least an approximation of it) for things like Arbitrary for u32, as long as it doesn't cost so much fuzzing throughput that overall fuzzing efficiency drops (which is vague, and very specific to a particular fuzz target).

An example where we balance these things: https://github.com/rust-fuzz/arbitrary/blob/master/derive/src/lib.rs#L74-L76
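One common middle ground between speed and uniformity is multiply-shift range reduction: multiply the raw integer by the size of the range and keep the high bits. It uses no division and never rejects, so it consumes a fixed number of input bytes, while each output still gets within one input of an equal share. This is a general technique offered to illustrate the trade-off; the sketch does not claim to reproduce the linked derive code, and the function names are hypothetical.

```rust
/// Multiply-shift range reduction: maps x in 0..2^32 onto 0..n with one
/// multiplication and one shift, consuming no extra input and never rejecting.
/// Each output receives either floor(2^32 / n) or ceil(2^32 / n) inputs, so it
/// is only approximately uniform when n does not divide 2^32.
fn reduce_u32(x: u32, n: u32) -> u32 {
    ((u64::from(x) * u64::from(n)) >> 32) as u32
}

/// The same idea at 8-bit width, small enough to check exhaustively.
fn reduce_u8(x: u8, n: u8) -> u8 {
    ((u16::from(x) * u16::from(n)) >> 8) as u8
}

fn main() {
    // Count how many of the 256 possible bytes land on each output 0..=9.
    let mut counts = [0u32; 10];
    for x in 0..=u8::MAX {
        counts[reduce_u8(x, 10) as usize] += 1;
    }
    println!("{:?}", counts); // [26, 26, 25, 26, 25, 26, 26, 25, 26, 25]
    println!("{}", reduce_u32(u32::MAX, 10)); // 9
}
```

At 8 bits the ten outputs receive 25 or 26 inputs each, a tighter spread than the 22-to-26 of plain division by 26; at 32 bits the relative bias drops to roughly one part in 429 million.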

fitzgen, Jan 08 '21

> Is Arbitrary supposed to be exactly uniformly distributed?

Also note that it's hard to say what this means for dynamically-sized collections like Vec, or recursive data types. For the latter case, there's Boltzmann sampling, but that seems hard to implement.
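A small worked case shows why "uniform" is ambiguous for collections. Take Vec<bool> with at most three elements: one scheme picks the length uniformly and then each element uniformly, another picks uniformly among all 15 distinct vectors. The sketch below (hypothetical function name, not a description of how arbitrary generates Vec) computes the probability each scheme assigns to the empty vector.

```rust
/// Two candidate meanings of "uniform" for Vec<bool> with 0..=max_len elements.
/// Returns (P(empty) under length-first sampling, P(empty) under sampling
/// uniformly among all distinct vectors). max_len must be at most 63.
fn empty_vec_probability(max_len: u32) -> (f64, f64) {
    // Length-first: choose the length uniformly from 0..=max_len, then each
    // element uniformly. The empty vector only needs the length to be 0.
    let length_first = 1.0 / f64::from(max_len + 1);

    // Value-uniform: there are 2^0 + 2^1 + ... + 2^max_len distinct vectors,
    // each equally likely, and exactly one of them is empty.
    let total: u64 = (0..=max_len).map(|len| 1u64 << len).sum();
    let value_uniform = 1.0 / total as f64;

    (length_first, value_uniform)
}

fn main() {
    let (length_first, value_uniform) = empty_vec_probability(3);
    // Length-first gives the empty vector 1/4; value-uniform gives it 1/15.
    println!("length-first: {length_first:.4}, value-uniform: {value_uniform:.4}");
}
```

Under length-first sampling a specific length-3 vector has probability 1/4 × 1/8 = 1/32, so the empty vector is 8 times as likely as any particular length-3 vector; under value-uniform sampling, 8 of the 15 outcomes are length-3 vectors and short vectors become rare. Recursive types make the choice harder still, which is where approaches such as Boltzmann sampling come in (not shown here).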

langston-barrett, May 10 '24