Change the core draw primitive to be selecting an integer from 0 to n
From https://github.com/HypothesisWorks/hypothesis/pull/1961#issuecomment-491491638 by @DRMacIver:
Counter-proposal: How about instead of this, we change the fundamental design of Hypothesis in order to support this very specific use case.
*very straight face*
I'm actually serious. A thing I've been wondering about recently is moving from `draw_bits` to a `choice(n)` function which just chooses a number between `0` and `n`. This would already move a bunch of our redundancy. If we then added an optional `weights` parameter to it, we could move sampler and weighted coin to just be thin shims over this method.
This would in particular mean that we would solve this problem by removing the redundancy of representation in this case.
(I pulled this out because I think it's an interesting/good idea, and I want to be able to comment on it without junking up that other thread.)
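To make the quoted proposal concrete, here is a minimal sketch of what a `choice(n)` primitive with optional `weights` might look like, with a weighted coin and a sampler written as thin shims over it. Everything here is an assumption for illustration: the names `weighted_coin` and `sample_from`, the explicit `rng` argument, and the reading of "between `0` and `n`" as inclusive are not Hypothesis's actual API.

```python
import random


def choice(n, weights=None, rng=None):
    """Hypothetical primitive: pick an integer in 0..n (assumed inclusive).

    If `weights` is given, it must hold n + 1 non-negative numbers and
    index i is chosen with probability proportional to weights[i].
    """
    rng = rng or random.Random()
    if n < 0:
        raise ValueError("n must be non-negative")
    if weights is None:
        return rng.randint(0, n)
    if len(weights) != n + 1:
        raise ValueError("expected n + 1 weights")
    # random.Random.choices performs the weighted selection for us.
    return rng.choices(range(n + 1), weights=weights)[0]


def weighted_coin(p, rng=None):
    """Shim: return True with probability p."""
    return choice(1, weights=[1 - p, p], rng=rng) == 1


def sample_from(elements, weights=None, rng=None):
    """Shim: pick one element, optionally with per-element weights."""
    return elements[choice(len(elements) - 1, weights=weights, rng=rng)]
```

With this shape, both shims reduce to a single integer draw, so there is only one representation to shrink or replay for each decision.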
I did some experimental work on refitting `ConjectureData` to use `_draw_bits` internally, instead of `_draw_bytes`.
One thing I found is that some input functions naturally want to return raw bytes (e.g. sliced from an input buffer), some input functions want to return integers, and some input functions just want to generate random data in the usual way. (And some want to choose between these options depending on the circumstances.)
It might be useful to allow some flexibility in allowing the input function to return bytes, or an integer, or some sentinel that instructs the caller to supply random data. Then `ConjectureData` would be responsible for detecting the type of returned data, and performing any necessary conversions in a central place.
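As a rough illustration of that central conversion step, the sketch below normalises whatever an input function returns into an integer of the requested width. The sentinel `RANDOM`, the helper name `normalize_draw`, the big-endian byte order, and the choice to mask off excess high bits are all assumptions made for this example, not the behaviour of the real `ConjectureData`.

```python
import random

# Hypothetical sentinel: "I have no opinion, please supply random data."
RANDOM = object()


def normalize_draw(raw, n_bits, rng):
    """Convert an input function's return value to an int below 2**n_bits.

    `raw` may be bytes, a non-negative int, or the RANDOM sentinel.
    """
    if n_bits < 1:
        raise ValueError("n_bits must be at least 1")
    if raw is RANDOM:
        return rng.getrandbits(n_bits)
    if isinstance(raw, (bytes, bytearray)):
        # Assumed convention: interpret the buffer slice as big-endian.
        value = int.from_bytes(raw, "big")
    elif isinstance(raw, int) and not isinstance(raw, bool):
        if raw < 0:
            raise ValueError("integer draws must be non-negative")
        value = raw
    else:
        raise TypeError(f"unsupported draw type: {type(raw).__name__}")
    # Assumed convention: discard any bits above the requested width.
    return value & ((1 << n_bits) - 1)
```

Keeping the type detection in one function means each input function can return whichever form is most natural for it, and the rest of the engine only ever sees integers.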
Note: #1961 has lots more information that prospective implementors should read, from visualisations of the current data distribution to discussion of alternative approaches and why they won't work!
This could also allow symbolic-execution tools like CrossHair to exercise Hypothesis tests. An early attempt based on `bytes` wasn't impressive, but I think the concept could work well if applied at the level proposed by this issue. See https://github.com/pschanely/CrossHair/issues/45 for details.
Closing this in favor of https://github.com/HypothesisWorks/hypothesis/issues/3086#issuecomment-1774233444; we now think that using a tree of (int, float, bytes, str) leaves will be a better underlying data structure than just integers.