
Change the core draw primitive to be selecting an integer from 0 to n

Zalathar opened this issue 5 years ago • 4 comments

From https://github.com/HypothesisWorks/hypothesis/pull/1961#issuecomment-491491638 by @DRMacIver:

Counter-proposal: How about instead of this, we change the fundamental design of Hypothesis in order to support this very specific use case. *very straight face*

I'm actually serious. A thing I've been wondering about recently is moving from draw_bits to a choice(n) function, which just chooses a number between 0 and n. This alone would remove a bunch of our redundancy. If we then added an optional weights parameter to it, we could make sampler and weighted coin thin shims over this method.

This would in particular mean that we would solve this problem by removing the redundancy of representation in this case.
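To make the proposal concrete, here is a minimal Python sketch of a `choice(n)` primitive with an optional `weights` parameter, with element sampling and a weighted coin written as thin shims over it. The `ChoiceData` class, the `sample_element` and `weighted_coin` helpers, the `prefix` replay behaviour, and the clamping of out-of-range replayed values are all hypothetical illustrations, not Hypothesis's actual `ConjectureData` API.

```python
import random
from typing import Optional, Sequence


class ChoiceData:
    """Hypothetical stand-in for ConjectureData built on a choice(n) primitive.

    Every draw is recorded as an integer in [0, n], so the recorded
    choices can be replayed directly.
    """

    def __init__(self, rng: random.Random, prefix: Sequence[int] = ()):
        self.rng = rng
        self.prefix = list(prefix)  # choices to replay before drawing fresh ones
        self.choices = []  # every choice made so far, in draw order

    def choice(self, n: int, weights: Optional[Sequence[float]] = None) -> int:
        """Return an integer in [0, n] inclusive, optionally weighted."""
        if n < 0:
            raise ValueError("n must be non-negative")
        if weights is not None and len(weights) != n + 1:
            raise ValueError("need exactly one weight per outcome")
        i = len(self.choices)
        if i < len(self.prefix):
            # Replaying: clamp so a mutated prefix still yields a valid value.
            value = min(self.prefix[i], n)
        elif weights is None:
            value = self.rng.randint(0, n)  # uniform over 0..n inclusive
        else:
            value = self.rng.choices(range(n + 1), weights=weights)[0]
        self.choices.append(value)
        return value


def sample_element(data: ChoiceData, elements: Sequence, weights=None):
    # Sampling is a thin shim: choose an index, return that element.
    return elements[data.choice(len(elements) - 1, weights)]


def weighted_coin(data: ChoiceData, p: float) -> bool:
    # A biased coin is choice(1) with weights [1 - p, p].
    return data.choice(1, [1 - p, p]) == 1
```

Replaying `prefix=[2, 1]` deterministically yields `"c"` from `["a", "b", "c"]` and then `True` from the coin. Because each in-range choice maps to exactly one outcome, this is the kind of redundancy of representation the proposal aims to remove.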

Zalathar, May 19 '19 06:05

(I pulled this out because I think it's an interesting/good idea, and I want to be able to comment on it without junking up that other thread.)

Zalathar, May 19 '19 06:05

I did some experimental work on refitting ConjectureData to use _draw_bits internally, instead of _draw_bytes.

One thing I found is that some input functions naturally want to return raw bytes (e.g. sliced from an input buffer), some input functions want to return integers, and some input functions just want to generate random data in the usual way. (And some want to choose between these options depending on the circumstances.)

It might be useful to let the input function return bytes, an integer, or a sentinel that instructs the caller to supply random data. ConjectureData would then be responsible for detecting the type of the returned data and performing any necessary conversions in one central place.
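The central-conversion idea can be sketched as follows. This is a hypothetical illustration: the `RANDOM` sentinel, the `draw_bits` signature, and the big-endian interpretation of returned bytes are assumptions for the example, not the code from the experiment described above.

```python
import random
from typing import Callable

# Hypothetical sentinel an input function returns to mean "use random data".
RANDOM = object()


def draw_bits(n_bits: int,
              input_fn: Callable[[int], object],
              rng: random.Random) -> int:
    """Draw an n_bits-wide integer, converting whatever input_fn returns.

    This is the single place where bytes, ints, and the RANDOM sentinel
    are normalised, so each input function can return whichever
    representation is most natural for it.
    """
    result = input_fn(n_bits)
    if result is RANDOM:
        value = rng.getrandbits(n_bits)
    elif isinstance(result, (bytes, bytearray)):
        value = int.from_bytes(result, "big")  # assumed byte order
    elif isinstance(result, int):
        value = result
    else:
        raise TypeError(f"unsupported input type: {type(result).__name__}")
    # Mask to the requested width so every source yields an in-range value.
    return value & ((1 << n_bits) - 1)
```

An input function that slices a buffer can return `b"\x01\x02"` for a 16-bit draw (decoded as 258), one that prefers integers can return `0xAB` for a 4-bit draw (masked to 11), and one with no preference simply returns `RANDOM`.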

Zalathar, May 19 '19 06:05

Note: #1961 has lots more information that prospective implementors should read, from visualisations of the current data distribution to discussion of alternative approaches and why they won't work!

Zac-HD, Sep 06 '19 12:09

This could also allow symbolic execution tools like CrossHair to exercise Hypothesis tests. An early attempt based on bytes wasn't impressive, but I think the concept could work well if applied at the level proposed by this issue. See https://github.com/pschanely/CrossHair/issues/45 for details.

Zac-HD, Nov 18 '20 23:11

Closing this in favor of https://github.com/HypothesisWorks/hypothesis/issues/3086#issuecomment-1774233444; we now think that using a tree of (int, float, bytes, str) leaves will be a better underlying data structure than just integers.
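As a rough illustration of the data structure mentioned above, the hypothetical `Node` class below groups choices into nested spans whose leaves are typed `int`, `float`, `bytes`, or `str` values rather than bare integers. The class name, the `label` field, and the `leaves` helper are assumptions for this sketch; the linked comment describes the actual design direction.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Each leaf records one typed choice made during generation.
Leaf = Union[int, float, bytes, str]


@dataclass
class Node:
    """A labelled span of choices; children are nested spans or leaf values."""
    label: str
    children: List[Union["Node", Leaf]] = field(default_factory=list)

    def leaves(self) -> List[Leaf]:
        """Flatten the tree into its sequence of leaf choices, in draw order."""
        out: List[Leaf] = []
        for child in self.children:
            if isinstance(child, Node):
                out.extend(child.leaves())
            else:
                out.append(child)
        return out
```

For example, a two-element list might be recorded as `Node("list", [Node("element", [3, "ab"]), Node("element", [1.5, b"\x00"])])`, whose leaves flatten to `[3, "ab", 1.5, b"\x00"]`.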

Zac-HD, Oct 22 '23 23:10