haskell-hedgehog icon indicating copy to clipboard operation
haskell-hedgehog copied to clipboard

frequency is inefficient and wrong

Open treeowl opened this issue 6 years ago • 3 comments
trafficstars

Correctness

frequency is a little bit wrong because it relies on the sum of the frequencies being representable by an Int, but does not document this. So frequency [(maxBound `quot` 2, x), ((maxBound `quot` 4) * 3, y), (2034, z)] will do something hard to predict. The best fix is likely to use double-word integers in the calculations:

data DW = DW {hi :: !Word, lo :: !Word}

This is almost certainly available in a library somewhere, but it wouldn't be too hard to roll our own if necessary.

Efficiency

The argument list is traversed for every value generated. When the list is long, that's really lousy. The solution is to build a tree representing the input list, making lookup much faster. A simple hack using DW above would use an IntMap of IntMaps and some lookup functions like lookupLE, lookupGE, etc. There are almost certainly better specialized data structures available for the purpose, but I don't know if they'd be worth the trouble.

treeowl avatar Nov 11 '19 01:11 treeowl

Another option to deal with correctness is to document the issue and throw an error when overflow is detected. A third is to change the type to use Int32 (or better, Word32) in the input, and Word64 internally.

treeowl avatar Nov 11 '19 02:11 treeowl

The big advantage of sticking with Word64 internally, instead of a double word, is that things should be considerably simpler than otherwise.

treeowl avatar Nov 11 '19 02:11 treeowl

fwiw frequency and its semantics are copied from QuickCheck

jacobstanley avatar Nov 14 '19 16:11 jacobstanley