quickcheck icon indicating copy to clipboard operation
quickcheck copied to clipboard

Random generation is very slow

Open nh2 opened this issue 5 years ago • 6 comments

For testing my binding to the lz4 compression library (https://github.com/nh2/lz4-frame-conduit), I need to generate a lot of large ByteStrings.

Unfortunately I can barely use quickcheck for this, because its random generator is too slow.

  • random only generates only 5-8 MB/s (https://github.com/haskell/random/issues/51)
  • tf-random generates 3-8 MB/s in the same test

When actually generating ByteStrings using quickcheck-instances and size = 1000 on e.g.

QC.property $ \(bsList :: [ByteString]) -> QCM.monadicIO $ do
  QCM.run $ hPutStrLn stderr (show $ sum $ map BS.length bsList)
  QCM.assert True

it generates me 1 MB/s; generating [String] is only slightly faster.

These numbers are very low compared to the 230 MB/s /dev/urandom gets me on the same machine.

Can we do something about it?

nh2 avatar Nov 25 '18 03:11 nh2

Something is weird; when I last time benchmarked tf-random was 5-10 time faster than random. How you generate your ByteStrings?

quickcheck-instances way of generating is not optimal, but nobody mentioned it's too slow for their needs (it generates list of Word8s, and not Word32 which tf-random generates natively; so it's at least 4x slower than it could be).

phadej avatar Nov 25 '18 11:11 phadej

How you generate your ByteStrings?

Using quickcheck-instances, as you said.

I'm now trying out to work around this slowness by using pcg-random (with a seed and target length generated by QuickCheck's choose) to generate the ByteStrings. I've measured that this works at ~300 MB/s, even faster than /dev/urandom. But of course using this workaround isn't great.

Something is weird; when I last time benchmarked tf-random was 5-10 time faster than random.

Hmm, let me give that a "quick check".

nh2 avatar Nov 25 '18 16:11 nh2

@nh2 could you try this https://github.com/phadej/qc-instances/compare/master...phadej:faster-bytestring version of quickcheck-instances, how much faster than released it is in your case (and how much slower than pcg-random) ?

In the simple benchmark there, it was already 20x faster.

(I'll resort to crreateAndTrim next)

phadej avatar Nov 25 '18 18:11 phadej

I think that the best place to fix this particular problem is in the Arbitrary ByteString instance (e.g. using @phadej's patch).

More generally, there are a couple of things that might cause Gen to be slow:

  • Gen uses splitting to distribute random numbers throughout the computation, which may be slower than generating random numbers in the traditional way. The fix to this would be to change Gen into a state monad, but I don't want to do that as it would no longer be possible to generate infinite structures.
  • When I looked at the Core generated by GHC a while ago, it seemed rather suboptimal. I seem to remember a lot of combinators were not getting specialised as one would hope. I've been meaning to look more into this but not got round to it yet.

nick8325 avatar Nov 28 '18 10:11 nick8325

@nick8325

there are a couple of things that might cause Gen to be slow

Just a thought, but generating infinite structures only works when the bottom of the monad stack is Identity, right? If so, perhaps it's worth considering StateT over splitting(?)

moodmosaic avatar Nov 29 '18 22:11 moodmosaic

https://hackage.haskell.org/package/quickcheck-instances-0.3.20 released with faster ByteString generator

phadej avatar Mar 25 '19 23:03 phadej

Seeing as this was fixed in quickcheck-isntances I'm going to close this.

MaximilianAlgehed avatar Mar 21 '24 12:03 MaximilianAlgehed