packed_simd icon indicating copy to clipboard operation
packed_simd copied to clipboard

provide a better random "vector" generator in the aobench benchmark

Open gnzlbg opened this issue 7 years ago • 14 comments

@TheIronBorn suggested that the random number generator that we are using for packed vectors in the aobench benchmark is too naive.

It would be great if we could use a better random number generator there for the 128-bit and 256-bit cases and put it behind a feature flag.

gnzlbg avatar Aug 16 '18 15:08 gnzlbg

@TheIronBorn now that the aobench benchmark matches ISPC performance its time to beat it. Would you be willing to look into this or could you give some hints here about which PRNG would make sense to use for that benchmark ?

gnzlbg avatar Sep 04 '18 08:09 gnzlbg

I have a collection of various SIMD PRNGs here: https://github.com/TheIronBorn/simd_prngs.

What are the requirements for quality/hardware?

TheIronBorn avatar Sep 04 '18 17:09 TheIronBorn

Currently we use LFSR113, so anything that's not worse quality-wise, but has higher throughput. Hardware-wise, we do conditional compilation behind the 256bit feature flag (cfg(feature = "256bit")) to select between 128-bit wide and 256-bit wide implementations at compile-time, so we can provide two different PRNGs, one for each.

I guess we could technically also expose a 512-bit wide flag, but that is not implemented yet.

Also, we can add the simd_prngs crate as an optional dependency - there is no need to re-implement anything here AFAICT.

gnzlbg avatar Sep 04 '18 17:09 gnzlbg

Do you know if the ISPC LFSR implementation avoids correlation between random streams?

TheIronBorn avatar Sep 04 '18 17:09 TheIronBorn

No idea, I adapted it 1:1 into the aobench example here: https://github.com/rust-lang-nursery/packed_simd/blob/master/examples/aobench/src/random.rs

The ISPC documentation mentions the following:

Note that if the same varying seed value is used for all of the program instances (e.g. RNGState state; seed_rng(&state, 1);), then all of the program instances in the gang will see the same sequence of pseudo-random numbers.

So I would think that no, it does not avoid it.

gnzlbg avatar Sep 04 '18 18:09 gnzlbg

Sounds like that's a no. So correlation protection would be an improvement we could provide, but that means a narrower selection of PRNGs.

TheIronBorn avatar Sep 04 '18 18:09 TheIronBorn

LFSR113 has 6 BigCrush failures: http://www.iro.umontreal.ca/~lecuyer/myftp/papers/testu01.pdf

TheIronBorn avatar Sep 04 '18 18:09 TheIronBorn

So IIRC the current aobench implementation uses a different random seed for each vector lane, so I don't know how much of an improvement "correlation protection" would bring. I don't know how would a higher-quality random number generator affect the ambient occlusion benchmark, maybe it would generate better images in corner cases ?

gnzlbg avatar Sep 04 '18 18:09 gnzlbg

I don't see it being much of a problem as long as we have a large enough statespace respective to stream length. How many random numbers are used per stream?

TheIronBorn avatar Sep 04 '18 18:09 TheIronBorn

Since we can only rely on vector-width in regards to hardware, AES-NI, vector shuffles, rotations, etc won't be useful

TheIronBorn avatar Sep 04 '18 18:09 TheIronBorn

How many random numbers are used per stream?

So in total we generate ~ width*height*nsubsamples*nsubsamples*ntheta*nphi random numbers, and per stream that number divided by the number of streams.

For the benchmark cases, we generate 800 x 600 images, with 2 sub-samples per-pixel per-dir, and 8x8 = 64 rays for different thetas and phi, that's: 800 * 600 * 2 * 2* 8 * 8 = 16 million random numbers. When we use 128-bit wide vectors, we have 4 streams of 4 f32s, so that would be 16 million / 4 ~= 4 million random numbers per stream.

Since we can only rely on vector-width in regards to hardware, AES-NI, vector shuffles, rotations, etc won't be useful

This is an artificial limitation that we currently have, but we can lift it. Currently, using cfg(target_feature = ) at compile-time we can rely on whatever we want. I don't know what's however worth it.

gnzlbg avatar Sep 04 '18 18:09 gnzlbg

With a stream length of 4 million, the chance of correlation is ~2e-31 for a period of 2^128

TheIronBorn avatar Sep 04 '18 18:09 TheIronBorn

Also: for chaotic PRNGs like Jenkin's smallprng (JSF) or PractRand's SFC, the chance of a cycle shorter than 4 million is ~2^-(state_bits - 22) -http://www.pcg-random.org/posts/random-invertible-mapping-statistics.html

For Jsf32: ~2^-106 Sfc uses a counter to raise the minimum cycle length to 2^32/64

TheIronBorn avatar Sep 11 '18 23:09 TheIronBorn

There is also: https://github.com/termoshtt/rust-sfmt

gnzlbg avatar Nov 11 '18 10:11 gnzlbg