Optimized uniform small integers?
I've been using my own wrapper to generate uniform random integers of small, specific bit widths. The idea is to cache a block of output from random_uintegers and then demux it into a stream of small-width values. For example, you might want a stream of 1-bit (0/1) values. The output array can be any integer dtype.
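A minimal sketch of the cache-and-demux idea, using only the Python standard library. Here random.Random.getrandbits(64) stands in for a block of random_uintegers output, and the class name SmallUintStream is hypothetical: it is not the pn64 class and not a randomgen API. It only allows widths that divide 64, so no value ever straddles two words.

```python
import random


class SmallUintStream:
    """Hypothetical helper: hand out `bits`-wide unsigned ints carved from cached 64-bit draws."""

    WORD_BITS = 64

    def __init__(self, rng, bits, block=256):
        if not 1 <= bits <= self.WORD_BITS or self.WORD_BITS % bits:
            raise ValueError("bits must divide 64")
        self.rng = rng
        self.bits = bits
        self.mask = (1 << bits) - 1
        self.block = block   # number of 64-bit words fetched per refill
        self._words = []     # cached raw words not yet consumed
        self._word = 0       # word currently being demuxed
        self._left = 0       # number of `bits`-wide fields remaining in _word

    def _refill(self):
        # Stand-in for one call to random_uintegers(block).
        self._words = [self.rng.getrandbits(self.WORD_BITS) for _ in range(self.block)]
        self._words.reverse()  # pop() from the end then yields words in draw order

    def __call__(self, n):
        out = []
        for _ in range(n):
            if self._left == 0:
                if not self._words:
                    self._refill()
                self._word = self._words.pop()
                self._left = self.WORD_BITS // self.bits
            out.append(self._word & self.mask)  # take the low `bits` bits
            self._word >>= self.bits
            self._left -= 1
        return out


stream = SmallUintStream(random.Random(0), bits=1)
coin_flips = stream(1000)  # 1000 values in {0, 1}, 64 per PRNG draw
```

A 1-bit stream makes one PRNG call per 64 outputs. A pure-Python loop like this only shows the bookkeeping; any speedup depends on doing the same work in compiled code, as pn64 does in C++.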
I just benchmarked my approach for 1-bit values against randint and see quite a large speedup.
Maybe something similar could be useful for randomstate?
The hard part with this approach is that reproducibility requires storing any unused bits. That adds to the generator state, which I think is more or less a no-no. For example, the cached second Box-Muller (BM) Gaussian in NumPy is already a bit annoying to carry around.
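To make the reproducibility point concrete, here is a hypothetical 1-bit buffer (stdlib random again standing in for the bit generator; BitBuffer and its methods are illustrative names). Restoring only the PRNG state is not enough: the 63 bits still buffered after the first call are part of the state too.

```python
import random


class BitBuffer:
    """Hypothetical 1-bit stream: each 64-bit draw is handed out one bit at a time."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.word = 0  # current raw draw, shifted as bits are consumed
        self.left = 0  # number of unused bits still buffered in `word`

    def next_bit(self):
        if self.left == 0:
            self.word = self.rng.getrandbits(64)
            self.left = 64
        bit = self.word & 1
        self.word >>= 1
        self.left -= 1
        return bit

    def get_state(self):
        # The buffered bits must be saved alongside the PRNG state.
        return (self.rng.getstate(), self.word, self.left)

    def set_state(self, state):
        rng_state, self.word, self.left = state
        self.rng.setstate(rng_state)


buf = BitBuffer(42)
buf.next_bit()                  # draws one 64-bit word; 63 bits stay buffered
full_state = buf.get_state()    # PRNG state + buffer
prng_only = buf.rng.getstate()  # PRNG state alone
expected = [buf.next_bit() for _ in range(10)]

buf.set_state(full_state)
resumed = [buf.next_bit() for _ in range(10)]    # identical to `expected`

buf.rng.setstate(prng_only)
buf.word, buf.left = 0, 0                        # buffer lost
restarted = [buf.next_bit() for _ in range(10)]  # taken from the next word instead
```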
I'm a little surprised that 1-bit values are slow: they all generate 32 or 64 values from a single draw of the PRNG and store them in arrays where each element is 8 bits.
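As an illustration of the idea described above (not the actual randint internals), here is a vectorized NumPy sketch, assuming NumPy >= 1.17, where randomgen's bit generators were merged in as numpy.random.BitGenerator. The function name one_bit_values is hypothetical. Each raw 64-bit draw is split into 64 single-bit values stored one per uint8 element. Unused bits in the last word are simply dropped, which avoids the extra state at the cost of wasting up to 63 bits per call.

```python
import numpy as np


def one_bit_values(rng, n):
    """Hypothetical helper: n values in {0, 1}, one uint8 each, 64 per raw 64-bit draw."""
    n_words = -(-n // 64)                            # ceil(n / 64) raw draws
    words = rng.bit_generator.random_raw(n_words)    # uint64 array straight from the bit generator
    # View each uint64 as 8 bytes, then unpack every byte into 8 single-bit uint8 values.
    bits = np.unpackbits(words.view(np.uint8))
    return bits[:n]                                  # leftover bits are discarded, not buffered


values = one_bit_values(np.random.default_rng(0), 1000)  # uses 16 raw draws
```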
The better approach is to use randomgen, which explicitly provides a set of basic RNGs designed to be easily incorporated into user Cython or Numba code.
My bench:

```python
from timeit import timeit

from randomgen import RandomGenerator, Xoroshiro128
from pn2 import pn64

rs = RandomGenerator(Xoroshiro128(0))
p1 = pn64(rs, 1)  # stream of 1-bit values

print(timeit('p1(1000)', globals=globals(), number=1000000))
print(timeit('rs.randint(0, 2, size=1000)', globals=globals(), number=1000000))
```

Result (seconds for 1,000,000 calls each): 3.2882123820018023 for pn64, 13.305304736997641 for randint.
My pn64 class is C++ code that calls random_uintegers through the Python interface, caches a block of results, and then hands them out as a stream, M bits at a time.
I could potentially call the C code directly instead of going through the Python interface (if I knew how).
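For calling the generator from compiled code, NumPy (1.17+, where randomgen's design was merged) exposes each bit generator's C entry points through .ctypes and .cffi attributes. A minimal sketch, assuming PCG64 in place of the thread's Xoroshiro128 (which NumPy does not ship): the function pointer and state address are what a C++ extension, Cython module, or Numba-compiled loop would consume. Calling them from plain Python, as done here, only checks that they work; it is not faster.

```python
import ctypes

import numpy as np

bit_gen = np.random.PCG64(0)
iface = bit_gen.ctypes  # namedtuple of ctypes wrappers around the C bit generator

# next_uint64 is a C function pointer with signature uint64_t (*)(void *state).
next_uint64 = iface.next_uint64
state_ptr = iface.state  # ctypes.c_void_p pointing at the generator state

# Plain integer addresses that could be handed to a C/C++ extension.
fn_addr = ctypes.cast(next_uint64, ctypes.c_void_p).value
state_addr = iface.state_address

# Calling through ctypes advances the same underlying state as bit_gen itself.
draws = [next_uint64(state_ptr) for _ in range(4)]
```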
BTW, I am using randomgen; I guess I should have filed this issue there.
You need to set the dtype for randint to get the best performance, e.g. np.bool.
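In current NumPy, randomgen's RandomGenerator.randint became Generator.integers, which takes the same dtype argument. A small sketch of what the dtype changes in the output array (np.bool_, or plain bool, selects the boolean dtype): each 0/1 value takes 1 byte instead of the default int64's 8.

```python
import numpy as np

rng = np.random.default_rng(0)

# Default dtype is int64, so each 0/1 value occupies 8 bytes.
as_int64 = rng.integers(0, 2, size=1000)

# 1-byte dtypes shrink the output array eightfold.
as_bool = rng.integers(0, 2, size=1000, dtype=np.bool_)
as_uint8 = rng.integers(0, 2, size=1000, dtype=np.uint8)

print(as_int64.nbytes, as_bool.nbytes, as_uint8.nbytes)  # 8000 1000 1000
```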
(Email reply to ndbecker's comment of Mon, Jun 25, 2018, 19:46: https://github.com/bashtage/ng-numpy-randomstate/issues/124#issuecomment-400055505)
Hmm, setting np.bool doesn't shave off much time. I suspect that most of the time is in allocating the memory for the output array.