accelerate icon indicating copy to clipboard operation
accelerate copied to clipboard

PRNGs/low-discrepancy sequences on the device?

Open currymj opened this issue 7 years ago • 7 comments

How hard would it be to implement a library to generate pseudorandom numbers or values from a low-discrepancy sequence on the device? Ideally you'd have have each Exp getting its own values in parallel and independent from the others. Even more ideally, without the hack you often see in C++ CUDA code of just pre-evaluating and tossing a bunch of PRNG values to get to a different starting point for each thread.

Despite my name, I'm by no means an experienced Haskell programmer, but this would be valuable enough for me that I would be willing to give it a shot, if it's not a totally quixotic task. I just have no idea what a good design would be for the types.

I think it would be generally useful; Monte Carlo computations are definitely one of the more common uses for parallelism.

edit: I see #52, which I guess has been sidelined because random numbers on the host is more flexible and good enough for most purposes.

currymj avatar Apr 13 '17 17:04 currymj

Yep, #52 has some notes. Random number generation in Acc land would be good though, especially, for the GPU. The mwc-random-accelerate package, which does the generation on the host using mwc-random, was just easier to setup to get something out the door.

I guess that the current mwc-random-accelerate isn't very good for your purposes though, since you'll need many random numbers? What kind of interface do you think would be suitable for your work?

tmcdonell avatar Apr 14 '17 02:04 tmcdonell

I mean, Haskell RNGs normally live inside IO because they are inherently stateful, right? That's the tricky part where i don't know what things would look like.

Ideally there would be a function call to give you a random Exp whatever, and the system would magically ensure that the many parallel Exp computations don't have correlated random numbers.

Ultimately it seems like there are a few options for what really happens:

  1. Currently, generate on host
  2. Pre-generate whole arrays on device then index them from kernels (rng function returns Acc)
  3. Generate individual random numbers within the kernel. (rng function returns Exp?)

Ideally you'd be able to do 3 which also gives you 2 if you want it. But even just 2 would probably be good.

On April 13, 2017 at 10:36:01 PM, Trevor L. McDonell ([email protected](mailto:[email protected])) wrote:

Yep, #52(https://github.com/AccelerateHS/accelerate/issues/52) has some notes. Random number generation in Acc land would be good though, especially, for the GPU. The mwc-random-accelerate package, which does the generation on the host using mwc-random, was just easier to setup to get something out the door.

I guess that the current mwc-random-accelerate isn't very good for your purposes though, since you'll need many random numbers? What kind of interface do you think would be suitable for your work?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub(https://github.com/AccelerateHS/accelerate/issues/374#issuecomment-294071471), or mute the thread(https://github.com/notifications/unsubscribe-auth/ADoGbJ5qoAXApTbUxJFpm57lxfUz4Rd1ks5rvtuRgaJpZM4M89_8).

currymj avatar Apr 14 '17 04:04 currymj

I am probably being naive but would something like this do?

z :: Acc (Vector (Int, (Double, Double)))
z = generate (index1 384) (\n -> lift (unindex1 n + 1, (0.0 :: Exp Double, 0.5 :: Exp Double)))

bsd2 :: Exp (Int, (Double, Double)) -> Exp (Int, (Double, Double))
bsd2 n = let
  m = (A.fst n * 1103515245 + 12345) `mod` 2^(lift (31 :: Int))
  y = A.fromIntegral (A.fst n) + A.fst (A.snd n)
  x = y / 2147483648.0
  in
  lift $
  (m, (y, x))

main = mapM_ putStrLn $
       Prelude.map show $
       Prelude.map Prelude.snd $
       Prelude.map Prelude.snd $
       toList $ GPU.run $ A.map (A.iterate 100000 bsd2) z

idontgetoutmuch avatar Apr 14 '17 09:04 idontgetoutmuch

Thanks, this is very solid! Solves my immediate problem for sure.

I still would be inclined to work on an actual well-designed package with different PRNGs, low-discrepancy sequences, etc. Any thoughts on what a nice, Haskell-y design for that would look like are appreciated.

On April 14, 2017 at 5:59:36 AM, idontgetoutmuch ([email protected](mailto:[email protected])) wrote:

I am probably being naive but would something like this do?

z :: Acc (Vector (Int, (Double, Double))) z = generate (index1 384) (\n -> lift (unindex1 n + 1, (0.0 :: Exp Double, 0.5 :: Exp Double))) bsd2 :: Exp (Int, (Double, Double)) -> Exp (Int, (Double, Double)) bsd2 n = let m = (A.fst n * 1103515245(tel:1103515245) + 12345) mod 2^(lift (31 :: Int)) y = A.fromIntegral (A.fst n) + A.fst (A.snd n) x = y / 2147483648.0 in lift $ (m, (y, x)) main = mapM_ putStrLn $ Prelude.map show $ Prelude.map Prelude.snd $ Prelude.map Prelude.snd $ toList $ GPU.run $ A.map (A.iterate 100000 bsd2) z

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub(https://github.com/AccelerateHS/accelerate/issues/374#issuecomment-294128791), or mute the thread(https://github.com/notifications/unsubscribe-auth/ADoGbCzP7dtPvdbFgrTlOXj90R2Yje9Jks5rv0OIgaJpZM4M89_8).

currymj avatar Apr 14 '17 11:04 currymj

I'm not sure I understand. I think it would be easy enough to implement something like Mersenne Twister in accelerate. I just implemented the above as a demo. There is a standard interface in Haskell to RNGs: effectively next and split. next is easy but split is more difficult. Above I used different seeds to generate parallel computations but there is no guarantee that these are uncorrelated. Probably Guy Steele's split mix is what you (we) need since I am guessing you don't need crypto strength random numbers: http://on-demand.gputechconf.com/gtc/2016/presentation/s6665-guy-steele-fast-splittable.pdf. At the moment I just want to simulate some biological reactions which currently take 10 hours to run. If I can do this in 1/384 th of the time on the GPU I will be very happy.

idontgetoutmuch avatar Apr 14 '17 14:04 idontgetoutmuch

Good discussion on generating random numbers at #412

tmcdonell avatar Sep 01 '20 10:09 tmcdonell

I have created the sfc-random-accelerate package. I have done no analysis to check how good this is when evaluated in parallel, but reading the description from where I stole the generator, it seems like it should be fine for (non-crypto) purposes (assuming you seed it well).

tmcdonell avatar Sep 03 '20 08:09 tmcdonell