MOE
[C++] store curandState object in CudaExpectedImprovementState
BLOCKING: #398
Currently CudaExpectedImprovementState stores a UniformRandomGenerator. The generator is called before each GetEI or GetGradEI call to generate a new seed for the GPU's RNG.
Instead, we should generate a single seed, create the curandState once, and then reuse it for the rest of the CudaExpectedImprovementState's lifetime. This reduces complexity (no need to create num_gpus generators with unique seeds) and slightly improves performance (no re-seeding the GPU RNG and re-creating the curandState on every call).
We'll need a separate GPU endpoint to set the seed. Something like this: http://stackoverflow.com/questions/15247522/cuda-random-number-generation/15252202#15252202
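A minimal sketch of what such a seed-setting endpoint could look like, assuming one curandState per GPU thread stored in device global memory. The names `InitRandomStates` and `CudaSetSeed` are hypothetical and not part of MOE; only `curand_init` and the CUDA runtime calls are real APIs.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Hypothetical kernel: seeds one curandState per thread. Meant to run once,
// when the owning state object is constructed (or explicitly re-seeded).
__global__ void InitRandomStates(curandState* states, int num_states,
                                 unsigned long long seed) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < num_states) {
    // Same seed for every thread, but a distinct subsequence per thread,
    // so each thread draws from its own non-overlapping stream.
    curand_init(seed, idx, 0, &states[idx]);
  }
}

// Hypothetical host-side endpoint: allocates the states on the current
// device and seeds them. The caller owns *states_out and must cudaFree it.
cudaError_t CudaSetSeed(curandState** states_out, int num_states,
                        unsigned long long seed) {
  cudaError_t err = cudaMalloc(reinterpret_cast<void**>(states_out),
                               static_cast<size_t>(num_states) * sizeof(curandState));
  if (err != cudaSuccess) return err;

  const int threads = 256;
  const int blocks = (num_states + threads - 1) / threads;
  InitRandomStates<<<blocks, threads>>>(*states_out, num_states, seed);

  err = cudaGetLastError();                               // launch errors
  if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
  if (err != cudaSuccess) {
    cudaFree(*states_out);
    *states_out = nullptr;
  }
  return err;
}

int main() {
  curandState* states = nullptr;
  cudaError_t err = CudaSetSeed(&states, 1024, 314ULL);
  if (err != cudaSuccess) {
    fprintf(stderr, "CudaSetSeed failed: %s\n", cudaGetErrorString(err));
    return 1;
  }
  // ... GetEI / GetGradEI kernels would reuse `states` here ...
  cudaFree(states);
  return 0;
}
```

Under this sketch, CudaExpectedImprovementState would hold the `curandState*` in place of the UniformRandomGenerator and free it in its destructor.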
Open question:
- Currently curandState is allocated on the stack. Can we device-malloc it, or should it go in shared memory? If we device-alloc, should each thread copy its state into a local variable (so it can live in registers) before drawing from it?
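One possible answer to the open question, sketched under assumptions: keep the states in device global memory, have each thread copy its state into a local variable before drawing, and write it back afterwards so the next launch continues the same sequence. The kernel names `SeedStates` and `SumNormalDraws` are hypothetical. Whether the local copy actually ends up in registers is up to the compiler, so this would need profiling.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Hypothetical one-time seeding kernel (one state per thread).
__global__ void SeedStates(curandState* states, int n, unsigned long long seed) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < n) curand_init(seed, idx, 0, &states[idx]);
}

// Hypothetical consumer kernel standing in for the EI Monte Carlo loop.
__global__ void SumNormalDraws(curandState* states, int n, int draws, double* sums) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx >= n) return;

  // Copy from global memory into a thread-local variable; the compiler may
  // keep it in registers, avoiding a global-memory round trip per draw.
  curandState local_state = states[idx];

  double sum = 0.0;
  for (int i = 0; i < draws; ++i) {
    sum += curand_normal_double(&local_state);
  }
  sums[idx] = sum;

  // Write the advanced state back so later launches do not repeat draws.
  states[idx] = local_state;
}

int main() {
  const int n = 256;
  curandState* states = nullptr;
  double* sums = nullptr;
  if (cudaMalloc(reinterpret_cast<void**>(&states), n * sizeof(curandState)) != cudaSuccess ||
      cudaMalloc(reinterpret_cast<void**>(&sums), n * sizeof(double)) != cudaSuccess) {
    fprintf(stderr, "device allocation failed\n");
    return 1;
  }

  SeedStates<<<1, n>>>(states, n, 314ULL);          // seed once
  SumNormalDraws<<<1, n>>>(states, n, 1000, sums);  // first "GetEI" call
  SumNormalDraws<<<1, n>>>(states, n, 1000, sums);  // second call, no re-seed

  cudaError_t err = cudaDeviceSynchronize();
  if (err != cudaSuccess) {
    fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
  }
  cudaFree(sums);
  cudaFree(states);
  return err == cudaSuccess ? 0 : 1;
}
```

Shared memory is the other option raised above; it would require staging the states in and out of global memory on every launch anyway, since shared memory does not persist between kernel launches.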