PyBHV
Native Random implementation prohibits Parallel Execution
Because there is no exposed seeding and rand/random functions don't take seeds, it's hard to write parallel code.
One solution would be to generate an initial carefully chosen random state for each potential thread. Then, using templates, the current code still works, but you can optionally specify a thread number as a template parameter to generate independent hypervectors. Any thoughts on this @luketpeterson ?
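A minimal sketch of that idea, assuming C++ like the rest of the library: seed one RNG per potential thread once at startup, and let parallel callers opt in with a template parameter while single-threaded call sites keep working unchanged. `N_THREADS`, `make_rngs`, and `rand_word` are illustrative names, not PyBHV's actual API.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical sketch: pre-seed one RNG per potential thread at startup,
// so callers never have to handle seeds themselves.
constexpr std::size_t N_THREADS = 8;

static std::array<std::mt19937_64, N_THREADS> make_rngs() {
    std::array<std::mt19937_64, N_THREADS> rngs;
    // Fixed entropy stretched into N_THREADS independent seeds.
    std::seed_seq seq{0x9E3779B9u, 0x85EBCA6Bu, 0xC2B2AE35u};
    std::array<std::uint32_t, N_THREADS> seeds;
    seq.generate(seeds.begin(), seeds.end());
    for (std::size_t i = 0; i < N_THREADS; ++i) rngs[i].seed(seeds[i]);
    return rngs;
}

static std::array<std::mt19937_64, N_THREADS> rngs = make_rngs();

// Existing code still works via the default; parallel code writes
// rand_word<3>() to draw from thread 3's independent stream.
template <std::size_t Thread = 0>
std::uint64_t rand_word() { return rngs[Thread](); }
```

The template parameter keeps the thread index a compile-time constant, so the single-threaded default pays nothing for the extra generality.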
I don't understand the problem / requirements. You can use any number of values, such as a thread_id that's incremented each time a new thread is created, as a seed. But there must be more constraints that I'm not seeing.
In the past we (at Caustic) generated huge tables of pseudo-randomness in advance, but that is because we wanted a guaranteed correlation regardless of which core a given part of the workload ended up on. It doesn't sound like that's the aim here.
I want to avoid bothering the user with this. At the moment, a random vector has just one identity: the pointer you have to the newly generated vector. When you need to provide a seed to generate a vector, there are two identifiers, and you're responsible for lining them up, and worse, for safely coming up with new ones.
You could also drop the pointer identity and make a seed-indexed namespace, which could be very convenient in the long run, but that is well outside the scope of this library.
The bottom line is that the thread_id still needs to be safely mixed with a counter, which then has to be whitened into a seed, which is then used to initialize the RNG state. Whereas if we generate seeds or initial states for thread_number RNGs at the start, we never need to worry about those steps during execution.
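For concreteness, the per-call path being avoided might look like the following sketch: pack the thread id and a counter into one word, then whiten it with a SplitMix64-style finalizer. `whiten` and `seed_for` are illustrative names, not library code.

```cpp
#include <cstdint>

// SplitMix64-style finalizer: a bijection on 64-bit words that
// scrambles structured inputs (small ids, sequential counters)
// into well-distributed seeds.
std::uint64_t whiten(std::uint64_t x) {
    x += 0x9E3779B97F4A7C15ull;  // golden-ratio increment
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ull;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBull;
    return x ^ (x >> 31);
}

// Mix a thread id with a per-thread counter, then whiten the
// result into a seed for initializing an RNG state.
std::uint64_t seed_for(std::uint64_t thread_id, std::uint64_t counter) {
    return whiten((thread_id << 32) ^ counter);
}
```

Doing this on every draw is exactly the bookkeeping that pre-generating the states sidesteps.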
This solution has its problems too. While it works trivially for spawning N long-lived workers, if a thread can spawn its own subthreads you are again saddled with the task of mapping thread IDs or inheritance traces onto the flat array of RNGs.
But in theory, as long as omp_get_num_threads() does not exceed N, the number of states we generated, we can just use omp_get_thread_num() to index into the array.
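That indexing scheme can be sketched as follows; this is an assumption about how the branch works, not its actual code, and the serial fallback stubs are only there so the sketch also compiles without OpenMP.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <random>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallback so the sketch compiles without -fopenmp.
static int omp_get_thread_num() { return 0; }
static int omp_get_max_threads() { return 1; }
#endif

// Illustrative: N pre-generated states; as long as the team size
// stays within N, omp_get_thread_num() indexes them directly.
constexpr int N = 64;
static std::array<std::mt19937_64, N> rngs;  // assume seeded at startup

std::uint64_t par_rand() {
    int t = omp_get_thread_num();
    assert(t < N && omp_get_max_threads() <= N);
    return rngs[t]();  // each thread draws from its own stream
}
```

Since OpenMP numbers the threads of a team 0 through team-size minus one, no further mapping is needed as long as nested parallelism is avoided.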
See https://github.com/Adam-Vandervorst/PyBHV/tree/parallel-opt