make OCCA API multithread-safe for kernel calls
Please modify the OCCA API so that user calls to OCCA kernels (within user application code) are multithread-safe.
Right now, the calling arguments are stored in the `arguments` member of class `modeKernel_t`. Trying to call the same kernel instance from another user thread is therefore a problem.
Background: In my single-threaded application code, I find that reusing a kernel instance is desirable for performance, and it is safe to do so. But when I tried to multithread my application code, so that more than one thread can access the same kernel instance (the root of the problem), crashes happened within botched frames deep in the OCCA API code. My workaround is to have a separate instance of a given kernel in each application thread, or (maybe best) to call the kernels within a critical section to serialize invocations of the OCCA API code. But if the OCCA API were multithread-safe, such machinations would not be necessary, and other folks who try to multithread would not get a surprise.
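The critical-section workaround described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `SerializedKernel`, `SerializedResult`, and `runSerialized` are hypothetical stand-ins written with the C++ standard library, not OCCA classes or functions. The mock kernel keeps its arguments as member state, and a single `std::mutex` serializes every launch on the shared instance.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a kernel that stores its launch arguments
// as member state (NOT the real OCCA kernel class).
struct SerializedKernel {
  std::vector<int> arguments;  // shared, mutable launch state
  long launches = 0;
  long badLaunches = 0;

  void launch(int a, int b) {
    arguments.clear();
    arguments.push_back(a);
    arguments.push_back(b);
    // A launch is "bad" if the stored arguments are not what this caller pushed.
    if (arguments.size() != 2 || arguments[0] != a || arguments[1] != b) {
      ++badLaunches;
    }
    ++launches;
  }
};

struct SerializedResult {
  long launches;
  long badLaunches;
};

// Launch one shared kernel from nThreads threads, nCalls times each,
// with every launch wrapped in a critical section.
SerializedResult runSerialized(int nThreads, int nCalls) {
  assert(nThreads > 0 && nCalls >= 0);
  SerializedKernel kernel;
  std::mutex launchMutex;
  std::vector<std::thread> threads;
  for (int t = 0; t < nThreads; ++t) {
    threads.emplace_back([&kernel, &launchMutex, t, nCalls]() {
      for (int i = 0; i < nCalls; ++i) {
        // Only one thread at a time may touch the kernel's stored arguments.
        std::lock_guard<std::mutex> lock(launchMutex);
        kernel.launch(t, i);
      }
    });
  }
  for (std::thread &th : threads) th.join();
  return {kernel.launches, kernel.badLaunches};
}
```

The cost of this approach is that launches from different threads never overlap, which is the performance hit mentioned later in the thread.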
Hi @pdhahn!
Making the kernel calls not thread-safe was a design decision which trades thread-safety for ease of development/use with C and other languages.
In order to call the kernel from other languages, we need to take an arbitrary number of arguments and invoke the underlying C++ kernel. This is complex if we want to transform and pass all arguments at once in an external language's C binding.
We could add a `.clone()` method to the core objects and have each thread own its own kernel instance. How does that sound?
How about implementing a named critical section which wraps the body contents within each of the kernel operator templates in `kernelOperators.cpp`? (I use Boehm's atomics package to implement named critical sections myself.)
Question: Do you think I can subclass the `occa::kernel` class and just do the named critical section thing myself?
Regarding the `kernelOperators.cpp` suggestion, this route is not used in the C API, which is the main reason why we can't easily make this thread-safe. We push the arguments to the kernel one by one and then launch it.
As for the subclass approach, this will fail because of how the arguments are stored prior to launch.
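To illustrate why arguments that are pushed one by one and then launched are hard to protect, here is a small single-threaded simulation of one unlucky schedule. It is a sketch under assumptions: `PushKernel`, `pushArg`, `run`, and `interleavedLaunch` are hypothetical names, not the OCCA C API. Two logical callers, A and B, share one kernel instance, and their pushes interleave before A launches.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a kernel whose arguments are pushed
// one at a time and stored on the instance until launch.
struct PushKernel {
  std::vector<std::string> arguments;

  void pushArg(const std::string &arg) { arguments.push_back(arg); }

  // "Launch" with whatever is currently stored, then reset the state.
  // Returns the arguments the launch actually saw.
  std::vector<std::string> run() {
    assert(!arguments.empty());  // launching with no arguments is a bug in this sketch
    std::vector<std::string> seen = arguments;
    arguments.clear();
    return seen;
  }
};

// Simulate callers A and B interleaving on the same kernel instance.
std::vector<std::string> interleavedLaunch() {
  PushKernel kernel;
  kernel.pushArg("A0");  // caller A pushes its first argument
  kernel.pushArg("B0");  // caller B pushes onto the same instance
  kernel.pushArg("A1");  // caller A pushes its second argument
  return kernel.run();   // caller A launches and sees B's argument too
}
```

Because the shared state spans several calls (each push plus the launch), a lock placed only around the final launch would not prevent this mix-up; the whole push-and-launch sequence would have to be guarded.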
OK. Is the `clone()` possibility you broached a convenience for duplicating an instance of the `occa::kernel` class, then?
Yeah, the only options I see are:
- Manually creating a lock around each kernel launch
- Creating an additional kernel per thread
The latter is probably the best approach to avoid complication and performance hits. We can add a `.clone()` method to make this simple. The same goes for devices, to create individual streams, and for memory, to make a quick memory copy.
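The kernel-per-thread option could look roughly like the sketch below. It assumes a `clone()` that returns an independent instance with its own argument storage; that semantics is a guess based on this discussion, and `ClonableKernel`, `CloneResult`, and `runWithClones` are hypothetical standard-library mocks, not OCCA types. Each thread launches only its own clone, so no lock is needed.

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a kernel with per-instance argument state.
struct ClonableKernel {
  std::string name;            // stands in for the compiled kernel identity
  std::vector<int> arguments;  // per-instance launch state
  long launches = 0;
  long badLaunches = 0;

  // Assumed semantics: same kernel, fresh and independent argument state.
  ClonableKernel clone() const {
    ClonableKernel copy;
    copy.name = name;
    return copy;
  }

  void launch(int a, int b) {
    arguments.clear();
    arguments.push_back(a);
    arguments.push_back(b);
    if (arguments.size() != 2 || arguments[0] != a || arguments[1] != b) {
      ++badLaunches;
    }
    ++launches;
  }
};

struct CloneResult {
  long launches;
  long badLaunches;
};

// Give every thread its own clone and launch without any locking.
CloneResult runWithClones(int nThreads, int nCalls) {
  assert(nThreads > 0 && nCalls >= 0);
  ClonableKernel original;
  original.name = "addVectors";  // hypothetical kernel name

  // Build all clones before starting threads so the vector never reallocates.
  std::vector<ClonableKernel> perThread;
  for (int t = 0; t < nThreads; ++t) perThread.push_back(original.clone());

  std::vector<std::thread> threads;
  for (int t = 0; t < nThreads; ++t) {
    threads.emplace_back([&perThread, t, nCalls]() {
      ClonableKernel &mine = perThread[t];  // touched by this thread only
      for (int i = 0; i < nCalls; ++i) mine.launch(t, i);
    });
  }
  for (std::thread &th : threads) th.join();

  CloneResult total = {0, 0};
  for (const ClonableKernel &k : perThread) {
    total.launches += k.launches;
    total.badLaunches += k.badLaunches;
  }
  return total;
}
```

Compared with a lock around each launch, this trades some memory per thread for launches that can proceed concurrently.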
You mean the user needs to call clone() and explicitly manage object instances for the threads? Or are you thinking about something hidden and automatic in the OCCA API somehow?
The user explicitly handles kernel instances in multithreaded code.
OK, then please implement `clone()`; it will at least be helpful. Thank you!