make OCCA API multithread-safe for kernel calls

Open pdhahn opened this issue 5 years ago • 9 comments

Please modify the OCCA API so that user calls to OCCA kernels (within user application code) are multithread-safe.

Right now, the call arguments are stored in the arguments member of class modeKernel_t, so calling the same kernel instance from a second user thread is a problem.

Background: In my single-threaded application code, reusing a kernel instance is desirable for performance, and it is safe to do so. But when I multi-threaded my application code, so that more than one thread can access the same kernel instance (the root of the problem), crashes occurred within botched frames deep in the OCCA API code. Naturally, my workaround is to have a separate instance of a given kernel in each application thread, or (maybe best) to call the kernels within a critical section that serializes invocations of the OCCA API code. But if the OCCA API were multithread-safe, such machinations would not be necessary, and other folks who try to multi-thread would not get a surprise.
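The critical-section workaround described above can be sketched roughly as follows. This is only an illustration under stated assumptions: FakeKernel, LockedKernel, and runShared are hypothetical names invented for this sketch, not OCCA classes, and FakeKernel merely mimics a kernel object that keeps its call arguments in a member.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a kernel that stores its call arguments in a
// member before launching. This is NOT the OCCA API.
struct FakeKernel {
  std::vector<int> arguments;  // per-instance argument storage
  long total = 0;              // accumulated "launch" results

  // Stores the arguments in the member, then "launches" by summing them.
  void operator()(int a, int b) {
    arguments.clear();
    arguments.push_back(a);
    arguments.push_back(b);
    for (int v : arguments) total += v;
  }
};

// Wraps one shared kernel instance and serializes every call with a mutex.
struct LockedKernel {
  FakeKernel kernel;
  std::mutex launchMutex;

  void operator()(int a, int b) {
    std::lock_guard<std::mutex> guard(launchMutex);
    kernel(a, b);
  }
};

// Calls one shared kernel from several threads and returns the final sum.
long runShared(int numThreads, int callsPerThread) {
  assert(numThreads > 0);
  LockedKernel shared;
  std::vector<std::thread> workers;
  for (int t = 0; t < numThreads; ++t) {
    workers.emplace_back([&shared, callsPerThread] {
      for (int i = 0; i < callsPerThread; ++i) shared(1, 2);
    });
  }
  for (auto &w : workers) w.join();
  return shared.kernel.total;
}
```

Without the lock_guard, the threads would mutate the same arguments vector concurrently, which is a data race. With it, launches are serialized, at the cost of losing any overlap between threads.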

pdhahn avatar May 14 '19 22:05 pdhahn

Hi @pdhahn!

Leaving the kernel calls non-thread-safe was a design decision that trades thread safety for ease of development and use with C and other languages.

In order to call the kernel from other languages, we need to take an arbitrary number of arguments and invoke the underlying C++ kernel. This is complex if we want to transform and pass all arguments at once in an external language's C binding.

We could add a .clone() method to the core objects and have each thread own its own kernel instance. How does that sound?

dmed256 avatar May 14 '19 23:05 dmed256

How about implementing a named critical section that wraps the body of each kernel operator template in kernelOperators.cpp? (I use Boehm's atomics package to implement named critical sections myself.)

pdhahn avatar May 15 '19 00:05 pdhahn

Question: Do you think I can subclass the occa::kernel class and just do the named critical section thing myself?

pdhahn avatar May 15 '19 00:05 pdhahn

Regarding the kernelOperators.cpp suggestion, this route is not used in the C API, which is the main reason why we can't easily make this thread-safe. We push the arguments to the kernel one by one and then launch it.

As for the subclass approach, this will fail because of how the arguments are stored prior to launch.
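To illustrate why a lock inside the call operator alone would not cover a push-then-launch flow, here is a rough sketch. PushKernel, launchLocked, and countBadLaunches are hypothetical names made up for this example and do not reflect the actual OCCA C API. The assumption is only what the comment above states: arguments are pushed one at a time into per-instance storage and then consumed at launch.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical binding-style kernel: arguments are pushed one at a time
// into per-instance storage, then consumed by run(). NOT the OCCA C API.
struct PushKernel {
  std::vector<int> pending;

  void clearArgs() { pending.clear(); }
  void pushArg(int value) { pending.push_back(value); }

  // Consumes the pending arguments and returns their sum.
  int run() {
    int sum = 0;
    for (int v : pending) sum += v;
    pending.clear();
    return sum;
  }
};

// One complete launch. The lock must span clear + every push + run;
// locking each call separately would still let threads interleave pushes.
int launchLocked(PushKernel &kernel, std::mutex &launchMutex, int a, int b) {
  std::lock_guard<std::mutex> guard(launchMutex);
  kernel.clearArgs();
  kernel.pushArg(a);
  kernel.pushArg(b);
  return kernel.run();
}

// Launches from several threads and counts results that came out wrong.
int countBadLaunches(int numThreads, int launchesPerThread) {
  assert(numThreads > 0);
  PushKernel kernel;
  std::mutex launchMutex;
  std::atomic<int> bad{0};
  std::vector<std::thread> workers;
  for (int t = 0; t < numThreads; ++t) {
    workers.emplace_back([&kernel, &launchMutex, &bad, t, launchesPerThread] {
      for (int i = 0; i < launchesPerThread; ++i) {
        // Thread t pushes (t, 10 * t), so the correct result is 11 * t.
        if (launchLocked(kernel, launchMutex, t, 10 * t) != 11 * t) ++bad;
      }
    });
  }
  for (auto &w : workers) w.join();
  return bad.load();
}
```

The point of the sketch is the scope of the lock: it has to be held by the caller across the whole sequence, which is hard to enforce from inside a binding that exposes the pushes and the launch as separate calls.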

dmed256 avatar May 15 '19 00:05 dmed256

OK. Is the clone() possibility you broached a convenience for duplicating an instance of the occa::kernel class then?

pdhahn avatar May 15 '19 00:05 pdhahn

Yeah, the only options I see are:

  • Manually lock around each kernel launch
  • Create an additional kernel per thread

The latter is probably the best approach to avoid complication and performance hits. We can add a .clone() method to make this simple. Same with devices to create individual streams, and memory for a quick memory copy.
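The per-thread option could look roughly like the sketch below. ClonableKernel and its clone() are hypothetical: clone() is only proposed in this thread, so the signature and behavior shown here are assumptions, and the type is a plain stand-in rather than occa::kernel.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for a kernel with per-instance argument storage.
// NOT occa::kernel; clone() here is an assumed shape for the proposal.
struct ClonableKernel {
  std::vector<int> arguments;

  // Returns an independent instance with its own argument storage.
  ClonableKernel clone() const { return ClonableKernel{}; }

  // Stores the arguments, then "launches" by returning their sum.
  int operator()(int a, int b) {
    arguments.clear();
    arguments.push_back(a);
    arguments.push_back(b);
    int sum = 0;
    for (int v : arguments) sum += v;
    return sum;
  }
};

// Gives every thread its own clone, so no lock is needed at launch time.
long runWithClones(int numThreads, int callsPerThread) {
  assert(numThreads > 0);
  ClonableKernel original;

  // Clone on the main thread, before any worker starts.
  std::vector<ClonableKernel> perThread;
  for (int t = 0; t < numThreads; ++t) perThread.push_back(original.clone());

  std::vector<long> partial(numThreads, 0);  // one result slot per thread
  std::vector<std::thread> workers;
  for (int t = 0; t < numThreads; ++t) {
    workers.emplace_back([&perThread, &partial, t, callsPerThread] {
      for (int i = 0; i < callsPerThread; ++i) {
        partial[t] += perThread[t](1, 2);
      }
    });
  }
  for (auto &w : workers) w.join();

  long total = 0;
  for (long p : partial) total += p;
  return total;
}
```

Each thread touches only its own kernel and its own result slot, so launches never contend. The trade-off is that the user has to create and manage one instance per thread explicitly.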

dmed256 avatar May 15 '19 00:05 dmed256

You mean the user needs to call clone() and explicitly manage object instances for the threads? Or are you thinking about something hidden and automatic in the OCCA API somehow?

pdhahn avatar May 15 '19 00:05 pdhahn

The user explicitly handles kernel instances in multi-threaded code.

dmed256 avatar May 15 '19 00:05 dmed256

OK, then please implement clone(); it will at least be helpful. Thank you!

pdhahn avatar May 15 '19 01:05 pdhahn