cuda-api-wrappers
Consider replacing the stream_t::enqueue dummy object with a tag mechanism
At the moment, we enqueue using my_stream.enqueue.copy(...), my_stream.enqueue.kernel_launch(...), etc., where stream_t::enqueue is a dummy object which holds a reference to the stream. It lets us avoid having functions named stream_t::enqueue_kernel_launch(), stream_t::enqueue_kernel_copy(), enqueue_this and enqueue_that.
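For readers who haven't looked at the implementation, here is a rough sketch of the dummy-object pattern described above. It is not the library's actual code: proxy_stream_t, enqueue_t and the string log are hypothetical stand-ins, so that the sketch compiles without CUDA and merely records what would have been enqueued.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for stream_t; it only logs the requested operations.
class proxy_stream_t {
public:
    // The "dummy object": it holds a reference to its stream and offers one
    // member function per kind of operation which can be enqueued.
    class enqueue_t {
    public:
        explicit enqueue_t(proxy_stream_t& stream) : stream_(stream) {}

        // Kernel arguments are accepted but ignored in this mock.
        template <typename... Args>
        void kernel_launch(const std::string& kernel_name, Args&&...) {
            stream_.log_.push_back("kernel_launch:" + kernel_name);
        }

        void copy(void* /*dst*/, const void* /*src*/, std::size_t num_bytes) {
            stream_.log_.push_back("copy:" + std::to_string(num_bytes));
        }

    private:
        proxy_stream_t& stream_;
    };

    proxy_stream_t() = default;
    // Copying would leave `enqueue` referring to the source stream.
    proxy_stream_t(const proxy_stream_t&) = delete;
    proxy_stream_t& operator=(const proxy_stream_t&) = delete;

    enqueue_t enqueue{*this};

    const std::vector<std::string>& log() const { return log_; }

private:
    std::vector<std::string> log_;
};
```

With this, a call reads my_stream.enqueue.copy(dst, src, n), i.e. the member name of the dummy object acts as a prefix shared by all enqueueing functions.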
Well, we can do better: Just one templated enqueue function, with an initial argument being a tag class with different values for kernel launches, copies, etc. - every kind of possible operation. This can be specialized separately for the different operations, without even needing any special dispatching code.
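A minimal sketch of what such a tag mechanism might look like, under some assumptions: the tags are modeled as distinct empty types with constexpr instances, and the per-operation code is selected by overloading on the tag type (a plain function argument cannot drive a template specialization by its value). All the names here (tagged_stream_t, work_item, the string log) are hypothetical and chosen so that this compiles without CUDA.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tags, one type per kind of work which can be enqueued.
namespace work_item {
struct kernel_launch_t {};
struct copy_t {};
constexpr kernel_launch_t kernel_launch{};
constexpr copy_t copy{};
} // namespace work_item

// Hypothetical stand-in for stream_t; it only logs the requested operations.
class tagged_stream_t {
public:
    // Chosen when the first argument is the kernel-launch tag.
    // Kernel arguments are accepted but ignored in this mock.
    template <typename... Args>
    void enqueue(work_item::kernel_launch_t, const std::string& kernel_name, Args&&...) {
        log_.push_back("kernel_launch:" + kernel_name);
    }

    // Chosen when the first argument is the copy tag.
    void enqueue(work_item::copy_t, void* /*dst*/, const void* /*src*/, std::size_t num_bytes) {
        log_.push_back("copy:" + std::to_string(num_bytes));
    }

    const std::vector<std::string>& log() const { return log_; }

private:
    std::vector<std::string> log_;
};
```

Overload resolution picks the right function at compile time from the tag's type alone, so no switch or other dispatching code is needed; supporting another kind of operation means adding one tag and one overload.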
... on second thought, I'm not sure how I can make such a tagging mechanism not be much longer to invoke, i.e.
my_stream.enqueue(cuda::stream::work_item_t::kernel_launch, my_kernel_name, arg1, arg2,...)
Yeah, that's pretty bad. And I can't bank on using a whole lot of stuff.