pika
Test if CUDA callbacks would again be a viable replacement for polling
Event polling has been successful and turned out to perform significantly better than CUDA callbacks. However, that comparison was made when CUDA callbacks still required runtime registration on the CUDA thread. We should check:
- whether plain CUDA callbacks would again be competitive with event polling in the scheduler, or
- if callbacks do not work well enough, whether a separate polling thread would.

Either of these would be beneficial architecturally because it would decouple the CUDA senders from the schedulers. Related: #17.
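For discussion, the separate polling thread option could look roughly like the sketch below. It is plain C++ with no CUDA dependency, and `polling_thread` is a hypothetical helper, not an existing pika type. The readiness predicate stands in for a check such as `cudaEventQuery(event) == cudaSuccess`; the sleep interval is an arbitrary placeholder, not a tuned value.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper (not part of pika): a dedicated thread that polls
// readiness predicates and runs the matching continuation once ready.
class polling_thread {
public:
    using predicate = std::function<bool()>;
    using continuation = std::function<void()>;

    polling_thread() : worker_([this] { run(); }) {}

    // Stops the thread; entries that never became ready are dropped.
    ~polling_thread() {
        stop_.store(true);
        worker_.join();
    }

    // Assumption: in a CUDA build, `ready` would wrap something like
    // `cudaEventQuery(event) == cudaSuccess`.
    // Predicates run under the internal lock, so they must not call add().
    void add(predicate ready, continuation done) {
        assert(ready && done);
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.emplace_back(std::move(ready), std::move(done));
    }

private:
    void run() {
        while (!stop_.load()) {
            std::vector<continuation> to_run;
            {
                // Collect ready entries while holding the lock.
                std::lock_guard<std::mutex> lock(mutex_);
                for (auto it = pending_.begin(); it != pending_.end();) {
                    if (it->first()) {
                        to_run.push_back(std::move(it->second));
                        it = pending_.erase(it);
                    } else {
                        ++it;
                    }
                }
            }
            // Run continuations outside the lock so they may call add().
            for (auto& c : to_run) c();
            // Placeholder back-off; the right interval would need measuring.
            std::this_thread::sleep_for(std::chrono::microseconds(50));
        }
    }

    // Declared before worker_ so they are initialized before the thread starts.
    std::atomic<bool> stop_{false};
    std::mutex mutex_;
    std::vector<std::pair<predicate, continuation>> pending_;
    std::thread worker_;
};
```

With this shape, a CUDA sender would only need to call `add` with an event check and a continuation that signals its receiver, so it would not need to know which scheduler is running. Whether the extra thread and its wake-up latency are acceptable is exactly what would have to be benchmarked against scheduler-integrated polling.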