ROCR-Runtime
Is there a way to terminate the dispatched kernel?
The runtime is obviously able to track when a dispatched kernel completes (and decrement hsa_kernel_dispatch_packet_t::completion_signal). It also seems that when the process ends, all dispatched kernels for that process are terminated. Is there a way to terminate a specific kernel dispatch on a specific queue?
What do you mean by terminate a kernel dispatch? Are you trying to cancel a kernel dispatch that was already enqueued?
yes
When the process ends, the dispatched kernels are terminated because the process is being destroyed inside the Linux kernel, and this causes the queues to be unmapped. But we cannot trigger the queue unmapping from user space. The latest point at which you can cancel a queued dispatch is right before the AQL packet gets read by CP firmware, which means right before you set the packet header.
Actually, it could be canceled by triggering an interrupt. Setting the queue percentage to 0 also stops or pauses any dispatches, probably by using an interrupt as well. It seems that causing an interrupt is the only way to do this, and that the kernel does the same when terminating the process.
User mode can stop queue execution (e.g. by changing the queue percentage to 0) and it can destroy queues. Both of those options stop execution of any dispatches that were written to the queue that have not completed yet. There is no way to stop execution of a specific dispatch. You can only stop entire queues.
Technically you could change a dispatch packet after submitting it to the queue. E.g. change it to a NOP. That would work if the dispatch hasn't started executing yet. But this would be illegal. Once the packet is submitted and the VALID bit is set in the packet header, the firmware owns the dispatch. There is no safe way for user mode to modify the packet after the fact. The GPU has every right to cache the contents of the packet in some internal buffers, and there is no way for user mode to tell when that has happened.
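To make the ownership handoff concrete, here is a minimal host-only C++ model. It is a sketch, not ROCr code: `Slot`, `publish`, and `consume` are hypothetical names, and only the packet type values are taken from the HSA `hsa_packet_type_t` enum. It assumes the producer has already reserved the slot, so a late cancel cannot simply leave the slot invalid (the packet processor would wait on it). The sketch instead publishes a barrier-AND packet with no dependencies, which acts as a no-op.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Packet type values as defined by hsa_packet_type_t in the HSA runtime.
enum PacketType : uint16_t {
    kInvalid = 1,         // HSA_PACKET_TYPE_INVALID
    kKernelDispatch = 2,  // HSA_PACKET_TYPE_KERNEL_DISPATCH
    kBarrierAnd = 3       // HSA_PACKET_TYPE_BARRIER_AND
};

// Hypothetical stand-in for one AQL queue slot (real packets are 64 bytes).
struct Slot {
    std::atomic<uint16_t> header{kInvalid};
    uint32_t payload = 0;  // stands in for the packet body
};

// Producer side: write the body first, store the header last.
// The cancel flag is only honored before the header store; after that
// store the consumer owns the packet and it must not be touched again.
inline PacketType publish(Slot& slot, uint32_t payload,
                          const std::atomic<bool>& cancelled) {
    // Precondition: the slot has not been handed over yet.
    assert((slot.header.load(std::memory_order_relaxed) & 0xFF) == kInvalid);

    const bool cancel = cancelled.load(std::memory_order_acquire);
    // A cancelled slot becomes a barrier with no dependencies (body zeroed).
    slot.payload = cancel ? 0u : payload;
    const PacketType type = cancel ? kBarrierAnd : kKernelDispatch;

    // Release store: the body is visible before the type leaves INVALID.
    slot.header.store(type, std::memory_order_release);
    return type;
}

// Consumer-side model: returns true only if this slot would launch a kernel.
inline bool consume(const Slot& slot) {
    const uint16_t h = slot.header.load(std::memory_order_acquire);
    return (h & 0xFF) == kKernelDispatch;  // type lives in the low 8 bits
}
```

The point of the model is the ordering: the cancel decision and the body write both happen before the single header store, and nothing in the producer touches the slot afterwards.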
You can build a kill-switch into your shader kernel. E.g. some global variable that the kernel checks regularly and terminates itself if it is non-0.
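The kill-switch pattern can be sketched on the host in plain C++. This is only a model of the control flow, not GPU code: `run_with_kill_switch` is a hypothetical name, the loop stands in for the kernel's main loop, and the atomic stands in for a global variable that both the host and the kernel can see. In a real kernel the flag would live in memory visible to both sides, and the check interval trades polling overhead against how quickly the kernel reacts.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Runs `work(i)` for i in [0, iterations) but polls `kill_switch` every
// `check_interval` iterations and exits early if it is non-zero.
// Returns the number of iterations that actually ran.
template <typename Work>
uint64_t run_with_kill_switch(const std::atomic<uint32_t>& kill_switch,
                              uint64_t iterations,
                              uint64_t check_interval,
                              Work work) {
    assert(check_interval > 0);
    for (uint64_t i = 0; i < iterations; ++i) {
        // Poll only occasionally so the check stays cheap.
        if (i % check_interval == 0 &&
            kill_switch.load(std::memory_order_relaxed) != 0) {
            return i;  // terminate early, like the kernel ending itself
        }
        work(i);
    }
    return iterations;
}
```

For example, with a check interval of 4, a flag that is set during iteration 10 is noticed at iteration 12, so the reaction latency is bounded by the interval.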
> There is no way to stop execution of a specific dispatch. You can only stop entire queues.
You can trigger an interrupt and, in the secondary trap handler, decide what to stop and what should continue.
> Technically you could change a dispatch packet after submitting it to the queue. E.g. change it to a NOP.
Or overwrite the memory holding the kernel's instructions with s_endpgm and then flush the caches by sending a PM4 packet.
> You can build a kill-switch into your shader kernel. E.g. some global variable that the kernel checks regularly and terminates itself if it is non-0.
Of course, but there are situations where this is harder than the other methods. It is probably the best solution anyway, because an interrupt is global for all CUs and invalidating caches is also global, so both could be costly in specific situations.