Device-side abort() in HIP behaves differently on ROCm and CUDA
HIP provides a device-side abort() function, which on the CUDA platform is emulated with asm("trap;") (see https://github.com/ROCm-Developer-Tools/HIP/issues/233).
However, abort() in device code behaves fundamentally differently on the two platforms: on CUDA it terminates only the kernel in which it is executed, whereas on ROCm it terminates the whole program. Can you confirm whether this is the intended behavior or a bug? It would also be helpful if device-side abort() were documented in the Kernel Language Syntax.
Regardless of the intention, is it possible to catch the kernel failure in HIP/ROCm, similarly to how abort() behaves in CUDA? We would like to have a custom abort handler that prints appropriate diagnostics, a backtrace, etc.
Thanks for reporting this. Regarding the behavior difference, I will put it up for discussion internally and see what folks think. Regarding the missing documentation for abort(), I will pass it on to the team to get it documented.
Hi @lahwaacz, it was decided to maintain the current behaviour of abort() in ROCm, which is to terminate the entire application rather than just the kernel in which it is executed. This difference is now also documented in the C++ Language Extensions documentation. Thank you!
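Since abort() on ROCm takes down the whole process, one possible way to approximate the custom handler asked about above is to run the GPU work in a child process and let the parent report the failure. The following is only a minimal host-side sketch, assuming a POSIX system with fork() and waitpid(); it is not an official HIP mechanism. The names `JobResult` and `run_isolated` are hypothetical. The thread does not state which signal or exit status a device-side abort() produces on ROCm, so the parent simply treats any abnormal termination as a failure. It is probably safest to initialize the GPU runtime only inside the child, not before the fork.

```cpp
// Sketch: contain a process-terminating abort by isolating the work in a child process.
// Assumes a POSIX host. JobResult and run_isolated are hypothetical names, not HIP APIs.
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <csignal>
#include <cstdlib>
#include <functional>

struct JobResult {
    bool launched;  // false if fork() or waitpid() failed
    bool exited;    // true if the child terminated by returning or calling _exit
    int exit_code;  // meaningful only when exited is true
    int signal;     // terminating signal number, 0 if the child was not signalled
};

// Runs `job` in a forked child and reports how the child terminated.
// In real use, `job` would initialize HIP, launch kernels, and synchronize.
JobResult run_isolated(const std::function<int()>& job) {
    JobResult result{false, false, -1, 0};

    pid_t pid = fork();
    if (pid < 0) {
        return result;  // fork failed, nothing was run
    }
    if (pid == 0) {
        // Child: if the job aborts, only this process dies.
        _exit(job());
    }

    // Parent: wait for the child and decode its termination status.
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return result;
    }
    result.launched = true;
    if (WIFEXITED(status)) {
        result.exited = true;
        result.exit_code = WEXITSTATUS(status);
    } else if (WIFSIGNALED(status)) {
        result.signal = WTERMSIG(status);
    }
    return result;
}
```

A caller would pass a callable that performs the HIP work, then print its own diagnostics whenever `exited` is false or `exit_code` is non-zero. This does not yield a device-side backtrace, and any data the child produced has to be sent back to the parent explicitly, for example through a pipe or a file.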