ROCgdb
fatal error: KFD_IOC_DBG_TRAP_GET_VERSION failed, and how do I set a breakpoint inside a kernel function?
GPU is Vega 20. ROCm is 5.1.0. GNU gdb (rocm-rel-5.1-36) 11.2.
rocm-dbgapi is installed:
> apt search rocm-dbgapi
> Sorting... Done
> Full Text Search... Done
> rocm-dbgapi/Ubuntu,now 0.64.0.50100-36 amd64 [installed]
> Library to provide AMD GPU debugger API
>
> rocm-dbgapi5.1.0/Ubuntu 0.64.0.50100-36 amd64
> Library to provide AMD GPU debugger API
Compile:
CXXFLAGS =-g -O0 -ggdb
Run:
rocgdb ./MatrixTranspose
I'm getting an error message during execution:
> (gdb) set debug amdgpu log-level verbose
> amd-dbgapi: amd_dbgapi_set_log_level (LOG_LEVEL_VERBOSE) {
> amd-dbgapi: } = void
> (gdb) run
> Starting program: /home/xyy/test_rocgdb/0_MatrixTranspose/MatrixTranspose
> amd-dbgapi: amd_dbgapi_process_attach (client_process_id=0x557aa4e8f5e0, process_id=0x557aa5109078) {
> amd-dbgapi: callback: get_os_pid (pid=0x7fffbfc5aaec) {
> amd-dbgapi: callback: } = STATUS_SUCCESS, *pid=1495950
> amd-dbgapi: attaching process_1 to OS process 1495950
> amd-dbgapi: detached process_1
> amd-dbgapi: linux_driver_t statistics (pid 1495950): 0 reads (0), 0 writes (0)
> amd-dbgapi: fatal error: KFD_IOC_DBG_TRAP_GET_VERSION failed
> Backtrace:
> ……
> amd-dbgapi: } = STATUS_FATAL
> Could not attach to process 1495950 (rc=-2)
Setting that error aside, I can still set breakpoints on host functions, but not inside the kernel function. For example:
> (gdb) b main
> Breakpoint 1 at 0x216262: file MatrixTranspose.cpp, line 37.
> (gdb) b MatrixTranspose.cpp:12
> No compiled code for line 12 in file "MatrixTranspose.cpp".
> Make breakpoint pending on future shared library load? (y or [n]) y
> Breakpoint 2 (MatrixTranspose.cpp:12) pending.
> (gdb) i b
> Num Type Disp Enb Address What
> 1 breakpoint keep y 0x0000000000216262 in main() at MatrixTranspose.cpp:37
> 2 breakpoint keep y <PENDING> MatrixTranspose.cpp:12
I tried using amdgpu-install to install both the legacy and ROCr OpenCL stacks, but it did not work, and I now get a new error message:
> WARNING: amdgpu dkms failed for running kernel
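The DKMS warning suggests the amdgpu kernel module may not have been built for the running kernel. Below is a minimal sketch for checking which amdgpu module the running kernel would load. It assumes that DKMS-built modules are installed under an `updates/dkms/` directory, as on Ubuntu. The function names `is_dkms_module_path` and `check_amdgpu_module` are hypothetical helpers, not part of ROCm. Whether a missing DKMS module actually causes the KFD_IOC_DBG_TRAP_GET_VERSION failure is an assumption not confirmed in this thread.

```shell
#!/bin/sh
# Sketch: report whether the amdgpu module for the running kernel is DKMS-built.
# Assumption: DKMS modules are installed under .../updates/dkms/ (Ubuntu layout).

# Hypothetical helper: succeeds if the given module path looks DKMS-built.
is_dkms_module_path() {
    case "$1" in
        */updates/dkms/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print the running kernel and where amdgpu would load from.
check_amdgpu_module() {
    echo "Running kernel: $(uname -r)"
    # modinfo -n prints the file name of the module for the running kernel.
    path=$(modinfo -n amdgpu 2>/dev/null) || path=""
    if [ -z "$path" ]; then
        echo "amdgpu module not found for this kernel"
    elif is_dkms_module_path "$path"; then
        echo "amdgpu comes from DKMS: $path"
    else
        echo "amdgpu is the in-tree module: $path"
    fi
}
```

After sourcing the file, run `check_amdgpu_module` and compare the result with the output of `dkms status`, which lists the kernels each DKMS module was built for.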
I want to set a breakpoint inside the kernel function and print information from it. How do I do that? Thanks!
@yuanyuanxia Apologies for the lack of response. Do you still need assistance with this ticket? Thanks!