HIP
HIP copied to clipboard
HIP device printf is broken in ROCM 5.4.2 (internal error)
The use of printf
inside a global function in HIP results into an internal error:
> cat rocgdb_test.hip
#include <hip/hip_runtime.h>
__global__ void kernel(int i)
{
printf("Test me\n");
}
int main(int argc, char* argv[])
{
kernel<<<dim3(1), dim3(1)>>>(argc);
hipDeviceSynchronize();
return 0;
}
Compiled with our without debugging enabled:
> /opt/rocm/hip/bin/hipcc --version
HIP version: 5.4.22803-474e8620
AMD clang version 15.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.4.2 22506 5cddd31fe35b165caadc4409b1d79d0d377c89be)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-5.4.2/llvm/bin
> /opt/rocm/hip/bin/hipcc -ggdb -fstandalone-debug -Xclang -O0 -Xclang -gcodeview rocgdb_test.hip -o rocgdb_test
Running the executable gives the following error:
> ./rocgdb_test
Hostcall: invalid service request.
fish: “./rocgdb_test” terminated by signal SIGABRT (Abort)
Running the executable in a debugger gives an internal error backtrace:
> rocgdb ./rocgdb_test
GNU gdb (rocm-rel-5.4-104) 12.1
...
Reading symbols from ./rocgdb_test...
(gdb) r
Starting program: /home/marcusmae/amd/rocgdb_test/rocgdb_test
warning: AMD GPU driver's version 1.1 not supported (version must be >= 1.6 and < 2.0)
warning: amd-dbgapi: unable to enable GPU debugging due to a restriction error
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff5d1d700 (LWP 476976)]
[New Thread 0x7ffff5519700 (LWP 476977)]
[Thread 0x7ffff5519700 (LWP 476977) exited]
[New Thread 0x7ffeecbff700 (LWP 476978)]
[Thread 0x7ffeecbff700 (LWP 476978) exited]
[New Thread 0x7ffff4c57700 (LWP 476979)]
Hostcall: invalid service request.
Couldn't read debug register: No such process.
(gdb) [Thread 0x7ffff4c57700 (LWP 476979) exited]
[Thread 0x7ffff5d1d700 (LWP 476976) exited]
[Inferior 1 (process 476972) exited normally]
(gdb) b exit
Breakpoint 1 at 0x7ffff607ba40: file exit.c, line 138.
(gdb) r
Starting program: /home/marcusmae/amd/rocgdb_test/rocgdb_test
warning: AMD GPU driver's version 1.1 not supported (version must be >= 1.6 and < 2.0)
warning: amd-dbgapi: unable to enable GPU debugging due to a restriction error
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff5d1d700 (LWP 476982)]
[New Thread 0x7ffff5519700 (LWP 476983)]
[Thread 0x7ffff5519700 (LWP 476983) exited]
[New Thread 0x7ffedffff700 (LWP 476984)]
[Thread 0x7ffedffff700 (LWP 476984) exited]
[New Thread 0x7ffff4c57700 (LWP 476985)]
Hostcall: invalid service request.
Thread 5 "rocgdb_test" received signal SIGABRT, Aborted.
[Switching to thread 5 (Thread 0x7ffff4c57700 (LWP 476985))]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) up
#1 0x00007ffff6057859 in __GI_abort () at abort.c:79
79 abort.c: No such file or directory.
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007ffff6057859 in __GI_abort () at abort.c:79
#2 0x00007ffff678656d in ?? () from /opt/rocm-5.4.2/lib/libamdhip64.so.5
#3 0x00007ffff67c3e48 in ?? () from /opt/rocm-5.4.2/lib/libamdhip64.so.5
#4 0x00007ffff67c41bc in ?? () from /opt/rocm-5.4.2/lib/libamdhip64.so.5
#5 0x00007ffff64c676b in ?? () from /opt/rocm-5.4.2/lib/libamdhip64.so.5
#6 0x00007ffff67705b7 in ?? () from /opt/rocm-5.4.2/lib/libamdhip64.so.5
#7 0x00007ffff7f7c609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#8 0x00007ffff6154133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Also occurs with ROCm 5.2.5, 5.3.3, 5.4.3. Conclusion: problem is in the driver, not in the toolchain.
Also occurs with AMDGPU 5.18.13-1538762.20.04, 5.18.13.50400-1510348.20.04
Thank you for the report.
I can't seem to reproduce the issue, can you run the exe with 'AMD_LOG_LEVEL=7 ./rocgdb_test' and attach the logs.
Also if you can provide you system details i.e. GPU + any flags that you are using, that will be helpful
Hi @jatinx , here is the log:
> AMD_LOG_LEVEL=7 ./rocgdb_test
:3:rocdevice.cpp :426 : 75799966442 us: 202289: [tid:0x7f674bedd880] Initializing HSA stack.
:3:comgrctx.cpp :33 : 75800006566 us: 202289: [tid:0x7f674bedd880] Loading COMGR library.
:3:rocdevice.cpp :205 : 75800010993 us: 202289: [tid:0x7f674bedd880] Numa selects cpu agent[0]=0x1204260(fine=0x11fe630,coarse=0x1202800) for gpu agent=0x1202c40
:3:rocdevice.cpp :1633: 75800011251 us: 202289: [tid:0x7f674bedd880] HMM support: 0, xnack: 0, direct host access: 0
:4:rocdevice.cpp :1942: 75800011303 us: 202289: [tid:0x7f674bedd880] Allocate hsa host memory 0x7f674e170000, size 0x28
:4:rocdevice.cpp :1942: 75800011559 us: 202289: [tid:0x7f674bedd880] Allocate hsa host memory 0x7f664ac00000, size 0x101000
:4:rocdevice.cpp :1942: 75800011799 us: 202289: [tid:0x7f674bedd880] Allocate hsa host memory 0x7f664aa00000, size 0x101000
:4:runtime.cpp :83 : 75800011914 us: 202289: [tid:0x7f674bedd880] init
:3:hip_context.cpp :50 : 75800011921 us: 202289: [tid:0x7f674bedd880] Direct Dispatch: 1
:3:hip_platform.cpp :198 : 75800011950 us: 202289: [tid:0x7f674bedd880] __hipPushCallConfiguration: Returned hipSuccess :
:3:hip_platform.cpp :203 : 75800011960 us: 202289: [tid:0x7f674bedd880] __hipPopCallConfiguration ( {0,0,0}, {2206784,0,1414757600}, 0x7ffd54537b68, 0x7ffd54537b60 )
:3:hip_platform.cpp :212 : 75800011965 us: 202289: [tid:0x7f674bedd880] __hipPopCallConfiguration: Returned hipSuccess :
:3:hip_module.cpp :469 : 75800011973 us: 202289: [tid:0x7f674bedd880] hipLaunchKernel ( 0x200738, {1,1,1}, {1,1,1}, 0x7ffd54537b30, 0, stream:<null> )
:3:devprogram.cpp :2676: 75800012245 us: 202289: [tid:0x7f674bedd880] Using Code Object V4.
:3:devprogram.cpp :2979: 75800015852 us: 202289: [tid:0x7f674bedd880] For Init/Fini: Kernel Name: _Z6kerneli
:3:rocdevice.cpp :2703: 75800015885 us: 202289: [tid:0x7f674bedd880] number of allocated hardware queues with low priority: 0, with normal priority: 0, with high priority: 0, maximum per priority is: 4
:3:rocdevice.cpp :2777: 75800019255 us: 202289: [tid:0x7f674bedd880] created hardware queue 0x7f674b691000 with size 16384 with priority 1, cooperative: 0
:4:rocdevice.cpp :1942: 75800019416 us: 202289: [tid:0x7f674bedd880] Allocate hsa host memory 0x7f6649200000, size 0x100000
:3:devprogram.cpp :2676: 75800215190 us: 202289: [tid:0x7f674bedd880] Using Code Object V4.
:3:devprogram.cpp :2979: 75800216051 us: 202289: [tid:0x7f674bedd880] For Init/Fini: Kernel Name: __amd_rocclr_streamOpsWrite
:3:devprogram.cpp :2979: 75800216058 us: 202289: [tid:0x7f674bedd880] For Init/Fini: Kernel Name: __amd_rocclr_fillImage
:3:devprogram.cpp :2979: 75800216060 us: 202289: [tid:0x7f674bedd880] For Init/Fini: Kernel Name: __amd_rocclr_fillBufferAligned2D
:3:devprogram.cpp :2979: 75800216063 us: 202289: [tid:0x7f674bedd880] For Init/Fini: Kernel Name: __amd_rocclr_copyImageToBuffer
:3:devprogram.cpp :2979: 75800216065 us: 202289: [tid:0x7f674bedd880] For Init/Fini: Kernel Name: __amd_rocclr_copyImage1DA
:3:devprogram.cpp :2979: 75800216067 us: 202289: [tid:0x7f674bedd880] For Init/Fini: Kernel Name: __amd_rocclr_fillBufferAligned
:3:devprogram.cpp :2979: 75800216073 us: 202289: [tid:0x7f674bedd880] For Init/Fini: Kernel Name: __amd_rocclr_copyImage
:3:devprogram.cpp :2979: 75800216076 us: 202289: [tid:0x7f674bedd880] For Init/Fini: Kernel Name: __amd_rocclr_streamOpsWait
:3:devprogram.cpp :2979: 75800216079 us: 202289: [tid:0x7f674bedd880] For Init/Fini: Kernel Name: __amd_rocclr_copyBufferRectAligned
:3:devprogram.cpp :2979: 75800216082 us: 202289: [tid:0x7f674bedd880] For Init/Fini: Kernel Name: __amd_rocclr_copyBufferRect
:3:devprogram.cpp :2979: 75800216085 us: 202289: [tid:0x7f674bedd880] For Init/Fini: Kernel Name: __amd_rocclr_copyBufferAligned
:3:devprogram.cpp :2979: 75800216090 us: 202289: [tid:0x7f674bedd880] For Init/Fini: Kernel Name: __amd_rocclr_copyBufferToImage
:3:devprogram.cpp :2979: 75800216093 us: 202289: [tid:0x7f674bedd880] For Init/Fini: Kernel Name: __amd_rocclr_copyBuffer
:4:command.cpp :349 : 75800216232 us: 202289: [tid:0x7f674bedd880] Command (KernelExecution) enqueued: 0x1705a60
:3:rocvirtual.cpp :778 : 75800216239 us: 202289: [tid:0x7f674bedd880] Arg0: = val:1
:3:rocvirtual.cpp :2774: 75800216242 us: 202289: [tid:0x7f674bedd880] ShaderName : _Z6kerneli
:4:rocdevice.cpp :1942: 75800217288 us: 202289: [tid:0x7f674bedd880] Allocate hsa host memory 0x7f6648800000, size 0x8cd238
:3:rocdevice.cpp :2917: 75800217611 us: 202289: [tid:0x7f674bedd880] Created hostcall buffer 0x7f6648800000 for hardware queue 0x7f674b691000
:3:os_posix.cpp :390 : 75800217706 us: 202289: [tid:0x7f674bedd880] Resetting CPU core affinities
:3:devhostcall.cpp :405 : 75800217770 us: 202289: [tid:0x7f674bedd880] Launched hostcall listener at 0x1237c50
:3:devhostcall.cpp :418 : 75800217779 us: 202289: [tid:0x7f674bedd880] Registered hostcall buffer 0x7f6648800000 with listener 0x1237c50
:4:rocvirtual.cpp :862 : 75800217792 us: 202289: [tid:0x7f674bedd880] HWq=0x7f674b691000, Dispatch Header = 0xb02 (type=2, barrier=1, acquire=1, release=1), setup=3, grid=[1, 1, 1], workgroup=[1, 1, 1], private_seg_size=1144, group_seg_size=0, kernel_obj=0x7f674b680680, kernarg_address=0x7f6649200000, completion_signal=0x0
:3:hip_module.cpp :470 : 75800217802 us: 202289: [tid:0x7f674bedd880] hipLaunchKernel: Returned hipSuccess :
:3:hip_device_runtime.cpp :502 : 75800217808 us: 202289: [tid:0x7f674bedd880] hipDeviceSynchronize ( )
:3:rocdevice.cpp :2645: 75800217815 us: 202289: [tid:0x7f674bedd880] No HW event
:4:commandqueue.cpp :135 : 75800217818 us: 202289: [tid:0x7f674bedd880] HW Event not ready, awaiting completion instead
:4:command.cpp :289 : 75800217825 us: 202289: [tid:0x7f674bedd880] Queue marker to command queue: 0x11fa190
:4:command.cpp :349 : 75800217828 us: 202289: [tid:0x7f674bedd880] Command (InternalMarker) enqueued: 0x1809ce0
:4:rocvirtual.cpp :1005: 75800217834 us: 202289: [tid:0x7f674bedd880] HWq=0x7f674b691000, BarrierAND Header = 0x1503 (type=3, barrier=1, acquire=2, release=2), dep_signal=[0x0, 0x0, 0x0, 0x0, 0x0], completion_signal=0x7f674e181800
:4:rocvirtual.cpp :550 : 75800217837 us: 202289: [tid:0x7f674bedd880] Host wait on completion_signal=0x7f674e181800
:3:rocvirtual.hpp :61 : 75800217840 us: 202289: [tid:0x7f674bedd880] Host active wait for Signal = (0x7f674e181800) for -1 ns
:1:devhcmessages.cpp :156 : 75800218847 us: 202289: [tid:0x7f664adfe700] Hostcall: No message found
:1:devhostcall.cpp :85 : 75800218858 us: 202289: [tid:0x7f664adfe700] Hostcall: invalid request for service "2".
Hostcall: invalid service request.
fish: “AMD_LOG_LEVEL=7 ./rocgdb_test” terminated by signal SIGABRT (Abort)
Ubuntu 20.04, RX Vega 56. I think I caught a buggy vbios with one of the recent amdgpu-dkms updates... Now this vbios persists in the GPU, even if I rollback the driver.
On bringing this to the team, they told me that printf on HIP will only work on system with PCI Gen3 Atomics supported.
Hi @jatinx , thank you for this information. Note it worked for me on the same system until very recently. Is this PCI Gen3 Atomics a new recently introduced requirement?
@jatinx Adding a note about PCI Gen3 Atomics to printf
documentation would be helpful. It states very matter-of-factly that printf works. Similar to #2266, printf
hangs on Windows on gfx1032 with an old AMD Kaveri APU.
HIP 5.7.0 includes a fallback to OpenCL-based printf to allow use without PCIe atomics. https://rocm.docs.amd.com/en/docs-5.7.0/release.html#non-hostcall-hip-printf