HIP icon indicating copy to clipboard operation
HIP copied to clipboard

HIP device printf is broken in ROCM 5.4.2 (internal error)

Open dmikushin opened this issue 2 years ago • 8 comments

The use of printf inside a global function in HIP results into an internal error:

> cat rocgdb_test.hip 
#include <hip/hip_runtime.h>

__global__ void kernel(int i)
{
	printf("Test me\n");
}

int main(int argc, char* argv[])
{
	kernel<<<dim3(1), dim3(1)>>>(argc);
	hipDeviceSynchronize();
	return 0;
}

Compiled with our without debugging enabled:

> /opt/rocm/hip/bin/hipcc --version
HIP version: 5.4.22803-474e8620
AMD clang version 15.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.4.2 22506 5cddd31fe35b165caadc4409b1d79d0d377c89be)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-5.4.2/llvm/bin

> /opt/rocm/hip/bin/hipcc -ggdb -fstandalone-debug -Xclang -O0 -Xclang -gcodeview rocgdb_test.hip -o rocgdb_test

Running the executable gives the following error:

> ./rocgdb_test
Hostcall: invalid service request.
fish: “./rocgdb_test” terminated by signal SIGABRT (Abort)

Running the executable in a debugger gives an internal error backtrace:

> rocgdb ./rocgdb_test
GNU gdb (rocm-rel-5.4-104) 12.1
...
Reading symbols from ./rocgdb_test...
(gdb) r
Starting program: /home/marcusmae/amd/rocgdb_test/rocgdb_test 
warning: AMD GPU driver's version 1.1 not supported (version must be >= 1.6 and < 2.0)
warning: amd-dbgapi: unable to enable GPU debugging due to a restriction error
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff5d1d700 (LWP 476976)]
[New Thread 0x7ffff5519700 (LWP 476977)]
[Thread 0x7ffff5519700 (LWP 476977) exited]
[New Thread 0x7ffeecbff700 (LWP 476978)]
[Thread 0x7ffeecbff700 (LWP 476978) exited]
[New Thread 0x7ffff4c57700 (LWP 476979)]
Hostcall: invalid service request.
Couldn't read debug register: No such process.
(gdb) [Thread 0x7ffff4c57700 (LWP 476979) exited]
[Thread 0x7ffff5d1d700 (LWP 476976) exited]
[Inferior 1 (process 476972) exited normally]

(gdb) b exit
Breakpoint 1 at 0x7ffff607ba40: file exit.c, line 138.
(gdb) r
Starting program: /home/marcusmae/amd/rocgdb_test/rocgdb_test 
warning: AMD GPU driver's version 1.1 not supported (version must be >= 1.6 and < 2.0)
warning: amd-dbgapi: unable to enable GPU debugging due to a restriction error
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff5d1d700 (LWP 476982)]
[New Thread 0x7ffff5519700 (LWP 476983)]
[Thread 0x7ffff5519700 (LWP 476983) exited]
[New Thread 0x7ffedffff700 (LWP 476984)]
[Thread 0x7ffedffff700 (LWP 476984) exited]
[New Thread 0x7ffff4c57700 (LWP 476985)]
Hostcall: invalid service request.

Thread 5 "rocgdb_test" received signal SIGABRT, Aborted.
[Switching to thread 5 (Thread 0x7ffff4c57700 (LWP 476985))]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) up
#1  0x00007ffff6057859 in __GI_abort () at abort.c:79
79	abort.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff6057859 in __GI_abort () at abort.c:79
#2  0x00007ffff678656d in ?? () from /opt/rocm-5.4.2/lib/libamdhip64.so.5
#3  0x00007ffff67c3e48 in ?? () from /opt/rocm-5.4.2/lib/libamdhip64.so.5
#4  0x00007ffff67c41bc in ?? () from /opt/rocm-5.4.2/lib/libamdhip64.so.5
#5  0x00007ffff64c676b in ?? () from /opt/rocm-5.4.2/lib/libamdhip64.so.5
#6  0x00007ffff67705b7 in ?? () from /opt/rocm-5.4.2/lib/libamdhip64.so.5
#7  0x00007ffff7f7c609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#8  0x00007ffff6154133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

dmikushin avatar Feb 13 '23 16:02 dmikushin

Also occurs with ROCm 5.2.5, 5.3.3, 5.4.3. Conclusion: problem is in the driver, not in the toolchain.

Also occurs with AMDGPU 5.18.13-1538762.20.04, 5.18.13.50400-1510348.20.04

dmikushin avatar Feb 16 '23 14:02 dmikushin

Thank you for the report.

I can't seem to reproduce the issue, can you run the exe with 'AMD_LOG_LEVEL=7 ./rocgdb_test' and attach the logs.

Also if you can provide you system details i.e. GPU + any flags that you are using, that will be helpful

jatinx avatar Feb 17 '23 10:02 jatinx

Hi @jatinx , here is the log:

> AMD_LOG_LEVEL=7 ./rocgdb_test
:3:rocdevice.cpp            :426 : 75799966442 us: 202289: [tid:0x7f674bedd880] Initializing HSA stack.
:3:comgrctx.cpp             :33  : 75800006566 us: 202289: [tid:0x7f674bedd880] Loading COMGR library.
:3:rocdevice.cpp            :205 : 75800010993 us: 202289: [tid:0x7f674bedd880] Numa selects cpu agent[0]=0x1204260(fine=0x11fe630,coarse=0x1202800) for gpu agent=0x1202c40
:3:rocdevice.cpp            :1633: 75800011251 us: 202289: [tid:0x7f674bedd880] HMM support: 0, xnack: 0, direct host access: 0

:4:rocdevice.cpp            :1942: 75800011303 us: 202289: [tid:0x7f674bedd880] Allocate hsa host memory 0x7f674e170000, size 0x28
:4:rocdevice.cpp            :1942: 75800011559 us: 202289: [tid:0x7f674bedd880] Allocate hsa host memory 0x7f664ac00000, size 0x101000
:4:rocdevice.cpp            :1942: 75800011799 us: 202289: [tid:0x7f674bedd880] Allocate hsa host memory 0x7f664aa00000, size 0x101000
:4:runtime.cpp              :83  : 75800011914 us: 202289: [tid:0x7f674bedd880] init
:3:hip_context.cpp          :50  : 75800011921 us: 202289: [tid:0x7f674bedd880] Direct Dispatch: 1
:3:hip_platform.cpp         :198 : 75800011950 us: 202289: [tid:0x7f674bedd880] __hipPushCallConfiguration: Returned hipSuccess : 
:3:hip_platform.cpp         :203 : 75800011960 us: 202289: [tid:0x7f674bedd880]  __hipPopCallConfiguration ( {0,0,0}, {2206784,0,1414757600}, 0x7ffd54537b68, 0x7ffd54537b60 ) 
:3:hip_platform.cpp         :212 : 75800011965 us: 202289: [tid:0x7f674bedd880] __hipPopCallConfiguration: Returned hipSuccess : 
:3:hip_module.cpp           :469 : 75800011973 us: 202289: [tid:0x7f674bedd880]  hipLaunchKernel ( 0x200738, {1,1,1}, {1,1,1}, 0x7ffd54537b30, 0, stream:<null> ) 
:3:devprogram.cpp           :2676: 75800012245 us: 202289: [tid:0x7f674bedd880] Using Code Object V4.
:3:devprogram.cpp           :2979: 75800015852 us: 202289: [tid:0x7f674bedd880] For Init/Fini: Kernel Name: _Z6kerneli
:3:rocdevice.cpp            :2703: 75800015885 us: 202289: [tid:0x7f674bedd880] number of allocated hardware queues with low priority: 0, with normal priority: 0, with high priority: 0, maximum per priority is: 4
:3:rocdevice.cpp            :2777: 75800019255 us: 202289: [tid:0x7f674bedd880] created hardware queue 0x7f674b691000 with size 16384 with priority 1, cooperative: 0
:4:rocdevice.cpp            :1942: 75800019416 us: 202289: [tid:0x7f674bedd880] Allocate hsa host memory 0x7f6649200000, size 0x100000
:3:devprogram.cpp           :2676: 75800215190 us: 202289: [tid:0x7f674bedd880] Using Code Object V4.
:3:devprogram.cpp           :2979: 75800216051 us: 202289: [tid:0x7f674bedd880] For Init/Fini: Kernel Name: __amd_rocclr_streamOpsWrite
:3:devprogram.cpp           :2979: 75800216058 us: 202289: [tid:0x7f674bedd880] For Init/Fini: Kernel Name: __amd_rocclr_fillImage
:3:devprogram.cpp           :2979: 75800216060 us: 202289: [tid:0x7f674bedd880] For Init/Fini: Kernel Name: __amd_rocclr_fillBufferAligned2D
:3:devprogram.cpp           :2979: 75800216063 us: 202289: [tid:0x7f674bedd880] For Init/Fini: Kernel Name: __amd_rocclr_copyImageToBuffer
:3:devprogram.cpp           :2979: 75800216065 us: 202289: [tid:0x7f674bedd880] For Init/Fini: Kernel Name: __amd_rocclr_copyImage1DA
:3:devprogram.cpp           :2979: 75800216067 us: 202289: [tid:0x7f674bedd880] For Init/Fini: Kernel Name: __amd_rocclr_fillBufferAligned
:3:devprogram.cpp           :2979: 75800216073 us: 202289: [tid:0x7f674bedd880] For Init/Fini: Kernel Name: __amd_rocclr_copyImage
:3:devprogram.cpp           :2979: 75800216076 us: 202289: [tid:0x7f674bedd880] For Init/Fini: Kernel Name: __amd_rocclr_streamOpsWait
:3:devprogram.cpp           :2979: 75800216079 us: 202289: [tid:0x7f674bedd880] For Init/Fini: Kernel Name: __amd_rocclr_copyBufferRectAligned
:3:devprogram.cpp           :2979: 75800216082 us: 202289: [tid:0x7f674bedd880] For Init/Fini: Kernel Name: __amd_rocclr_copyBufferRect
:3:devprogram.cpp           :2979: 75800216085 us: 202289: [tid:0x7f674bedd880] For Init/Fini: Kernel Name: __amd_rocclr_copyBufferAligned
:3:devprogram.cpp           :2979: 75800216090 us: 202289: [tid:0x7f674bedd880] For Init/Fini: Kernel Name: __amd_rocclr_copyBufferToImage
:3:devprogram.cpp           :2979: 75800216093 us: 202289: [tid:0x7f674bedd880] For Init/Fini: Kernel Name: __amd_rocclr_copyBuffer
:4:command.cpp              :349 : 75800216232 us: 202289: [tid:0x7f674bedd880] Command (KernelExecution) enqueued: 0x1705a60
:3:rocvirtual.cpp           :778 : 75800216239 us: 202289: [tid:0x7f674bedd880] Arg0:   = val:1
:3:rocvirtual.cpp           :2774: 75800216242 us: 202289: [tid:0x7f674bedd880] ShaderName : _Z6kerneli
:4:rocdevice.cpp            :1942: 75800217288 us: 202289: [tid:0x7f674bedd880] Allocate hsa host memory 0x7f6648800000, size 0x8cd238
:3:rocdevice.cpp            :2917: 75800217611 us: 202289: [tid:0x7f674bedd880] Created hostcall buffer 0x7f6648800000 for hardware queue 0x7f674b691000
:3:os_posix.cpp             :390 : 75800217706 us: 202289: [tid:0x7f674bedd880] Resetting CPU core affinities
:3:devhostcall.cpp          :405 : 75800217770 us: 202289: [tid:0x7f674bedd880] Launched hostcall listener at 0x1237c50
:3:devhostcall.cpp          :418 : 75800217779 us: 202289: [tid:0x7f674bedd880] Registered hostcall buffer 0x7f6648800000 with listener 0x1237c50
:4:rocvirtual.cpp           :862 : 75800217792 us: 202289: [tid:0x7f674bedd880] HWq=0x7f674b691000, Dispatch Header = 0xb02 (type=2, barrier=1, acquire=1, release=1), setup=3, grid=[1, 1, 1], workgroup=[1, 1, 1], private_seg_size=1144, group_seg_size=0, kernel_obj=0x7f674b680680, kernarg_address=0x7f6649200000, completion_signal=0x0
:3:hip_module.cpp           :470 : 75800217802 us: 202289: [tid:0x7f674bedd880] hipLaunchKernel: Returned hipSuccess : 
:3:hip_device_runtime.cpp   :502 : 75800217808 us: 202289: [tid:0x7f674bedd880]  hipDeviceSynchronize (  ) 
:3:rocdevice.cpp            :2645: 75800217815 us: 202289: [tid:0x7f674bedd880] No HW event
:4:commandqueue.cpp         :135 : 75800217818 us: 202289: [tid:0x7f674bedd880] HW Event not ready, awaiting completion instead
:4:command.cpp              :289 : 75800217825 us: 202289: [tid:0x7f674bedd880] Queue marker to command queue: 0x11fa190
:4:command.cpp              :349 : 75800217828 us: 202289: [tid:0x7f674bedd880] Command (InternalMarker) enqueued: 0x1809ce0
:4:rocvirtual.cpp           :1005: 75800217834 us: 202289: [tid:0x7f674bedd880] HWq=0x7f674b691000, BarrierAND Header = 0x1503 (type=3, barrier=1, acquire=2, release=2), dep_signal=[0x0, 0x0, 0x0, 0x0, 0x0], completion_signal=0x7f674e181800
:4:rocvirtual.cpp           :550 : 75800217837 us: 202289: [tid:0x7f674bedd880] Host wait on completion_signal=0x7f674e181800
:3:rocvirtual.hpp           :61  : 75800217840 us: 202289: [tid:0x7f674bedd880] Host active wait for Signal = (0x7f674e181800) for -1 ns
:1:devhcmessages.cpp        :156 : 75800218847 us: 202289: [tid:0x7f664adfe700] Hostcall: No message found
:1:devhostcall.cpp          :85  : 75800218858 us: 202289: [tid:0x7f664adfe700] Hostcall: invalid request for service "2".
Hostcall: invalid service request.
fish: “AMD_LOG_LEVEL=7 ./rocgdb_test” terminated by signal SIGABRT (Abort)

dmikushin avatar Feb 17 '23 12:02 dmikushin

Ubuntu 20.04, RX Vega 56. I think I caught a buggy vbios with one of the recent amdgpu-dkms updates... Now this vbios persists in the GPU, even if I rollback the driver.

dmikushin avatar Feb 17 '23 12:02 dmikushin

On bringing this to the team, they told me that printf on HIP will only work on system with PCI Gen3 Atomics supported.

jatinx avatar Feb 27 '23 22:02 jatinx

Hi @jatinx , thank you for this information. Note it worked for me on the same system until very recently. Is this PCI Gen3 Atomics a new recently introduced requirement?

dmikushin avatar Feb 27 '23 23:02 dmikushin

@jatinx Adding a note about PCI Gen3 Atomics to printf documentation would be helpful. It states very matter-of-factly that printf works. Similar to #2266, printf hangs on Windows on gfx1032 with an old AMD Kaveri APU.

bdenhollander avatar Aug 22 '23 11:08 bdenhollander

HIP 5.7.0 includes a fallback to OpenCL-based printf to allow use without PCIe atomics. https://rocm.docs.amd.com/en/docs-5.7.0/release.html#non-hostcall-hip-printf

bdenhollander avatar Sep 16 '23 13:09 bdenhollander