ROCgdb

[Issue]: Breakpoints in gpu kernels require manually interrupting the process to work

Open krzysiek4321 opened this issue 6 months ago • 1 comments

Problem Description

When I take the basic hello_world example from the HIP examples, build it with -O0 -g, and open it in rocgdb, breakpoints in GPU kernel code do not work on their own. I have to manually interrupt the process both for the breakpoint to trigger and for the program to continue executing after the wave finishes running.

Operating System

Ubuntu Toolbox 24.04 running on Fedora Atomic (6.14.9-300.fc42.x86_64)

CPU

AMD Ryzen 9 5950X

GPU

AMD Radeon RX 6700 XT

ROCm Version

ROCm 6.4.1.60401-83~24.04

ROCm Component

ROCgdb

Steps to Reproduce

  1. Create a new toolbox using the Ubuntu 24.04 distro: toolbox create --distro ubuntu --release 24.04 rocm_devspace
  2. Install ROCm according to the ROCm docs. I don't need to add my user to the video and render groups because /dev/kfd and /dev/dri/renderD have o+rw set, so I have access inside the container. Even if I add o+rw to /dev/dri/card, nothing changes.
  3. Download HIP-Basic/hello_world/main.hip from the ROCm examples and compile it with: hipcc -O0 -g -o main main.hip
  4. rocgdb ./main
  5. break helloworld_kernel
  6. run
  7. CTRL+c or send SIGINT to the running ./main with kill
  8. continue
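
For convenience, steps 3 to 8 can be collected into one script. This is only a sketch: the variable names SRC, BIN and RUN and the function name repro_commands are made up for illustration, and it assumes hipcc and rocgdb are on PATH inside the toolbox. By default it only prints the commands; set RUN=1 to build and start the debugger.

```shell
#!/usr/bin/env bash
# Sketch of reproduction steps 3-8. Dry run by default: prints the commands only.
# SRC, BIN and RUN are hypothetical knobs, not part of any ROCm tool.

SRC="${SRC:-main.hip}"   # HIP-Basic/hello_world/main.hip from the ROCm examples
BIN="${BIN:-./main}"     # output binary, as in step 3

# Print one line per command used in the steps above.
repro_commands() {
  printf '%s\n' \
    "hipcc -O0 -g -o ${BIN} ${SRC}" \
    "rocgdb -ex 'break helloworld_kernel' -ex 'run' ${BIN}" \
    "kill -INT \$(pgrep -x main)   # step 7, from a second shell (or CTRL+C in rocgdb)" \
    "continue                     # step 8, typed at the (gdb) prompt"
}

if [ "${RUN:-0}" = "1" ]; then
  # Build with debug info and no optimization, then start the debugger
  # with the breakpoint and run commands already queued.
  hipcc -O0 -g -o "${BIN}" "${SRC}" &&
    rocgdb -ex 'break helloworld_kernel' -ex 'run' "${BIN}"
else
  repro_commands
fi
```

Steps 7 and 8 stay manual in this sketch, because the interrupt has to be sent while the program is already running under rocgdb.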

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

ROCk module is loaded

HSA System Attributes

Runtime Version: 1.15
Runtime Ext Version: 1.7
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
XNACK enabled: NO
DMAbuf Support: YES
VMM Support: YES

HSA Agents


Agent 1


Name: AMD Ryzen 9 5950X 16-Core Processor
Uuid: CPU-XX
Marketing Name: AMD Ryzen 9 5950X 16-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
  L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 5086
BDFID: 0
Internal Node ID: 0
Compute Unit: 32
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Memory Properties:
Features: None
Pool Info:
  Pool 1
    Segment: GLOBAL; FLAGS: FINE GRAINED
    Size: 131808928(0x7db3ea0) KB
    Allocatable: TRUE
    Alloc Granule: 4KB
    Alloc Recommended Granule:4KB
    Alloc Alignment: 4KB
    Accessible by all: TRUE
  Pool 2
    Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
    Size: 131808928(0x7db3ea0) KB
    Allocatable: TRUE
    Alloc Granule: 4KB
    Alloc Recommended Granule:4KB
    Alloc Alignment: 4KB
    Accessible by all: TRUE
  Pool 3
    Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
    Size: 131808928(0x7db3ea0) KB
    Allocatable: TRUE
    Alloc Granule: 4KB
    Alloc Recommended Granule:4KB
    Alloc Alignment: 4KB
    Accessible by all: TRUE
  Pool 4
    Segment: GLOBAL; FLAGS: COARSE GRAINED
    Size: 131808928(0x7db3ea0) KB
    Allocatable: TRUE
    Alloc Granule: 4KB
    Alloc Recommended Granule:4KB
    Alloc Alignment: 4KB
    Accessible by all: TRUE
ISA Info:


Agent 2


Name: gfx1031
Uuid: GPU-XX
Marketing Name: AMD Radeon RX 6700 XT
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
  L1: 16(0x10) KB
  L2: 3072(0xc00) KB
  L3: 98304(0x18000) KB
Chip ID: 29663(0x73df)
ASIC Revision: 0(0x0)
Cacheline Size: 128(0x80)
Max Clock Freq. (MHz): 2855
BDFID: 2816
Internal Node ID: 1
Compute Unit: 40
SIMDs per CU: 2
Shader Engines: 2
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Memory Properties:
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
  x 1024(0x400)
  y 1024(0x400)
  z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
  x 4294967295(0xffffffff)
  y 4294967295(0xffffffff)
  z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 130
SDMA engine uCode:: 80
IOMMU Support:: None
Pool Info:
  Pool 1
    Segment: GLOBAL; FLAGS: COARSE GRAINED
    Size: 12566528(0xbfc000) KB
    Allocatable: TRUE
    Alloc Granule: 4KB
    Alloc Recommended Granule:2048KB
    Alloc Alignment: 4KB
    Accessible by all: FALSE
  Pool 2
    Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
    Size: 12566528(0xbfc000) KB
    Allocatable: TRUE
    Alloc Granule: 4KB
    Alloc Recommended Granule:2048KB
    Alloc Alignment: 4KB
    Accessible by all: FALSE
  Pool 3
    Segment: GROUP
    Size: 64(0x40) KB
    Allocatable: FALSE
    Alloc Granule: 0KB
    Alloc Recommended Granule:0KB
    Alloc Alignment: 0KB
    Accessible by all: FALSE
ISA Info:
  ISA 1
    Name: amdgcn-amd-amdhsa--gfx1031
    Machine Models: HSA_MACHINE_MODEL_LARGE
    Profiles: HSA_PROFILE_BASE
    Default Rounding Mode: NEAR
    Default Rounding Mode: NEAR
    Fast f16: TRUE
    Workgroup Max Size: 1024(0x400)
    Workgroup Max Size per Dimension:
      x 1024(0x400)
      y 1024(0x400)
      z 1024(0x400)
    Grid Max Size: 4294967295(0xffffffff)
    Grid Max Size per Dimension:
      x 4294967295(0xffffffff)
      y 4294967295(0xffffffff)
      z 4294967295(0xffffffff)
    FBarrier Max Size: 32
  ISA 2
    Name: amdgcn-amd-amdhsa--gfx10-3-generic
    Machine Models: HSA_MACHINE_MODEL_LARGE
    Profiles: HSA_PROFILE_BASE
    Default Rounding Mode: NEAR
    Default Rounding Mode: NEAR
    Fast f16: TRUE
    Workgroup Max Size: 1024(0x400)
    Workgroup Max Size per Dimension:
      x 1024(0x400)
      y 1024(0x400)
      z 1024(0x400)
    Grid Max Size: 4294967295(0xffffffff)
    Grid Max Size per Dimension:
      x 4294967295(0xffffffff)
      y 4294967295(0xffffffff)
      z 4294967295(0xffffffff)
    FBarrier Max Size: 32
*** Done ***

Additional Information

Part of stacktrace from rocgdb when manually interrupting the process.
#3  rocr::core::InterruptSignal::WaitRelaxed (this=0xa46480, condition=HSA_SIGNAL_CONDITION_LT, compare_value=1, timeout=<optimized out>, wait_hint=HSA_WAIT_STATE_ACTIVE)
    at /longer_pathname_so_that_rpms_can_support_packaging_the_debug_info_for_all_os_profiles/src/ROCR-Runtime/runtime/hsa-runtime/core/runtime/interrupt_signal.cpp:166
#4  0x00007ffff5e7b09e in rocr::core::InterruptSignal::WaitAcquire (this=<optimized out>, condition=<optimized out>, compare_value=<optimized out>, timeout=<optimized out>, wait_hint=<optimized out>)
    at /longer_pathname_so_that_rpms_can_support_packaging_the_debug_info_for_all_os_profiles/src/ROCR-Runtime/runtime/hsa-runtime/core/runtime/interrupt_signal.cpp:205
#5  0x00007ffff5e6f2e1 in rocr::HSA::hsa_signal_wait_scacquire (hsa_signal=..., condition=HSA_SIGNAL_CONDITION_LT, compare_value=1, timeout_hint=18446744073709551615, wait_state_hint=HSA_WAIT_STATE_ACTIVE)
    at /longer_pathname_so_that_rpms_can_support_packaging_the_debug_info_for_all_os_profiles/src/ROCR-Runtime/runtime/hsa-runtime/core/runtime/hsa.cpp:1249
#6  0x00007ffff6b58ebb in amd::roc::WaitForSignal (active_wait=<optimized out>, signal=...)
    at /longer_pathname_so_that_rpms_can_support_packaging_the_debug_info_for_all_os_profiles/src/clr/rocclr/device/rocm/rocvirtual.hpp:60
#7  amd::roc::Device::IsHwEventReady (this=<optimized out>, event=..., wait=true, hip_event_flags=<optimized out>)
    at /longer_pathname_so_that_rpms_can_support_packaging_the_debug_info_for_all_os_profiles/src/clr/rocclr/device/rocm/rocdevice.cpp:2955
#8  0x00007ffff6b45037 in amd::HostQueue::finish (this=this@entry=0x317ed0, cpu_wait=<optimized out>, cpu_wait@entry=false)
    at /longer_pathname_so_that_rpms_can_support_packaging_the_debug_info_for_all_os_profiles/src/clr/rocclr/platform/commandqueue.cpp:165
#9  0x00007ffff6839066 in hip::Device::SyncAllStreams (this=0x2f40d0, cpu_wait=false, wait_blocking_streams_only=<optimized out>)
    at /longer_pathname_so_that_rpms_can_support_packaging_the_debug_info_for_all_os_profiles/src/clr/hipamd/src/hip_device.cpp:288
#10 0x00007ffff6818b01 in hip::hipDeviceSynchronize () at /longer_pathname_so_that_rpms_can_support_packaging_the_debug_info_for_all_os_profiles/src/clr/hipamd/src/hip_device_runtime.cpp:644
#11 0x0000000000201f29 in main () at main.hip:54
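
The trace above covers only the host thread, which is blocked in hipDeviceSynchronize on a signal wait with timeout_hint=18446744073709551615, i.e. the maximum 64-bit value. To capture the device-side state at the same moment, a small GDB command file could be sourced at the (gdb) prompt right after the interrupt. This is a sketch, not something that was run for this report: the function name write_diag_commands is made up, and it assumes the installed ROCgdb provides the info agents, info queues and info dispatches commands.

```shell
#!/usr/bin/env bash
# Sketch: write a GDB command file that dumps host and device state.
# write_diag_commands is a hypothetical helper name.

write_diag_commands() {
  # Use the given path, or a fresh temporary file if none is passed.
  local out="${1:-$(mktemp)}"
  cat > "$out" <<'EOF'
# Host side: show where every CPU thread is blocked.
thread apply all backtrace
# All threads known to the debugger (stopped GPU waves are listed here too).
info threads
# Device side (ROCgdb-specific commands, assumed to be available).
info agents
info queues
info dispatches
EOF
  printf '%s\n' "$out"
}

cmdfile="$(write_diag_commands)"
echo "After interrupting, type at the (gdb) prompt: source ${cmdfile}"
```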

krzysiek4321, Jun 05 '25 23:06

Hi @krzysiek4321. An internal ticket has been created to investigate this issue. Thanks!

ppanchad-amd, Jun 06 '25 13:06