ROCmValidationSuite icon indicating copy to clipboard operation
ROCmValidationSuite copied to clipboard

[Feature]: test for invalid pointer

Open chboishabba opened this issue 10 months ago • 3 comments

Suggestion Description

(base) (Wed Mar 05 16:05:08) c@archb hello_world$ ./hip_test Debug Device Info: Number of HIP devices found: 1 Debug Device Info: Device Name: AMD Radeon RX 580 Series Debug Device Info: Total Global Memory: 8589934592 bytes Debug Device Info: Shared Memory per Block: 65536 bytes Debug Device Info: Max Threads Per Block: 1024 Debug Device Info: Max Grid Dimensions: (2147483647, 65536, 65536) Debug Device Info: Memory Clock Rate: 1411000 kHz Debug Device Info: Memory Bus Width: 256 bits Debug Device Info: Concurrent Kernels: Yes Debug Device Info: Pinned Memory Support: Yes Debug Device Info: Finished displaying device information. Debug Test 1: Actual input string size = 2, using transfer size = 64 Debug: Input string size = 64 for inputString_A Debug: paddedInput host ptr: 0x7fff95534d20 Debug: Allocating device memory... Debug Test 2: Checking device memory allocation... Debug Test 2: Device memory allocation check passed (Valid addresses) Debug: Device memory allocated successfully Debug: Allocating pinned host memory... Debug: Pinned host memory allocated successfully Debug Test 3: Checking pinned host memory allocation... Debug Test 3: Pinned host memory allocation check passed (Valid address) Debug: inputBuffer = 0x901400000, outputBuffer = 0x901401000, hostOutput = 0x212000 Debug: --- Before hipMemcpyHostToDevice --- Debug: inputBuffer (device ptr): 0x901400000 Debug: inputString (inputString_A, host ptr): 0x7fff95534d20 Debug: First char of inputString: A Debug: size: 64 Debug: hipMemcpy direction: hipMemcpyHostToDevice Debug: --- End Before hipMemcpyHostToDevice --- Debug: Copying input string to device... Segmentation fault (core dumped) (base) (Wed Mar 05 16:05:11) c@archb hello_world$
Based on your latest run—with a 64‑byte transfer size—the host code is working as expected up until the call to hipMemcpy, yet the segmentation fault still occurs. The debug output shows that:

The input string “A” (with its null terminator) is padded to a 64‑byte buffer.

Device and pinned host memory allocations succeed.

The debug prints confirm that the device pointer, host pointer (for padded input), and sizes are all reasonable.

The crash occurs inside the HIP driver (libamdhip64.so.6) when copying the padded input from host to device.

Recall from our earlier register dump that RAX was –8 (0xfffffffffffffff8), which strongly indicates that internally the driver is computing an invalid offset (for example, subtracting 8 from a pointer) and then dereferencing an address like 0x18.

What This Means
The problem now appears not to be a simple size miscalculation or an alignment issue from the host side—the host code reliably allocates and passes what appears to be a valid 64‑byte buffer. Instead, the segfault deep in the HIP driver suggests the following possibilities:

Driver Bug: The HIP runtime (libamdhip64.so.6) seems to be mishandling very small transfers (or at least in this particular build), resulting in incorrect pointer arithmetic (e.g. RAX = –8) that leads to an invalid memory access.

Parameter or Alignment Expectation: Although HIP documentation states that using hipMalloc and hipHostMalloc to allocate buffers should provide valid memory addresses, there may be unexpected conditions (such as an alignment or even a driver bug) that cause the driver to compute a negative offset.

Known or New Issue: You mentioned that similar segfaults have been observed and reported as known driver issues. The symptoms (invalid read at address 0x18, RAX = –8, segfault inside the driver) match what one might see if the driver's internal bookkeeping for memory transfers is flawed.

Next Steps
Collect More Diagnostic Information:

Enable Verbose Logging: With HIP_DEBUG=1 and HIP_TRACE_API=2 already set, capture the full log output. Look for any warning messages or unusual behavior immediately before the hipMemcpy call.

Run Under Valgrind: Although Valgrind’s reports inside vendor libraries can be noisy, run:

bash
valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all ./hip_test
and note any “invalid read” or “invalid write” messages. Focus on errors in the host code; if all warnings are inside libamdhip64.so, that’s a further signal the issue is in the driver.

Try Alternative Transfer Sizes: You mentioned testing sizes 8, 14, 32, and 64. If you haven’t already, try running with larger minimum sizes (for example, 128 or 256 bytes) to see if the driver bug is size‑dependent. If larger sizes work, that suggests the driver mishandles very small transfers.

Driver Version and Support:

Check for Updates: Verify whether you’re using the latest ROCm release for your platform. Sometimes these bugs are fixed in newer driver versions.

Contact ROCm/Driver Support: Provide them with:

Your complete HIP test code.

The detailed debug logs and Valgrind output.

The GDB register dump showing RAX = –8 and the backtrace.

A description of your hardware (AMD Radeon RX 580 Series) and your ROCm/driver version.

This information can help them reproduce the issue and either offer a workaround or patch the driver.

Examine the HIP Call’s Parameters: Double-check that the values passed to hipMemcpy (the pointers and size) are indeed what you expect. Although your debug prints look good, you might consider using a debugger to set a breakpoint right before the hipMemcpy call to manually inspect the pointer values again.

Conclusion
The host code is correct, and the segfault occurs inside the HIP driver, likely due to an internal miscalculation (as evidenced by RAX being –8) when handling a small transfer size. The next steps are to:

Gather and share as much logging and debug information as possible.

Experiment with alternative transfer sizes.

Check for driver updates or contact ROCm support.

This comprehensive information should help you (and your driver team) identify whether this is a known driver bug, and whether a workaround (such as using a larger transfer size) or a driver patch is needed.

Let me know if you need any further help interpreting the logs or additional suggestions on progressing with this issue!

#include <hip/hip_runtime.h>
#include <iostream>
#include <cstring>
#include <cstdio>
#include <cstdlib>
#include <cstdint>
#include <algorithm> // for std::max

// Error checking macro for HIP API calls
#define CHECK(cmd) do { \
    hipError_t error = cmd; \
    if (error != hipSuccess) { \
        std::cerr << "Error: " << hipGetErrorString(error) \
                  << " at " << __FILE__ << ":" << __LINE__ << std::endl; \
        exit(EXIT_FAILURE); \
    } \
} while (0)

// Kernel to increment each character by 1, except the null terminator
__global__ void incrementChars(char* output, const char* input, int size) {
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx >= size - 1) return; // Stop before the null terminator
    output[idx] = input[idx] + 1; // Increment each character
    if (idx == size - 2) {
        output[size-1] = '\0'; // Ensure null termination is present
    }
}

int main(int argc, char** argv) {
    // Step 1: Device Query and Explicit Device Selection
    int deviceCount = 0;
    CHECK(hipGetDeviceCount(&deviceCount));
    fprintf(stderr, "Debug Device Info: Number of HIP devices found: %d\n", deviceCount);

    if (deviceCount > 0) {
        // Explicitly select device 0.
        CHECK(hipSetDevice(0));

        hipDeviceProp_t deviceProperties;
        CHECK(hipGetDeviceProperties(&deviceProperties, 0));
        fprintf(stderr, "Debug Device Info: Device Name: %s\n", deviceProperties.name);
        fprintf(stderr, "Debug Device Info: Total Global Memory: %zu bytes\n", deviceProperties.totalGlobalMem);
        fprintf(stderr, "Debug Device Info: Shared Memory per Block: %zu bytes\n", deviceProperties.sharedMemPerBlock);
        fprintf(stderr, "Debug Device Info: Max Threads Per Block: %d\n", deviceProperties.maxThreadsPerBlock);
        fprintf(stderr, "Debug Device Info: Max Grid Dimensions: (%d, %d, %d)\n",
                deviceProperties.maxGridSize[0],
                deviceProperties.maxGridSize[1],
                deviceProperties.maxGridSize[2]);
        fprintf(stderr, "Debug Device Info: Memory Clock Rate: %d kHz\n", deviceProperties.clockRate);
        fprintf(stderr, "Debug Device Info: Memory Bus Width: %d bits\n", deviceProperties.memoryBusWidth);
        fprintf(stderr, "Debug Device Info: Concurrent Kernels: %s\n",
                deviceProperties.concurrentKernels ? "Yes" : "No");
        fprintf(stderr, "Debug Device Info: Pinned Memory Support: %s\n",
                deviceProperties.canMapHostMemory ? "Yes" : "No");
    } else {
        fprintf(stderr, "Debug Device Info: No HIP devices found.\n");
        return EXIT_FAILURE;
    }
    fprintf(stderr, "Debug Device Info: Finished displaying device information.\n");

    // Step 2: Input String Setup
    const char inputString_A[] = "A";
    const char inputString_Gdkkn[] = "Gdkkn";
    // Change this flag to select the desired test input.
    const bool use_Gdkkn_input = false;
    const char* inputString = use_Gdkkn_input ? inputString_Gdkkn : inputString_A;
    const char* inputStringName = use_Gdkkn_input ? "inputString_Gdkkn" : "inputString_A";
    
    // Calculate the actual input size (including the null terminator)
    size_t actualSize = strlen(inputString) + 1;
    // Determine the minimum transfer size. Allow user to specify via command line; default is 64 bytes.
    size_t min_size = 64;
    if (argc > 1) {
        min_size = std::strtoul(argv[1], nullptr, 10);
    }
    // Use the maximum of the actual size and the specified minimum size as the transfer size.
    size_t size = std::max(actualSize, min_size);
    
    fprintf(stderr, "Debug Test 1: Actual input string size = %zu, using transfer size = %zu\n", actualSize, size);
    fprintf(stderr, "Debug: Input string size = %zu for %s\n", size, inputStringName);
    
    // Create a padded buffer for the input string (zero-padded)
    char paddedInput[size];
    memset(paddedInput, 0, size);
    memcpy(paddedInput, inputString, actualSize);
    fprintf(stderr, "Debug: paddedInput host ptr: %p\n", static_cast<const void*>(paddedInput));

    // Step 3: Device and Host Memory Allocations
    char* inputBuffer = nullptr;
    char* outputBuffer = nullptr;
    fprintf(stderr, "Debug: Allocating device memory...\n");
    hipError_t mallocInputError = hipMalloc(&inputBuffer, size);
    hipError_t mallocOutputError = hipMalloc(&outputBuffer, size);
    CHECK(mallocInputError);
    CHECK(mallocOutputError);

    fprintf(stderr, "Debug Test 2: Checking device memory allocation...\n");
    if (inputBuffer == nullptr || outputBuffer == nullptr) {
        std::cerr << "Error: Device memory allocation failed! inputBuffer = " 
                  << inputBuffer << ", outputBuffer = " << outputBuffer << std::endl;
        return EXIT_FAILURE;
    }
    fprintf(stderr, "Debug Test 2: Device memory allocation check passed (Valid addresses)\n");
    fprintf(stderr, "Debug: Device memory allocated successfully\n");

    char* hostOutput = nullptr;
    fprintf(stderr, "Debug: Allocating pinned host memory...\n");
    hipError_t hostMallocError = hipHostMalloc(&hostOutput, size);
    CHECK(hostMallocError);
    fprintf(stderr, "Debug: Pinned host memory allocated successfully\n");

    fprintf(stderr, "Debug Test 3: Checking pinned host memory allocation...\n");
    if (hostOutput == nullptr) {
        std::cerr << "Error: Pinned host memory allocation failed!" << std::endl;
        return EXIT_FAILURE;
    }
    fprintf(stderr, "Debug Test 3: Pinned host memory allocation check passed (Valid address)\n");
    fprintf(stderr, "Debug: inputBuffer = %p, outputBuffer = %p, hostOutput = %p\n",
            inputBuffer, outputBuffer, hostOutput);

    // Step 4: Copy Input String from Host to Device
    fprintf(stderr, "Debug: --- Before hipMemcpyHostToDevice ---\n");
    fprintf(stderr, "Debug: inputBuffer (device ptr): %p\n", inputBuffer);
    fprintf(stderr, "Debug: inputString (%s, host ptr): %p\n", inputStringName, static_cast<const void*>(paddedInput));
    fprintf(stderr, "Debug: First char of inputString: %c\n", paddedInput[0]);
    fprintf(stderr, "Debug: size: %zu\n", size);
    fprintf(stderr, "Debug: hipMemcpy direction: hipMemcpyHostToDevice\n");
    fprintf(stderr, "Debug: --- End Before hipMemcpyHostToDevice ---\n");

    fprintf(stderr, "Debug: Copying input string to device...\n");
    hipError_t memcpyError = hipMemcpy(inputBuffer, paddedInput, size, hipMemcpyHostToDevice);
    if (memcpyError != hipSuccess) {
        std::cerr << "hipMemcpyHostToDevice Error: " << hipGetErrorString(memcpyError)
                  << " at " << __FILE__ << ":" << __LINE__ << std::endl;
        exit(EXIT_FAILURE);
    }
    CHECK(memcpyError);
    fprintf(stderr, "Debug: Input string copied to device successfully.\n");

    // Step 5: Launch the Kernel to Increment Characters
    dim3 threadsPerBlock(4);
    dim3 blocksPerGrid(2);
    fprintf(stderr, "Debug Test 4: Checking kernel launch dimensions...\n");
    fprintf(stderr, "Debug Test 4: blocksPerGrid.x = %d, threadsPerBlock.x = %d\n",
            blocksPerGrid.x, threadsPerBlock.x);
    if (blocksPerGrid.x <= 0 || threadsPerBlock.x <= 0) {
        std::cerr << "Error: Invalid kernel launch dimensions!" << std::endl;
        return EXIT_FAILURE;
    }
    fprintf(stderr, "Debug Test 4: Kernel launch dimensions check passed (Valid dimensions)\n");

    fprintf(stderr, "Debug: Launching kernel with %d blocks and %d threads per block\n",
            blocksPerGrid.x, threadsPerBlock.x);
    incrementChars<<<blocksPerGrid, threadsPerBlock>>>(outputBuffer, inputBuffer, size);

    // Check for kernel launch errors before synchronizing
    hipError_t kernelErr = hipGetLastError();
    if (kernelErr != hipSuccess) {
        std::cerr << "Kernel launch error: " << hipGetErrorString(kernelErr)
                  << " at " << __FILE__ << ":" << __LINE__ << std::endl;
        exit(EXIT_FAILURE);
    }
    CHECK(hipDeviceSynchronize());
    fprintf(stderr, "Debug: Kernel execution completed successfully.\n");

    // Optional: Copy the result back to host and print it
    hipError_t memcpyBackErr = hipMemcpy(hostOutput, outputBuffer, size, hipMemcpyDeviceToHost);
    if (memcpyBackErr != hipSuccess) {
        std::cerr << "hipMemcpyDeviceToHost Error: " << hipGetErrorString(memcpyBackErr) << "\n";
        exit(EXIT_FAILURE);
    }
    CHECK(memcpyBackErr);
    fprintf(stderr, "Kernel output: %s\n", hostOutput);

    // Cleanup: Free allocated memory
    CHECK(hipFree(inputBuffer));
    CHECK(hipFree(outputBuffer));
    CHECK(hipHostFree(hostOutput));

    return 0;
}

Operating System

arch

GPU

rx580

ROCm Component

(libamdhip64.so.6)



(base) (Wed Mar 05 16:05:08) c@archb hello_world$ ./hip_test
Debug Device Info: Number of HIP devices found: 1
Debug Device Info: Device Name: AMD Radeon RX 580 Series
Debug Device Info: Total Global Memory: 8589934592 bytes
Debug Device Info: Shared Memory per Block: 65536 bytes
Debug Device Info: Max Threads Per Block: 1024
Debug Device Info: Max Grid Dimensions: (2147483647, 65536, 65536)
Debug Device Info: Memory Clock Rate: 1411000 kHz
Debug Device Info: Memory Bus Width: 256 bits
Debug Device Info: Concurrent Kernels: Yes
Debug Device Info: Pinned Memory Support: Yes
Debug Device Info: Finished displaying device information.
Debug Test 1: Actual input string size = 2, using transfer size = 64
Debug: Input string size = 64 for inputString_A
Debug: paddedInput host ptr: 0x7fff95534d20
Debug: Allocating device memory...
Debug Test 2: Checking device memory allocation...
Debug Test 2: Device memory allocation check passed (Valid addresses)
Debug: Device memory allocated successfully
Debug: Allocating pinned host memory...
Debug: Pinned host memory allocated successfully
Debug Test 3: Checking pinned host memory allocation...
Debug Test 3: Pinned host memory allocation check passed (Valid address)
Debug: inputBuffer = 0x901400000, outputBuffer = 0x901401000, hostOutput = 0x212000
Debug: --- Before hipMemcpyHostToDevice ---
Debug: inputBuffer (device ptr): 0x901400000
Debug: inputString (inputString_A, host ptr): 0x7fff95534d20
Debug: First char of inputString: A
Debug: size: 64
Debug: hipMemcpy direction: hipMemcpyHostToDevice
Debug: --- End Before hipMemcpyHostToDevice ---
Debug: Copying input string to device...
Segmentation fault (core dumped)
(base) (Wed Mar 05 16:05:11) c@archb hello_world$ 

chboishabba avatar Mar 05 '25 06:03 chboishabba

(base) (Wed Mar 05 13:17:51) c@archb hello_world$ gdb ./hip_test_debug GNU gdb (GDB) 16.2 Copyright (C) 2024 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-pc-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <https://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from ./hip_test_debug... (gdb) b main Breakpoint 1 at 0x133c: file hip_test.cpp, line 26. (gdb) c The program is not being run. (gdb) run Starting program: /opt/rocm_sdk_612/docs/examples/hipcc/hello_world/hip_test_debug This GDB supports auto-downloading debuginfo from the following URLs: <https://debuginfod.archlinux.org> Enable debuginfod for this session? (y or [n]) Debuginfod has been disabled. To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit. [Thread debugging using libthread_db enabled] Using host libthread_db library "/usr/lib/libthread_db.so.1". Breakpoint 1, main () at hip_test.cpp:26 warning: Source file is more recent than executable. 26 } (gdb) c Continuing. Debug: Input string size = 6 Debug: Allocating device memory... [New Thread 0x7fffecaea6c0 (LWP 1164261)] [New Thread 0x7fffe7fff6c0 (LWP 1164262)] [Thread 0x7fffe7fff6c0 (LWP 1164262) exited] Debug: Device memory allocated successfully Debug: Allocating pinned host memory... Debug: Pinned host memory allocated successfully Debug: inputBuffer = 0x901400000, outputBuffer = 0x901401000, hostOutput = 0x212000 Debug: Copying input string to device... Thread 1 "hip_test_debug" received signal SIGSEGV, Segmentation fault. 0x00007ffff7db0fbd in ?? () from /opt/rocm_sdk_612/lib64/libamdhip64.so.6 (gdb) info registers rax 0xfffffffffffffff8 -8 rbx 0x55555569e970 93824993585520 rcx 0x3 3 rdx 0x55555559d140 93824992530752 rsi 0x3 3 rdi 0x55555569e970 93824993585520 rbp 0x0 0x0 rsp 0x7fffffffd650 0x7fffffffd650 r8 0x0 0 r9 0x0 0 r10 0x4 4 r11 0x7fffecca13a0 140737166054304 r12 0x0 0 r13 0x555555582410 93824992420880 r14 0x0 0 r15 0x1 1 rip 0x7ffff7db0fbd 0x7ffff7db0fbd eflags 0x10246 [ PF ZF IF RF ] cs 0x33 51 ss 0x2b 43 ds 0x0 0 --Type <RET> for more, q to quit, c to continue without paging-- es 0x0 0 fs 0x0 0 gs 0x0 0 fs_base 0x7ffff7f85a80 140737353636480 gs_base 0x0 0 (gdb)

chboishabba avatar Mar 05 '25 06:03 chboishabba

0x7ffff7db0fbd: mov 0x18(%rbp), %rax

(base) (Wed Mar 05 16:21:14) c@archb hello_world$ gdb ./hip_test
GNU gdb (GDB) 16.2
Copyright (C) 2024 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./hip_test...
(No debugging symbols found in ./hip_test)
(gdb) b hipMemcpyHostToDevice
Function "hipMemcpyHostToDevice" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (hipMemcpyHostToDevice) pending.
(gdb) stepi
The program is not being run.
(gdb) run
Starting program: /opt/rocm_sdk_612/docs/examples/hipcc/hello_world/hip_test 

This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.archlinux.org>
Enable debuginfod for this session? (y or [n]) 
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7fffecaea6c0 (LWP 1217121)]
[New Thread 0x7fffe7fff6c0 (LWP 1217122)]
[Thread 0x7fffe7fff6c0 (LWP 1217122) exited]
Debug Device Info: Number of HIP devices found: 1
Debug Device Info: Device Name: AMD Radeon RX 580 Series
Debug Device Info: Total Global Memory: 8589934592 bytes
Debug Device Info: Shared Memory per Block: 65536 bytes
Debug Device Info: Max Threads Per Block: 1024
Debug Device Info: Max Grid Dimensions: (2147483647, 65536, 65536)
Debug Device Info: Memory Clock Rate: 1411000 kHz
Debug Device Info: Memory Bus Width: 256 bits
Debug Device Info: Concurrent Kernels: Yes
Debug Device Info: Pinned Memory Support: Yes
Debug Device Info: Finished displaying device information.
Debug Test 1: Actual input string size = 2, using transfer size = 64
Debug: Input string size = 64 for inputString_A
Debug: paddedInput host ptr: 0x7fffffffd8c0
Debug: Allocating device memory...
Debug Test 2: Checking device memory allocation...
Debug Test 2: Device memory allocation check passed (Valid addresses)
Debug: Device memory allocated successfully
Debug: Allocating pinned host memory...
Debug: Pinned host memory allocated successfully
Debug Test 3: Checking pinned host memory allocation...
Debug Test 3: Pinned host memory allocation check passed (Valid address)
Debug: inputBuffer = 0x901400000, outputBuffer = 0x901401000, hostOutput = 0x212000
Debug: --- Before hipMemcpyHostToDevice ---
Debug: inputBuffer (device ptr): 0x901400000
Debug: inputString (inputString_A, host ptr): 0x7fffffffd8c0
Debug: First char of inputString: A
Debug: size: 64
Debug: hipMemcpy direction: hipMemcpyHostToDevice
Debug: --- End Before hipMemcpyHostToDevice ---
Debug: Copying input string to device...

Thread 1 "hip_test" received signal SIGSEGV, Segmentation fault.
0x00007ffff7db0fbd in ?? ()
   from /opt/rocm_sdk_612/lib64/libamdhip64.so.6
(gdb) display /x $rax
1: /x $rax = 0xfffffffffffffff8
(gdb) stepi
Couldn't get registers: No such process.
(gdb) [Thread 0x7ffff7f85a80 (LWP 1217118) exited]
[Thread 0x7fffecaea6c0 (LWP 1217121) exited]
[New process 1217118]

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
display /x $rax
2: /x $rax = <error: No registers.>
(gdb) x/20i $pc
No registers.
(gdb) b hipMemcpyHostToDevice
Function "hipMemcpyHostToDevice" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 2 (hipMemcpyHostToDevice) pending.
(gdb) run
Starting program: /opt/rocm_sdk_612/docs/examples/hipcc/hello_world/hip_test 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7fffecaea6c0 (LWP 1217243)]
[New Thread 0x7fffe7fff6c0 (LWP 1217244)]
[Thread 0x7fffe7fff6c0 (LWP 1217244) exited]
Debug Device Info: Number of HIP devices found: 1
Debug Device Info: Device Name: AMD Radeon RX 580 Series
Debug Device Info: Total Global Memory: 8589934592 bytes
Debug Device Info: Shared Memory per Block: 65536 bytes
Debug Device Info: Max Threads Per Block: 1024
Debug Device Info: Max Grid Dimensions: (2147483647, 65536, 65536)
Debug Device Info: Memory Clock Rate: 1411000 kHz
Debug Device Info: Memory Bus Width: 256 bits
Debug Device Info: Concurrent Kernels: Yes
Debug Device Info: Pinned Memory Support: Yes
Debug Device Info: Finished displaying device information.
Debug Test 1: Actual input string size = 2, using transfer size = 64
Debug: Input string size = 64 for inputString_A
Debug: paddedInput host ptr: 0x7fffffffd8c0
Debug: Allocating device memory...
Debug Test 2: Checking device memory allocation...
Debug Test 2: Device memory allocation check passed (Valid addresses)
Debug: Device memory allocated successfully
Debug: Allocating pinned host memory...
Debug: Pinned host memory allocated successfully
Debug Test 3: Checking pinned host memory allocation...
Debug Test 3: Pinned host memory allocation check passed (Valid address)
Debug: inputBuffer = 0x901400000, outputBuffer = 0x901401000, hostOutput = 0x212000
Debug: --- Before hipMemcpyHostToDevice ---
Debug: inputBuffer (device ptr): 0x901400000
Debug: inputString (inputString_A, host ptr): 0x7fffffffd8c0
Debug: First char of inputString: A
Debug: size: 64
Debug: hipMemcpy direction: hipMemcpyHostToDevice
Debug: --- End Before hipMemcpyHostToDevice ---
Debug: Copying input string to device...

Thread 1 "hip_test" received signal SIGSEGV, Segmentation fault.
0x00007ffff7db0fbd in ?? ()
   from /opt/rocm_sdk_612/lib64/libamdhip64.so.6
1: /x $rax = 0xfffffffffffffff8
2: /x $rax = 0xfffffffffffffff8
(gdb) x/20i $pc
=> 0x7ffff7db0fbd:	mov    0x18(%rbp),%rax
   0x7ffff7db0fc1:	lea    0x18(%rbp),%r12
   0x7ffff7db0fc5:	test   $0x1,%al
   0x7ffff7db0fc7:	jne    0x7ffff7db1368
   0x7ffff7db0fcd:	mov    %rax,%rcx
   0x7ffff7db0fd0:	or     $0x1,%rcx
   0x7ffff7db0fd4:	lock cmpxchg %rcx,(%r12)
   0x7ffff7db0fda:	jne    0x7ffff7db1380
   0x7ffff7db0fe0:	mov    %rdx,0x70(%rbp)
   0x7ffff7db0fe4:	movl   $0x1,0x78(%rbp)
   0x7ffff7db0feb:	mov    0x148(%rbx),%r13
   0x7ffff7db0ff2:	cmpq   $0x0,0x1c0(%r13)
   0x7ffff7db0ffa:	je     0x7ffff7db13a0
   0x7ffff7db1000:	mov    0x1c8(%r13),%rax
   0x7ffff7db1007:	mov    %rbx,0x150(%rax)
   0x7ffff7db100e:	mov    %rbx,0x1c8(%r13)
   0x7ffff7db1015:	xor    %edx,%edx
   0x7ffff7db1017:	mov    $0x2,%esi
   0x7ffff7db101c:	mov    %rbx,%rdi
   0x7ffff7db101f:	call   0x7ffff7db0930
(gdb) info registers
rax            0xfffffffffffffff8  -8
rbx            0x555555698980      93824993560960
rcx            0x3                 3
rdx            0x555555597140      93824992506176
rsi            0x3                 3
rdi            0x555555698980      93824993560960
rbp            0x0                 0x0
rsp            0x7fffffffd120      0x7fffffffd120
r8             0x0                 0
r9             0x0                 0
r10            0x4                 4
r11            0x7fffecca13a0      140737166054304
r12            0x0                 0
r13            0x55555557c410      93824992396304
r14            0x0                 0
r15            0x1                 1
rip            0x7ffff7db0fbd      0x7ffff7db0fbd
eflags         0x10246             [ PF ZF IF RF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
fs_base        0x7ffff7f85a80      140737353636480
gs_base        0x0                 0
(gdb) stepi
[Thread 0x7ffff7f85a80 (LWP 1217239) exited]
[Thread 0x7fffecaea6c0 (LWP 1217243) exited]
[New process 1217239]

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
(gdb) x/20i $pc
No registers.
(gdb) info registers
The program has no registers now.
(gdb) 

chboishabba avatar Mar 05 '25 06:03 chboishabba

 To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.

[Thread debugging using libthread_db enabled]

Using host libthread_db library "/usr/lib/libthread_db.so.1".

[New Thread 0x7fffecaea6c0 (LWP 1227530)]

[New Thread 0x7fffe7fff6c0 (LWP 1227531)]

[Thread 0x7fffe7fff6c0 (LWP 1227531) exited]

Debug Device Info: Number of HIP devices found: 1

Debug Device Info: Device Name: AMD Radeon RX 580 Series

Debug Device Info: Total Global Memory: 8589934592 bytes

Debug Device Info: Shared Memory per Block: 65536 bytes

Debug Device Info: Max Threads Per Block: 1024

Debug Device Info: Max Grid Dimensions: (2147483647, 65536, 65536)

Debug Device Info: Memory Clock Rate: 1411000 kHz

Debug Device Info: Memory Bus Width: 256 bits

Debug Device Info: Concurrent Kernels: Yes

Debug Device Info: Pinned Memory Support: Yes

Debug Device Info: Finished displaying device information.

Debug Test 1: Actual input string size = 2, using transfer size = 64

Debug: Input string size = 64 for inputString_A

Debug: paddedInput host ptr: 0x7fffffffd8c0

Debug: Allocating device memory...

Debug Test 2: Checking device memory allocation...

Debug Test 2: Device memory allocation check passed (Valid addresses)

Debug: Device memory allocated successfully

Debug: Allocating pinned host memory...

Debug: Pinned host memory allocated successfully

Debug Test 3: Checking pinned host memory allocation...

Debug Test 3: Pinned host memory allocation check passed (Valid address)

Debug: inputBuffer = 0x901400000, outputBuffer = 0x901401000, hostOutput = 0x212000

Debug: --- Before hipMemcpyHostToDevice ---

Debug: inputBuffer (device ptr): 0x901400000

Debug: inputString (inputString_A, host ptr): 0x7fffffffd8c0

Debug: First char of inputString: A

Debug: size: 64

Debug: hipMemcpy direction: hipMemcpyHostToDevice

Debug: --- End Before hipMemcpyHostToDevice ---

Debug: Copying input string to device...


Thread 1 "hip_test" hit Catchpoint 1 (exception thrown), 0x00007ffff76ae63a in __cxxabiv1::__cxa_throw (obj=0x555555698900,

    tinfo=0x7fffecdb2008 <typeinfo for rocr::AMD::hsa_exception>,

    dest=0x7fffecc36600 <rocr::AMD::hsa_exception::~hsa_exception()>) at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_throw.cc:81

warning: 81    /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_throw.cc: No such file or directory

(gdb) print *(rocr::AMD::hsa_exception*)obj

No symbol "rocr" in current context.

(gdb) print *(rocr::AMD::hsa_exception*)objbtQuit

(gdb) bt

#0  0x00007ffff76ae63a in __cxxabiv1::__cxa_throw (

    obj=0x555555698900,

    tinfo=0x7fffecdb2008 <typeinfo for rocr::AMD::hsa_exception>,

    dest=0x7fffecc36600 <rocr::AMD::hsa_exception::~hsa_exception()>) at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_throw.cc:81

#1  0x00007fffecc1e0e6 in rocr::AMD::AqlQueue::AqlQueue(rocr::AMD::GpuAgent*, unsigned long, unsigned int, rocr::AMD::ScratchCache::ScratchInfo&, void (*)(hsa_status_t, hsa_queue_s*, void*), void*, bool) [clone .cold] ()

   from /opt/rocm_sdk_612/lib64/libhsa-runtime64.so.1

#2  0x00007fffecc4871b in std::_Function_handler<rocr::core::Queue* (), rocr::AMD::GpuAgent::InitDma()::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()

   from /opt/rocm_sdk_612/lib64/libhsa-runtime64.so.1

#3  0x00007fffecc464a4 in rocr::AMD::GpuAgent::QueueCreate(unsigned long, unsigned int, void (*)(hsa_status_t, hsa_queue_s*, void*), void*, unsigned int, unsigned int, rocr::core::Queue**) ()

   from /opt/rocm_sdk_612/lib64/libhsa-runtime64.so.1

#4  0x00007fffecc6596f in rocr::HSA::hsa_queue_create(hsa_agent_s, unsigned int, unsigned int, void (*)(hsa_status_t, hsa_queue_s*, void*), void*, unsigned int, unsigned int, hsa_queue_s**) ()

   from /opt/rocm_sdk_612/lib64/libhsa-runtime64.so.1

#5  0x00007ffff7dd85f7 in ?? ()

   from /opt/rocm_sdk_612/lib64/libamdhip64.so.6

#6  0x00007ffff7df56a9 in ?? ()

   from /opt/rocm_sdk_612/lib64/libamdhip64.so.6

--Type <RET> for more, q to quit, c to continue without paging--

#7  0x00007ffff7dd0946 in ?? ()

   from /opt/rocm_sdk_612/lib64/libamdhip64.so.6

#8  0x00007ffff7db4cc5 in ?? ()

   from /opt/rocm_sdk_612/lib64/libamdhip64.so.6

#9  0x00007ffff7cdd4d5 in ?? ()

   from /opt/rocm_sdk_612/lib64/libamdhip64.so.6

#10 0x00007ffff7b092bc in ?? ()

   from /opt/rocm_sdk_612/lib64/libamdhip64.so.6

#11 0x00007ffff7c14c8d in ?? ()

   from /opt/rocm_sdk_612/lib64/libamdhip64.so.6

#12 0x00007ffff7c14e3e in ?? ()

   from /opt/rocm_sdk_612/lib64/libamdhip64.so.6

#13 0x00005555555569a5 in main ()

(gdb)


warning: 81    /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_throw.cc: No such file or directory

(gdb) print *(rocr::AMD::hsa_exception*)obj

No symbol "rocr" in current context.

(gdb) print *(rocr::AMD::hsa_exception*)obj

You've made significant progress! Here's a breakdown of the new information and what it means:

Key Observations:

    No symbol "rocr" in current context.:
        This error when trying to print the rocr::AMD::hsa_exception object indicates that GDB doesn't have the necessary debug symbols for the ROCr runtime.
        Without these symbols, GDB cannot understand the structure of the rocr::AMD::hsa_exception class and therefore cannot print its contents.
    Backtrace Analysis:
        The backtrace provides valuable information about the call stack leading to the exception:
            #0 __cxxabiv1::__cxa_throw: The exception is thrown.
            #1 rocr::AMD::AqlQueue::AqlQueue: The exception originates from the constructor of rocr::AMD::AqlQueue, which is used to create a queue for asynchronous operations.
            #2 std::_Function_handler<rocr::core::Queue* (), rocr::AMD::GpuAgent::InitDma()::{lambda()#1}>::_M_invoke: A lambda function related to DMA initialization is being invoked.
            #3 rocr::AMD::GpuAgent::QueueCreate: The QueueCreate function of GpuAgent is being called.
            #4 rocr::HSA::hsa_queue_create: The HSA API function hsa_queue_create is being called.
            #5 - #12: These are internal calls within libamdhip64.so.6, likely related to HIP runtime initialization and memory copy setup.
            #13 main: The main function is the entry point of your program.
    HSA Queue Creation Failure:
        The backtrace shows that the exception is thrown during the creation of an HSA queue (hsa_queue_create).
        This suggests that the hipMemcpyHostToDevice operation is failing because it cannot create the necessary HSA queue for asynchronous memory transfers.

chboishabba avatar Mar 05 '25 07:03 chboishabba