HIP icon indicating copy to clipboard operation
HIP copied to clipboard

[Issue]: ROCm 6.3.3 allocation failure with stream

Open etiennemlb opened this issue 8 months ago • 1 comments

Problem Description

The 4.6 kokkos release lead to a change in the way the allocations are done. They now default to hipMallocAsync. https://github.com/kokkos/kokkos/pull/7659

The behaviors of hipMallocAsync leads to OOM where hipMalloc does not. I have a reproducer for the allocation issue. Using rocm 6.3.3:

#include <hip/hip_runtime.h>

#include <cstdio>
#include <iostream>
#include <vector>

void checkhiperror(hipError_t error) {
    if(error != hipSuccess) {
        std::cout << hipGetErrorString(error) << '\n';
        std::exit(1);
    }
}

static size_t current_allocated = 0;

// #define NOSTREAM

void* GPUMalloc(size_t byte_count, hipStream_t stream) {
    void* ptr;
#if defined(NOSTREAM)
    checkhiperror(hipMalloc(&ptr, byte_count));
#else
    checkhiperror(hipMallocAsync(&ptr, byte_count, stream));
    checkhiperror(hipDeviceSynchronize());
#endif
    current_allocated += byte_count;
    std::cout << "a current_allocated: " << current_allocated << std::endl;
    return ptr;
}

void GPUFree(void* ptr, size_t byte_count, hipStream_t stream) {
#if defined(NOSTREAM)
    checkhiperror(hipFree(ptr));
#else
    checkhiperror(hipFreeAsync(ptr, stream));
    checkhiperror(hipDeviceSynchronize());
#endif
    current_allocated -= byte_count;
    std::cout << "f current_allocated: " << current_allocated << std::endl;
}

void Workload(hipStream_t stream, double* host_stuff, size_t count) {
    void* device_ptr = GPUMalloc(count * sizeof(double), stream);
    checkhiperror(hipMemcpyAsync(device_ptr, host_stuff, count * sizeof(double), hipMemcpyHostToDevice, stream));
    GPUFree(device_ptr, count * sizeof(double), stream);
}

void Test1() {
    static constexpr size_t kCount = (1ULL << 32) + /* kokkos stuff */ 100;
    std::vector<double>     host_stuff;
    host_stuff.resize(kCount);

    hipStream_t stream;
    checkhiperror(hipStreamCreate(&stream));

    for(int i = 0; i < 10; ++i)
        Workload(stream, host_stuff.data(), kCount);
}

int main() {
    Test1();
}

I get the following results if stream alloc are used:

$ hipcc -O3 --offload-arch=gfx90a reproducer.hip && ./a.out
a current_allocated: 34359739168
f current_allocated: 0
out of memory
$ hipcc -O3 --offload-arch=gfx90a -DNOSTREAM reproducer.hip && ./a.out
a current_allocated: 34359739168
f current_allocated: 0
a current_allocated: 34359739168
f current_allocated: 0
a current_allocated: 34359739168
f current_allocated: 0
...
$echo "OS:" && cat /etc/os-release | grep -E "^(NAME=|VERSION=)";
OS:
NAME="Red Hat Enterprise Linux"
VERSION="8.10 (Ootpa)"
$ echo "CPU: " && cat /proc/cpuinfo | grep "model name" | sort --unique;
CPU:
model name      : AMD EPYC 7A53 64-Core Processor
$ echo "GPU:" && /opt/rocm/bin/rocminfo | grep -E "^\s*(Name|Marketing Name)";
GPU:
  Name:                    AMD EPYC 7A53 64-Core Processor
  Marketing Name:          AMD EPYC 7A53 64-Core Processor
  Name:                    AMD EPYC 7A53 64-Core Processor
  Marketing Name:          AMD EPYC 7A53 64-Core Processor
  Name:                    AMD EPYC 7A53 64-Core Processor
  Marketing Name:          AMD EPYC 7A53 64-Core Processor
  Name:                    AMD EPYC 7A53 64-Core Processor
  Marketing Name:          AMD EPYC 7A53 64-Core Processor
  Name:                    gfx90a
  Marketing Name:          AMD Instinct MI250X
      Name:                    amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-
  Name:                    gfx90a
  Marketing Name:          AMD Instinct MI250X
      Name:                    amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-
  Name:                    gfx90a
  Marketing Name:          AMD Instinct MI250X
      Name:                    amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-
  Name:                    gfx90a
  Marketing Name:          AMD Instinct MI250X
      Name:                    amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-
  Name:                    gfx90a
  Marketing Name:          AMD Instinct MI250X
      Name:                    amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-
  Name:                    gfx90a
  Marketing Name:          AMD Instinct MI250X
      Name:                    amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-
  Name:                    gfx90a
  Marketing Name:          AMD Instinct MI250X
      Name:                    amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-
  Name:                    gfx90a
  Marketing Name:          AMD Instinct MI250X
      Name:                    amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-

etiennemlb avatar Apr 10 '25 08:04 etiennemlb

Hi @etiennemlb. Internal ticket has been created to investigate your issue. Thanks!

ppanchad-amd avatar Apr 10 '25 14:04 ppanchad-amd

Hi @etiennemlb,

Thanks for the report! I was able to reproduce the issue on ROCm 6.3.3 and can confirm that we have a fix that will be released with ROCm 7.

darren-amd avatar Jul 21 '25 17:07 darren-amd

This issue has been migrated to: https://github.com/ROCm/rocm-systems/issues/399

systems-assistant[bot] avatar Aug 18 '25 18:08 systems-assistant[bot]