HIP icon indicating copy to clipboard operation
HIP copied to clipboard

[Hip Graph] Issue when using a kernel from different graphs

Open Kawaboongawa opened this issue 2 years ago • 1 comments

Hi everybody,

I’m trying some fancy things with HIP graphs and I just ran into a bug which seems to be coming from the HIP graph API. Using multiple graphs that calls the same kernel will systematically use the latest kernel arguments for each graph call.

Here comes a sample illustrating this issue :

test_graph_print_hip.cpp:

#include <hip/hip_runtime.h>
#include <stdio.h>
#include <iostream>
#include <unistd.h>
#include <string>
#include <vector>

template <typename T>
void check(T result, char const *const func, const char *const file,
           int const line) {
  if (result) {
    fprintf(stderr, "Hip error at %s:%d code=%d(%s) \"%s\" \n", file, line,
            static_cast<unsigned int>(result), hipGetErrorName(result), func);
    exit(EXIT_FAILURE);
  }
}
#define checkHipErrors(val) check((val), #val, __FILE__, __LINE__)


__global__ void kernel(int i, volatile int *stop) {
        printf("Kernel %d going...\n", i);
}

int main(int argc, char* argv[]) {
  int n = std::stoi(argv[1]);
  std::vector<hipStream_t> streams(n);
  int* stop;
  for (int i = 0; i < n; ++i)
    hipStreamCreate(&streams[i]);

  std::vector<hipGraph_t> cuGraphs(n);
  std::vector<hipGraphExec_t> graphExec(n, NULL);

  for (int i = 0; i < n; ++i)
  {
  checkHipErrors(hipStreamBeginCapture(streams[i], hipStreamCaptureModeGlobal));

  hipLaunchKernelGGL(kernel, dim3(1), dim3(1), 0, streams[i], i, stop);
  
  checkHipErrors(hipStreamEndCapture(streams[i], &cuGraphs[i]));
  checkHipErrors(hipGraphInstantiate(&graphExec[i], cuGraphs[i], NULL, NULL, 0));
  checkHipErrors(hipGraphDestroy(cuGraphs[i]));

  }

  std::cout << "launching graph..." << std::endl;
  
  for (int i = 0; i < n; ++i) 
    checkHipErrors(hipGraphLaunch(graphExec[i], streams[i]));
  std::cout << "Synchronizing processes..." << std::endl;
  hipDeviceSynchronize();

  return 0;
}

The code is built using:

hipcc test_graph_print_hip.cpp

The following code should give the following result : image

which is obtained when compiling this code with an Nvidia GPU.

However, using Rocm 5.0.1 with an AMD  GPU gives the following result :

image

hipconfig output

HIP version  : 5.0.13601-bb16828d

== hipconfig
HIP_PATH     : /opt/rocm-5.0.1/hip
ROCM_PATH    : /opt/rocm-5.0.1
HIP_COMPILER : clang
HIP_PLATFORM : amd
HIP_RUNTIME  : rocclr
CPP_CONFIG   :  -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm-5.0.1/hip/include -I/opt/rocm-5.0.1/llvm/bin/../lib/clang/14.0.0 -I/opt/rocm-5.0.1/hsa/include

== hip-clang
HSA_PATH         : /opt/rocm-5.0.1/hsa
HIP_CLANG_PATH   : /opt/rocm-5.0.1/llvm/bin
AMD clang version 14.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.0.1 22051 235b6880e2e515507478181ec11a20c1ec87945b)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-5.0.1/llvm/bin
AMD LLVM version 14.0.0git
  Optimized build.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: znver1

  Registered Targets:
    amdgcn - AMD GCN GPUs
    r600   - AMD GPUs HD2XXX-HD6XXX
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64
hip-clang-cxxflags :  -std=c++11 -isystem "/opt/rocm-5.0.1/llvm/lib/clang/14.0.0/include/.." -isystem /opt/rocm-5.0.1/hsa/include -isystem "/opt/rocm-5.0.1/hip/include" -O3
hip-clang-ldflags  :  -L"/opt/rocm-5.0.1/hip/lib" -O3 -lgcc_s -lgcc -lpthread -lm -lrt

=== Environment Variables
PATH=/opt/rocm/llvm/bin:/opt/rocm/hip/bin:/opt/rocm:/home/ccetre/.local/bin:/home/ccetre/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/opt/rocm/bin:/opt/rocm/rocprofiler/bin:/opt/rocm/opencl/bin:/opt/rocm/bin:/opt/rocm/rocprofiler/bin:/opt/rocm/opencl/bin
LD_LIBRARY_PATH=/opt/rocm/llvm/lib:/opt/rocm/hip/lib

== Linux Kernel
Hostname     : opengpu2
Linux opengpu2 5.4.0-42-generic #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.4 LTS
Release:	20.04
Codename:	focal

I found a temporary workaround by using a different kernel for each graph, which unfortunately leads to serious code duplication.

Kawaboongawa avatar Mar 15 '22 16:03 Kawaboongawa

I don't see the same error on rocm 4.5.0:

HIP version  : 4.4.21401-bedc5f61

== hipconfig
HIP_PATH     : /opt/rocm-4.5.0/hip
ROCM_PATH    : /opt/rocm-4.5.0
HIP_COMPILER : clang
HIP_PLATFORM : amd
HIP_RUNTIME  : rocclr
CPP_CONFIG   :  -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm-4.5.0/hip/include -I/opt/rocm-4.5.0/llvm/bin/../lib/clang/13.0.0 -I/opt/rocm-4.5.0/include

== hip-clang
HSA_PATH         : /opt/rocm-4.5.0
HIP_CLANG_PATH   : /opt/rocm-4.5.0/llvm/bin
AMD clang version 13.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-4.5.0 21422 e2489b0d7ede612d6586c61728db321047833ed8)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-4.5.0/llvm/bin
AMD LLVM version 13.0.0git
  Optimized build.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: znver2

  Registered Targets:
    amdgcn - AMD GCN GPUs
    r600   - AMD GPUs HD2XXX-HD6XXX
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64
hip-clang-cxxflags :  -std=c++11 -isystem "/opt/rocm-4.5.0/llvm/lib/clang/13.0.0/include/.." -isystem /opt/rocm-4.5.0/include -isystem "/opt/rocm-4.5.0/hip/include" -O3 --rocm-path=/opt/rocm-4.5.0
hip-clang-ldflags  : --driver-mode=g++ -L"/opt/rocm-4.5.0/hip/lib" -O3 -lgcc_s -lgcc -lpthread -lm -lrt

Output:

$  ./graph 4
launching graph...
Kernel 0 going...
Kernel 1 going...
Kernel 2 going...
Synchronizing processes...
Kernel 3 going...

But it is present in rocm/5.0.0:

$ ./graph 4
launching graph...
Kernel 3 going...
Kernel 3 going...
Kernel 3 going...
Synchronizing processes...
Kernel 3 going...

Output:

$ hipconfig 
HIP version  : 5.0.13601-6b731c37

== hipconfig
HIP_PATH     : /opt/rocm-5.0.0/hip
ROCM_PATH    : /opt/rocm-5.0.0
HIP_COMPILER : clang
HIP_PLATFORM : amd
HIP_RUNTIME  : rocclr
CPP_CONFIG   :  -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm-5.0.0/hip/include -I/opt/rocm-5.0.0/llvm/bin/../lib/clang/14.0.0 -I/opt/rocm-5.0.0/include

== hip-clang
HSA_PATH         : /opt/rocm-5.0.0
HIP_CLANG_PATH   : /opt/rocm-5.0.0/llvm/bin
AMD clang version 14.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.0.0 22051 235b6880e2e515507478181ec11a20c1ec87945b)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-5.0.0/llvm/bin
AMD LLVM version 14.0.0git
  Optimized build.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: znver2

  Registered Targets:
    amdgcn - AMD GCN GPUs
    r600   - AMD GPUs HD2XXX-HD6XXX
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64
hip-clang-cxxflags :  -std=c++11 -isystem "/opt/rocm-5.0.0/llvm/lib/clang/14.0.0/include/.." -isystem /opt/rocm-5.0.0/include -isystem "/opt/rocm-5.0.0/hip/include" -O3 --rocm-path=/opt/rocm-5.0.0
hip-clang-ldflags  :  -L"/opt/rocm-5.0.0/hip/lib" -O3 -lgcc_s -lgcc -lpthread -lm -lrt

Note: compile command hipcc -o graph --amdgpu-target=gfx908 graph.cc Note2: I also see the error in rocm/5.1.0.

frobnitzem avatar Jun 07 '22 18:06 frobnitzem

It seems the issue has been fixed at some point. It seems fixed in rocm 5.4.3

image

Unless someone reproduces this issue with another hardware, I'll close it by the end of the week.

Kawaboongawa avatar Apr 18 '23 09:04 Kawaboongawa

On my system, I found rocm 5.2.0 still had the issue, but it's fixed starting from rocm 5.3.0 and onward.

frobnitzem avatar May 08 '23 17:05 frobnitzem