aomp icon indicating copy to clipboard operation
aomp copied to clipboard

OpenMP HIP Memory interop issue

Open ye-luo opened this issue 4 years ago • 11 comments
trafficstars

HIP to OpenMP hipMalloc and pass the pointer to OpenMP target kernel via is_device_ptr works. OpenMP to HIP omp_target_alloc and pass the pointer to HIP API/kernel print error. For example, https://github.com/ye-luo/miniqmc/blob/a2f826952f5045898bda855b34f37e164a30df24/src/Platforms/tests/OMPTarget/test_omp_memory_interop.cpp#L91 hipMemset refuses to run. This prevents me calling hip/roc libraries in real applications.

Note: offloading to NVIDIA GPUs is achieved by using the per-device primary context in LLVM libomptarget CUDA plugin.

ye-luo avatar Oct 29 '21 23:10 ye-luo

It's not a fix for hipMemset, but you might be interested in hsa_amd_memory_fill as an alternative

JonChesterfield avatar Nov 19 '21 15:11 JonChesterfield

It's not a fix for hipMemset, but you might be interested in hsa_amd_memory_fill as an alternative

Thank you for the info. I don't really use hipMemset in the application but it makes the reproducer simple without writing a kernel and makefile.

ye-luo avatar Nov 19 '21 15:11 ye-luo

It's not a fix for hipMemset, but you might be interested in hsa_amd_memory_fill as an alternative

Thank you for the info. I don't really use hipMemset in the application but it makes the reproducer simple without writing a kernel and makefile.

What do you actually use in the application? omp_target_alloc returns a pointer that can be used in a kernel - that's what we have in one of our aomp smoke tests (e.g. https://github.com/ROCm-Developer-Tools/aomp/tree/aomp-dev/test/smoke/use_device_ptr)

carlobertolli avatar Nov 19 '21 16:11 carlobertolli

In the application, I use memory ptr from omp_target_alloc to call hipblas or hip kernel. I was expecting hipMemSet as a convenient routine which ends up calling a kernel.

It looks like working in your fortran case but seems failing in my C++ case.

ye-luo avatar Nov 19 '21 16:11 ye-luo

https://github.com/ye-luo/miniqmc

$ cmake -D CMAKE_CXX_COMPILER=clang++       -D ENABLE_OFFLOAD=1       -D OFFLOAD_TARGET=amdgcn-amd-amdhsa       -D OFFLOAD_ARCH=gfx906 -DQMC_ENABLE_ROCM=ON ..
$ make -j32 test_omptarget_memory_interop
$ ctest -R test_omptarget_memory_interop --output-on-failure
Test project /home/yeluo/opt/miniqmc/build_r7_rocmbuild_offload
    Start 9: unit_test_omptarget_memory_interop
1/1 Test #9: unit_test_omptarget_memory_interop ...***Failed    1.08 sec
test memory_interop map
test memory_interop vendor device alloc
test memory_interop omp_target_alloc
hipAssert: hipErrorInvalidValue hipErrorInvalidValue, file /home/yeluo/opt/miniqmc/src/Platforms/tests/OMPTarget/test_omp_memory_interop.cpp, line 91
hipMemset failed on omp_target_alloc memory!


ye-luo avatar Nov 19 '21 16:11 ye-luo

The following works for me (not sure if this is guaranteed, so please do take it with a grain of salt) C++ file

#include <stdio.h>
#include <omp.h>
#include "hip_memset.h"

#define N 1000

int main() {
  int n = N;
  int *a = (int *) omp_target_alloc(n * sizeof(int), omp_get_default_device());
  set_mem(a, n);
  int err = 0;
  for(int i = 0; i < n; i++)
    if (a[i] != 0) {
      printf("Error at %d: a[%d] = %d\n", i, i, a[i]);
      err++;
      if (err > 10) break;
    }
  return err;
}

Header file hip_memset.h: void set_mem(int *a, int n);

HIP implementation file:

#include <hip/hip_runtime.h>

__global__ void set_kernel(int *a, int n) {
  for(int i = 0; i < n; i++)
    a[i] = 0;
}

void set_mem(int *a, int n) {
  hipError_t err;

  set_kernel<<<1, 1, 0>>>(a, n);
  hipDeviceSynchronize();
  if(err != HIP_SUCCESS)
    printf("Error %s!\n", hipGetErrorName(err));
}

carlobertolli avatar Nov 19 '21 16:11 carlobertolli

Your test passes on my machine. But if I add

err = hipMemset(a, 0, n * sizeof(int));

the return value is hipErrorInvalidValue

ye-luo avatar Nov 19 '21 17:11 ye-luo

That, I was also able to reproduce locally.

carlobertolli avatar Nov 19 '21 17:11 carlobertolli

My guess is that HIP carries some meta data around a device ptr if it is allocated via HIP. HIP APIs uses such info to do certain optimizations. When a device pointer allocated is from OpenMP, hipMemset doesn't know how to handle the ptr due to missing meta data and thus error out. On the other hand, a HIP kernel only needs a device pointer and bypasses all the meta data checks.

ye-luo avatar Nov 19 '21 17:11 ye-luo

Added failing test to capture this: https://github.com/ROCm-Developer-Tools/aomp/tree/aomp-dev/test/smoke-fails/mix_hipmemset_omp

carlobertolli avatar Jan 17 '22 16:01 carlobertolli

This is still failing in 16.0-1. I also get a minor linkage issue . /home/grodgers/rocm/aomp/bin/clang++ -O3 -L/home/grodgers/rocm/aomp/lib -lamdhip64 -Wl,-rpath,/home/grodgers/rocm/aomp/lib -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 -D__OFFLOAD_ARCH_gfx906__ memset.cpp hip_memset.o -o memset warning: Linking two modules of different data layouts: '/tmp/hip_memset-4f5c56.o' is '' whereas 'llvm-link' is 'e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7'

gregrodgers avatar Oct 18 '22 20:10 gregrodgers