Ye Luo comments

Results 358 comments of Ye Luo

OpenMP HIP Memory interop issue

https://github.com/ye-luo/miniqmc
```
$ cmake -D CMAKE_CXX_COMPILER=clang++ -D ENABLE_OFFLOAD=1 -D OFFLOAD_TARGET=amdgcn-amd-amdhsa -D OFFLOAD_ARCH=gfx906 -DQMC_ENABLE_ROCM=ON ..
$ make -j32 test_omptarget_memory_interop
$ ctest -R test_omptarget_memory_interop --output-on-failure
Test project /home/yeluo/opt/miniqmc/build_r7_rocmbuild_offload
    Start 9: unit_test_omptarget_memory_interop
1/1...
```

OpenMP HIP Memory interop issue

Your test passes on my machine. But if I add ``` err = hipMemset(a, 0, n * sizeof(int)); ``` the return value is hipErrorInvalidValue

OpenMP HIP Memory interop issue

My guess is that HIP carries some metadata around a device pointer if it is allocated via HIP. HIP APIs use such info to do certain optimizations. When a...

linker error

Please also double check that, if mylib.a is present twice on the link line, the linking can still succeed.

linker error

@gregrodgers Thank you. Fixing the original problem is appreciated. Both use cases are valid. > Please also double check if mylib.a is presented twice on the link line, the linking...

linker error

@jsjodin It would be appreciated if you could remove the restriction.

unable to open hip GPU device (gfx1030) with latest AOMP

FYI, https://github.com/RadeonOpenCompute/ROCm/issues/887#issuecomment-822222885 I hope that once the ROCm side enables RDNA, AOMP works out of the box. Right now, nailing the software on GFX9 is really critical.

More registers used when multiple target regions are compiled together

Reproducer:
```
git clone https://github.com/ye-luo/miniqmc
cd miniqmc/build
cmake -DCMAKE_CXX_COMPILER=/home/yeluo/rocm/aomp_0.7-0/bin/clang++ \
  -DENABLE_OFFLOAD=1 -DOFFLOAD_TARGET=amdgcn-amd-amdhsa \
  -DCMAKE_CXX_FLAGS="-Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 -v" \
  ..
make -j15 check_spo_batched
```
src/QMCWaveFunctions/einspline_spo_omp.cpp lines 159, 238, 311, 405 have offload...

device plugin dead-lock at an atmi_malloc failure

I set 512MB GPU memory in my laptop BIOS and I can still reproduce the issue with rocm 4.1.0. The call stack depth seems to be reduced. ``` (gdb) bt #0...

Runtime overhead on H2D and D2H transfers

> That's curious. I thought the per-launch allocations had all been removed, though there is probably still allocation on the first target launch. I'll look into this next week. >...