Ye Luo

Results: 358 comments by Ye Luo

On NVIDIA GPUs, registering pinned memory is quite an expensive operation, while checking whether a pointer has already been pinned is cheap. I still have a feeling...
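The check-before-register pattern described above can be sketched on the host side. This is only an illustration under stated assumptions: `PinnedRegistry`, `isPinned`, `ensurePinned`, and `expensiveRegister` are hypothetical names, not part of CUDA or HIP, and the expensive call is replaced by a counter so the sketch runs without a GPU. In real code the stand-in would be a call such as `cudaHostRegister`, and the cheap query could be something like `cudaPointerGetAttributes`.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical host-side registry of pinned ranges; not a CUDA or HIP API.
class PinnedRegistry {
public:
  // Cheap query: is [ptr, ptr + size) fully inside one recorded pinned range?
  bool isPinned(const void *ptr, std::size_t size) const {
    const auto addr = reinterpret_cast<std::uintptr_t>(ptr);
    auto it = ranges_.upper_bound(addr); // first range starting after addr
    if (it == ranges_.begin())
      return false;
    --it; // last range starting at or before addr
    return addr + size <= it->first + it->second;
  }

  // Register only when the cheap check fails.
  // Returns true if the expensive path actually ran.
  bool ensurePinned(const void *ptr, std::size_t size) {
    if (isPinned(ptr, size))
      return false;
    expensiveRegister(ptr, size);
    // Simplification: overlapping ranges are not merged in this sketch.
    ranges_[reinterpret_cast<std::uintptr_t>(ptr)] = size;
    return true;
  }

  int registerCalls() const { return registerCalls_; }

private:
  // Stand-in for the real, costly registration call (assumption).
  void expensiveRegister(const void *, std::size_t) { ++registerCalls_; }

  std::map<std::uintptr_t, std::size_t> ranges_; // start address -> length
  int registerCalls_ = 0;
};
```

With this shape, repeated requests for a buffer that is already covered skip the expensive path entirely, which is the asymmetry the comment relies on.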

HSA is not well documented, and I have difficulty making it work as expected. https://github.com/RadeonOpenCompute/ROCR-Runtime/issues/128 I doubt hsa does reference counting and locking and unlocking on already locked...

It is good that hsa does reference counting. In my measurements, the tracer shows that lock takes 35us and unlock takes 7us. They are not cheap. If I pinned via HIP ahead...

If I map my measurements onto your code example: in case 1, the lock part takes 7us; in case 3, the inner lock takes 0.9us. > Conclusion: if you have already locked...
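To make the reference-counting behavior discussed above concrete, here is a minimal sketch of a counted lock wrapper. It is not how ROCR implements `hsa_amd_memory_lock`; `RefCountedLocker` and its members are hypothetical, and the backend calls are replaced by counters so the example runs without a GPU. It only shows why a nested lock on an already locked pointer can be much cheaper than the first one: the inner call touches a counter and skips the backend.

```cpp
#include <unordered_map>

// Hypothetical reference-counting wrapper; names are illustrative, not HSA APIs.
class RefCountedLocker {
public:
  // The first lock on a pointer pays for the backend call; nested locks only
  // bump the count.
  void lock(void *ptr) {
    if (counts_[ptr]++ == 0)
      ++backendLocks_; // stand-in for the costly pin (assumption)
  }

  // Returns false if ptr is not currently locked.
  // The backend unlock runs only when the last reference is dropped.
  bool unlock(void *ptr) {
    auto it = counts_.find(ptr);
    if (it == counts_.end())
      return false;
    if (--it->second == 0) {
      counts_.erase(it);
      ++backendUnlocks_; // stand-in for the costly unpin (assumption)
    }
    return true;
  }

  int backendLocks() const { return backendLocks_; }
  int backendUnlocks() const { return backendUnlocks_; }

private:
  std::unordered_map<void *, int> counts_; // pointer -> outstanding locks
  int backendLocks_ = 0;
  int backendUnlocks_ = 0;
};
```

An outer lock followed by an inner lock and two unlocks results in exactly one backend lock and one backend unlock in this model.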

It is also possible that the tests under CI don't cover the reproducer case.

@JonChesterfield does the reproducer fail on your machine?

@JonChesterfield @jhuber6 Any insights?

~~The dlopen libhsa build has this issue. If the plugin is built against libhsa directly, there is no problem.~~ When my test passes in certain scenarios, strace shows libomptarget.rtl.amdgpu.so being pulled from rocm/5.2.0. Once...

> CI still green, oddly. Wonder if that's running with asserts disabled https://lab.llvm.org/buildbot/#/builders/193 @JonChesterfield @ronlieb Could you check whether the desired libomptarget.rtl.amdgpu.so gets picked up during test-openmp?

```
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007ffff693e7f1 in __GI_abort () at abort.c:79
#2 0x00007ffff692e3fa in __assert_fail_base (fmt=0x7ffff6ab56c0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7ffff6d03830 "INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH...
```