
host-side memory leak

Open FabioLuporini opened this issue 2 years ago • 11 comments

Below is a pure-C minimal failing example (MFE) showing an increase in host memory consumption when multiple OpenMP-offloading shared objects are called back to back from Python:

https://github.com/devitocodes/devito/tree/patch-omp-off-leakage/tests/omp-mfe

the MFE files are hosted on a devito branch, but the MFE is completely independent of devito

reproduced with:

  • ROCm 5.4.1, aompcc 14.0
  • ROCm 4.5.2, aompcc 13

hypothesis: the OpenMP runtime is keeping pinned memory buffers around

run as per README.md at link

FabioLuporini avatar Jun 29 '22 14:06 FabioLuporini

Now with a pure-C reproducer (no python involved)

I'm doing plain dlopen / dlclose

https://github.com/FabioLuporini/hpc-bugs/tree/main/omp-off-leak/c
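For readers who cannot open the link, the dlopen / dlclose pattern can be sketched roughly as follows. This is not the reproducer from the repository, only a minimal illustration under stated assumptions: the function names (`rss_kib`, `load_call_unload`, `run_cycles`) are hypothetical, the shared object is assumed to export a `void(void)` entry point, and host memory is read from `/proc/self/statm`, which is Linux-only.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <unistd.h>

/* Resident set size of this process in KiB, read from /proc/self/statm
 * (Linux-only). Returns -1 if the value cannot be read. */
long rss_kib(void)
{
    long total_pages = 0, resident_pages = 0;
    FILE *f = fopen("/proc/self/statm", "r");
    if (!f)
        return -1;
    int fields = fscanf(f, "%ld %ld", &total_pages, &resident_pages);
    fclose(f);
    if (fields != 2)
        return -1;
    return resident_pages * (sysconf(_SC_PAGESIZE) / 1024);
}

/* Load a shared object, call one void(void) entry point, unload it.
 * Returns 0 on success, -1 if loading, lookup, or unloading fails.
 * ASSUMPTION: the entry point takes no arguments and returns nothing. */
int load_call_unload(const char *path, const char *symbol)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (!handle)
        return -1;

    void (*entry)(void) = NULL;
    /* POSIX idiom for converting the dlsym result to a function pointer. */
    *(void **)(&entry) = dlsym(handle, symbol);
    if (!entry) {
        dlclose(handle);
        return -1;
    }

    entry();
    return dlclose(handle) == 0 ? 0 : -1;
}

/* Repeat the load/call/unload cycle and print the host RSS after each
 * iteration. A leak shows up as an RSS value that keeps growing. */
int run_cycles(const char *path, const char *symbol, int iterations)
{
    for (int i = 0; i < iterations; i++) {
        if (load_call_unload(path, symbol) != 0)
            return -1;
        printf("iteration %d: RSS %ld KiB\n", i, rss_kib());
    }
    return 0;
}
```

Calling `run_cycles` from `main` with the path of an offloading shared object and the name of its entry point (both placeholders to fill in) would print one RSS line per cycle. On older glibc versions the program may need to be linked with `-ldl`.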

FabioLuporini avatar Jun 30 '22 17:06 FabioLuporini

unable to access this link: https://github.com/FabioLuporini/hpc-bugs/tree/main/omp-off-leak/c

ronlieb avatar Jul 25 '22 11:07 ronlieb

Sorry, I renamed the folders at some point.

Here's the working link: https://github.com/FabioLuporini/hpc-bugs/tree/main/amdgpu.clang-amd/omp-off-leak/c

FabioLuporini avatar Jul 25 '22 12:07 FabioLuporini

@Lynd98 could you grab this testcase and run it under valgrind?

ronlieb avatar Jul 25 '22 12:07 ronlieb

AOMP 15.0-3:

==3417==    definitely lost: 16,112 bytes in 8 blocks
==3417==    indirectly lost: 163 bytes in 3 blocks
==3417==      possibly lost: 84,860 bytes in 240 blocks
==3417==    still reachable: 948,396 bytes in 2,866 blocks
==3417==                       of which reachable via heuristic:
==3417==                         multipleinheritance: 272 bytes in 3 blocks
==3417==         suppressed: 0 bytes in 0 blocks

AOMP 16.0-0:

==3089== LEAK SUMMARY:
==3089==    definitely lost: 344 bytes in 5 blocks
==3089==    indirectly lost: 163 bytes in 3 blocks
==3089==      possibly lost: 84,860 bytes in 240 blocks
==3089==    still reachable: 952,270 bytes in 2,971 blocks
==3089==                       of which reachable via heuristic:
==3089==                         multipleinheritance: 272 bytes in 3 blocks
==3089==         suppressed: 0 bytes in 0 blocks

estewart08 avatar Sep 20 '22 17:09 estewart08

@estewart08 is the fix in ROCm v5.2.3 or in any of the docker images here https://hub.docker.com/r/rocm/dev-ubuntu-20.04/tags ?

FabioLuporini avatar Oct 05 '22 07:10 FabioLuporini

No, the fix is in AOMP 16.0-0 and will be in ROCm 5.4.

estewart08 avatar Oct 05 '22 22:10 estewart08

excellent, thanks!

any ETA on the release (ballpark OK -- weeks / months?)

FabioLuporini avatar Oct 06 '22 13:10 FabioLuporini

Can we check whether this is working in 16.0-0? Or wait until 16.0-1 comes out later this week and recheck.

gregrodgers avatar Oct 18 '22 20:10 gregrodgers

Hi Greg, I talked to @yaomingamd who told me that the aompcc wrapper is broken in v5.3, at least the one deployed on your docker hub, which we depend on: https://hub.docker.com/r/rocm/dev-ubuntu-20.04/tags

I've been advised to use amdclang instead; is that how we should proceed? I'll see if I can start a build later today.

FabioLuporini avatar Oct 19 '22 07:10 FabioLuporini

The script is fixed in the upcoming 5.4 release. The change is fairly straightforward; update your copy from here: https://github.com/ROCm-Developer-Tools/aomp-extras/blob/aomp-dev/utils/bin/aompcc

To move to clang/clang++/amdclang/amdclang++, explicitly add -v to your aompcc command and you can observe which options it adds for you, typically: -target $HOST_TARGET -fopenmp -fopenmp-targets=$TARGET_TRIPLE -Xopenmp-target=$TARGET_TRIPLE -march=$AOMP_GPU

ronlieb avatar Oct 19 '22 11:10 ronlieb