Umpire
Out-of-memory error during program teardown deallocations
Describe the bug
We have a SAMRAI test problem, running on CPUs only, that allocates most of its numerical data arrays with QuickPool host allocators. When those arrays are deallocated during program teardown, we hit an out-of-memory error: QuickPool enters do_coalesce() and tries to malloc a large chunk of memory.
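The following is a toy model in C++ of a pool whose deallocate() can call malloc(). It is a hypothetical sketch, not Umpire's QuickPool. The class ToyCoalescingPool, its "coalesce once N blocks are free" policy, and the release-then-reallocate coalescing step are all assumptions chosen to mirror the behavior described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>

// Hypothetical toy pool for illustration only; this is NOT Umpire's QuickPool.
// It models one behavior from the report: a deallocate() that coalesces free
// blocks may itself call malloc(), and that call can fail with OOM.
class ToyCoalescingPool {
 public:
  // coalesce_after: number of free blocks that triggers a coalesce (assumed policy).
  explicit ToyCoalescingPool(std::size_t coalesce_after)
      : coalesce_after_{coalesce_after} {}

  ~ToyCoalescingPool() {
    for (auto& entry : free_) std::free(entry.first);
    for (auto& entry : used_) std::free(entry.first);
  }

  ToyCoalescingPool(const ToyCoalescingPool&) = delete;
  ToyCoalescingPool& operator=(const ToyCoalescingPool&) = delete;

  void* allocate(std::size_t bytes) {
    // First fit: reuse a free block that is large enough.
    for (auto it = free_.begin(); it != free_.end(); ++it) {
      if (it->second >= bytes) {
        void* p = it->first;
        used_[p] = it->second;
        free_.erase(it);
        return p;
      }
    }
    // Otherwise grow the pool from the system allocator.
    void* p = std::malloc(bytes);
    if (p == nullptr) return nullptr;
    ++malloc_calls_;
    used_[p] = bytes;
    return p;
  }

  // Returns false if coalescing needed memory that malloc() could not supply.
  bool deallocate(void* p) {
    auto it = used_.find(p);
    assert(it != used_.end() && "pointer not owned by this pool");
    free_[p] = it->second;
    used_.erase(it);
    return free_.size() >= coalesce_after_ ? coalesce() : true;
  }

  std::size_t malloc_calls() const { return malloc_calls_; }
  std::size_t free_blocks() const { return free_.size(); }

 private:
  // Release every free block, then request one block of their combined size.
  // This malloc() on the deallocation path is where an OOM can surface.
  bool coalesce() {
    std::size_t total = 0;
    for (auto& entry : free_) {
      total += entry.second;
      std::free(entry.first);
    }
    free_.clear();
    void* merged = std::malloc(total);
    if (merged == nullptr) return false;
    ++malloc_calls_;
    free_[merged] = total;
    return true;
  }

  std::size_t coalesce_after_;
  std::size_t malloc_calls_ = 0;
  std::map<void*, std::size_t> free_;  // free block -> size
  std::map<void*, std::size_t> used_;  // live block -> size
};
```

In this model, with coalesce_after set to 2, freeing a second 64-byte block triggers a coalesce that releases both blocks and mallocs a single 128-byte block, so malloc_calls() rises from 2 to 3 during a deallocation. If the combined size were enormous, that malloc would fail and deallocate() would report the out-of-memory condition.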
To Reproduce
I have provided a reproducer and build/run instructions to @mcfadden8.
Expected behavior
We did not expect a call to umpire::Allocator::deallocate() to trigger an allocation that fails with an OOM error.
Compilers & Libraries:
- Umpire version: 2023.06.0
- Compiler & version: The reproducer was provided using gcc 10.3.1 on TOSS4. I don't believe this is unique to a particular compiler/platform.
Additional context
We have a workaround: CPU-only runs use a default host allocator instead of a QuickPool-based allocator. This works, but we would like our CPU-only tests to use QuickPool, since we use QuickPool on GPUs and want to keep the CPU and GPU code paths unified wherever possible. We also don't know whether this bug could affect allocation/deallocation of GPU data, although we have not seen this error in a GPU run.
Thank you for writing this up @nselliott, we are tracking this issue here: https://rzlc.llnl.gov/gitlab/umpire/umpire/-/issues/12
I've been able to reproduce the issue and am investigating the cause. It is normal for Umpire to coalesce blocks of pool memory as they become available at deallocation time. However, the amount of memory Umpire attempts to allocate when the OOM occurs appears to be bogus (extremely large). I'm instrumenting the library to determine where the internal accounting goes wrong.
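One generic way an internal accounting error can surface as an extremely large request is unsigned wraparound: std::size_t arithmetic cannot go negative, so subtracting a larger byte count from a smaller one yields a value near the type's maximum. The sketch below is illustrative only. naive_request and checked_request are hypothetical helpers, and nothing here claims this is the root cause addressed by the linked fix.

```cpp
#include <cstddef>

// Hypothetical illustration, not Umpire code. If bookkeeping ever records
// more bytes held than the target, size_t subtraction wraps around modulo
// 2^N instead of going negative, producing an enormous "request".
constexpr std::size_t naive_request(std::size_t target_bytes,
                                    std::size_t held_bytes) {
  return target_bytes - held_bytes;  // wraps when held_bytes > target_bytes
}

// Defensive variant: treat an inconsistent count as "nothing to request".
constexpr std::size_t checked_request(std::size_t target_bytes,
                                      std::size_t held_bytes) {
  return held_bytes >= target_bytes ? 0 : target_bytes - held_bytes;
}
```

For example, naive_request(100, 164) evaluates to the maximum std::size_t value minus 63, a size no allocator can satisfy, whereas checked_request(100, 164) returns 0.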
I am glad to hear that you are able to temporarily work around this issue while we work on a fix.
https://github.com/LLNL/Umpire/pull/845
@mcfadden8 Did that pull request sufficiently fix this?
@nselliott - Yes. There is more information provided here: https://rzlc.llnl.gov/gitlab/umpire/umpire/-/issues/12