cudf
[BUG] ContiguousSplitUntypedTest fails when run with the arena allocator
Describe the bug
ContiguousSplitUntypedTest fails when run with the arena allocator but passes when run with the pool or cuda allocators:
[ RUN ] ContiguousSplitUntypedTest.CalculationOverflow
unknown file: Failure
C++ exception with description "std::bad_alloc: out_of_memory: RMM failure at:/home/jlowe/src/spark-rapids-jni/thirdparty/cudf/cpp/build/_deps/rmm-src/include/rmm/mr/device/arena_memory_resource.hpp:159: Maximum pool size exceeded"
thrown in the test body.
[ FAILED ] ContiguousSplitUntypedTest.CalculationOverflow (8 ms)
Steps/Code to reproduce bug
Run cpp/build/gtests/COPYING_TEST --rmm_mode=arena
Expected behavior
Tests should pass with any supported RMM memory resource.
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
This appears to be an out-of-memory error, not a problem with the algorithm. COPYING_TEST runs fine on my 48GB GPU with the arena allocator. The maximum memory required for COPYING_TEST appears to be about 25GB with the arena allocator.
What are the stats of the GPU where this error occurs? Can you run this on a larger GPU?
I ran this on a 16GB V100, so I guess if tests are expected to require more than that, this is "working as designed." However, it seems a bit excessive to need that much memory for a test.
I agree. Actually, I don't think we should have this specific gtest. I'm inclined to disable it or remove it altogether.
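For illustration only, the mismatch discussed in this thread (roughly 25GB needed versus a 16GB device) can be expressed as a simple guard. This is a minimal sketch, not code from cudf or RMM: `enough_device_memory`, `headroom_percent`, and the `GiB` constant are hypothetical names, and the 10% headroom is an arbitrary assumption. In a real gtest, the free-byte count could presumably come from the CUDA runtime's `cudaMemGetInfo`, and the test could bail out with GoogleTest's `GTEST_SKIP()` when the guard returns false.

```cpp
#include <cstdint>

// Hypothetical helper, not part of cudf or RMM: reports whether a test that
// needs `required_bytes` of device memory can reasonably run when only
// `free_bytes` are available. `headroom_percent` is an assumed safety margin
// for allocator overhead and fragmentation.
constexpr bool enough_device_memory(std::uint64_t free_bytes,
                                    std::uint64_t required_bytes,
                                    std::uint64_t headroom_percent = 10)
{
  // Divide before multiplying so the intermediate value stays small.
  std::uint64_t const padded =
    required_bytes + (required_bytes / 100) * headroom_percent;
  return free_bytes >= padded;
}

// Binary gigabyte, used here as an approximation of the "GB" figures quoted
// in the thread.
constexpr std::uint64_t GiB = 1ULL << 30;

// The two configurations mentioned above, using the ~25GB estimate:
static_assert(!enough_device_memory(16 * GiB, 25 * GiB), "16GB device: skip");
static_assert(enough_device_memory(48 * GiB, 25 * GiB), "48GB device: run");
```

A guard like this would let the test stay in the suite while skipping on smaller devices; whether that is preferable to disabling or removing the test is a separate judgment call.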