
[BUG] ContiguousSplitUntypedTest fails when run with the arena allocator

Open jlowe opened this issue 2 years ago • 5 comments

Describe the bug
ContiguousSplitUntypedTest fails when run with the arena allocator but passes when run with the pool or cuda allocators:

[ RUN      ] ContiguousSplitUntypedTest.CalculationOverflow
unknown file: Failure
C++ exception with description "std::bad_alloc: out_of_memory: RMM failure at:/home/jlowe/src/spark-rapids-jni/thirdparty/cudf/cpp/build/_deps/rmm-src/include/rmm/mr/device/arena_memory_resource.hpp:159: Maximum pool size exceeded" 
thrown in the test body.
[  FAILED  ] ContiguousSplitUntypedTest.CalculationOverflow (8 ms)

Steps/Code to reproduce bug
Run cpp/build/gtests/COPYING_TEST --rmm_mode=arena

Expected behavior
Tests should pass with any supported RMM memory resource.

jlowe, Jul 12 '22 18:07

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot], Aug 11 '22 19:08

This appears to be an out-of-memory error, which is not a problem with the algorithm. COPYING_TEST runs fine on my 48GB GPU with the arena allocator. The maximum memory required for COPYING_TEST appears to be about 25GB with the arena allocator. What are the GPU stats where this error occurs? Can you run this on a larger GPU?

davidwendt, Aug 15 '22 12:08


I ran this on a 16GB V100, so I guess if tests are expected to require more than that, this is "working as designed." However, it seems a bit excessive to need that much memory for a test.

jlowe, Sep 14 '22 13:09
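
For illustration only, the mismatch described in this thread (roughly 25GB needed by COPYING_TEST under the arena allocator, 16GB available on the V100) can be expressed as a simple memory guard. This is a minimal sketch, not code from cudf or RMM: `should_skip_for_memory` and `gib` are hypothetical helpers, and the sizes are treated as GiB purely for the arithmetic. In a real test, the device total would presumably come from something like `cudaMemGetInfo`, and the skip itself from GoogleTest's `GTEST_SKIP()`; both are assumptions about how one might wire this up.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of cudf or RMM): converts GiB to bytes.
constexpr std::uint64_t gib(std::uint64_t n) { return n << 30; }

// Hypothetical helper (not part of cudf or RMM): returns true when a test
// that needs `required_bytes` of device memory should be skipped on a GPU
// that only has `device_bytes` of memory.
constexpr bool should_skip_for_memory(std::uint64_t device_bytes,
                                      std::uint64_t required_bytes)
{
  return device_bytes < required_bytes;
}

// Figures taken from the thread, treated as GiB for illustration:
// about 25 needed, 16 on the V100, 48 on the larger GPU.
static_assert(should_skip_for_memory(gib(16), gib(25)),
              "a 16 GiB device cannot satisfy a 25 GiB requirement");
static_assert(!should_skip_for_memory(gib(48), gib(25)),
              "a 48 GiB device can satisfy a 25 GiB requirement");
```

A guard like this would only hide the failure on smaller GPUs; it would not reduce the memory the test itself requires.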

I agree. Actually, I don't think we should have this specific gtest. I'm inclined to disable it or remove it altogether.

davidwendt, Sep 14 '22 15:09