
[BUG] Running wiki_all_88M on NV A100 with raft-ann-bench crashes

Open ftian1 opened this issue 11 months ago • 1 comment

Describe the bug

Running the benchmark raises the error below on an NV A100 GPU.

raft_cagra.graph_degree32.intermediate_graph_degree32.graph_build_algoNN_DESCENT/process_time/real_time ERROR OCCURRED: 'Failed to create an algo: std::bad_alloc: out_of_memory: RMM failure at:/sparse/miniconda3/envs/py310/include/rmm/mr/device/pool_memory_resource.hpp:313: Maximum pool size exceeded'
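For context on the "Maximum pool size exceeded" message, here is a rough back-of-the-envelope sketch of how much memory the data alone could need. The vector count (88M, read off the dataset name), the dimensionality (768), the float32 element type, and the 40/80 GB A100 capacities are assumptions for illustration and are not taken from the log; only graph_degree = 32 comes from the benchmark name above. Whether the build actually keeps the full dataset in device memory is also an assumption, so treat this as an upper-bound estimate rather than a diagnosis.

```python
# Rough memory estimate for a dense vector dataset and a fixed-degree graph.
# The helper names and all input figures except graph_degree are hypothetical.

GIB = 1024 ** 3


def dataset_bytes(n_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Bytes needed for a dense n_vectors x dim matrix (float32 by default)."""
    return n_vectors * dim * bytes_per_value


def graph_bytes(n_vectors: int, graph_degree: int, bytes_per_index: int = 4) -> int:
    """Bytes needed for a graph storing graph_degree 32-bit neighbor ids per vector."""
    return n_vectors * graph_degree * bytes_per_index


if __name__ == "__main__":
    n_vectors = 88_000_000   # assumption: inferred from the dataset name
    dim = 768                # assumption: not stated in this report
    graph_degree = 32        # from "graph_degree32" in the error line

    data = dataset_bytes(n_vectors, dim)
    graph = graph_bytes(n_vectors, graph_degree)

    print(f"dataset: {data / GIB:.1f} GiB")
    print(f"graph:   {graph / GIB:.1f} GiB")

    # Assumption: A100 cards ship with 40 GB or 80 GB of device memory.
    for capacity_gb in (40, 80):
        fits = (data + graph) <= capacity_gb * 10 ** 9
        print(f"fits entirely on a {capacity_gb} GB device: {fits}")
```

If the estimate is far above the device capacity, the pool limit would be hit whenever the whole dataset has to be resident on the GPU, regardless of how RAFT was installed.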

Steps/Code to reproduce bug

python -m raft_ann_bench.run --dataset wiki_all_88M --dataset-path ./ --algorithms raft_cagra --build

Expected behavior

The benchmark run should succeed.

Environment details (please complete the following information):

Bare-metal installation on Ubuntu. RAFT was installed with: conda install -c rapidsai -c conda-forge raft-ann-bench-gpu

ftian1, Feb 28 '24 06:02

I saw this error when I installed with conda. When I switched to the Docker container (https://docs.rapids.ai/api/raft/stable/raft_ann_benchmarks/#docker), the issue disappeared.

Slyne, Apr 10 '24 21:04