[BUG] Maximum pool size exceeded when using ManagedMemory
Describe the Bug
We use RMM with PyTorch and a managed-memory pool to analyze a simulation trajectory. While iterating over the frames of the trajectory, the pool size keeps growing until the run fails with an out-of-memory error, specifically out_of_memory: Maximum pool size exceeded. PyTorch is configured to use RMM as its memory allocator. Our problem size is very large: several tensors, each 13 GB or more, are live during processing.
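For context on why a pool can grow even when live memory does not: a suballocator requests more memory from the driver whenever no existing free block fits the current request, so per-frame allocations of varying sizes can push the pool toward its maximum. A minimal pure-Python sketch of this effect (a toy first-fit pool, not RMM's actual coalescing pool_memory_resource logic):

```python
# Toy first-fit pool: grows when no free block is large enough.
# Illustration only -- not RMM's actual implementation.

class ToyPool:
    def __init__(self):
        self.free_blocks = []   # sizes of reusable blocks
        self.pool_size = 0      # total memory requested from the "driver"

    def allocate(self, size):
        # First fit: reuse the first free block that is large enough.
        for i, blk in enumerate(self.free_blocks):
            if blk >= size:
                self.free_blocks.pop(i)
                return size
        # No block fits: grow the pool instead of reusing memory.
        self.pool_size += size
        return size

    def free(self, size):
        # Return the block; this toy pool never coalesces neighbors.
        self.free_blocks.append(size)

pool = ToyPool()
# Each "frame" allocates one tensor slightly larger than the last,
# then frees it -- live memory never exceeds one tensor, yet the
# pool keeps growing because every old free block is too small.
for frame in range(5):
    size = 13 + frame          # GB, growing slightly per frame
    pool.allocate(size)
    pool.free(size)

print(pool.pool_size)          # 75 -- far above the 17 GB peak live size
```

A real pool allocator coalesces and grows geometrically, but the same mechanism (fragmentation plus growth-on-miss) can produce the steadily increasing pool size seen here.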
Error message
time offset is 2.65 , segment length is 4000
Total frames: 8001, total frames in segment: 4000, frame range: 4000 - 8000
13%|█▎ | 516/4000 [1:50:41<10:11:17, 10.53s/it]Traceback (most recent call last):
File "torch_allocator.pyx", line 15, in rmm._lib.torch_allocator.allocate
MemoryError: std::bad_alloc: out_of_memory: RMM failure at:/blue/program/miniconda3/envs/rapids-23.10/include/rmm/mr/device/pool_memory_resource.hpp:196: Maximum pool size exceeded
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/blue/roitberg/apps/lammps-ani/cumolfind/cumolfind/molfind.py", line 106, in analyze_all_frames
df_formula, df_molecule = analyze_a_frame(
File "/blue/roitberg/apps/lammps-ani/cumolfind/cumolfind/fragment.py", line 237, in analyze_a_frame
cG, df_per_frag = find_fragments(species, positions, cell, pbc, use_cell_list=use_cell_list)
File "/blue/roitberg/apps/lammps-ani/cumolfind/cumolfind/fragment.py", line 188, in find_fragments
atom_index12, distances, _ = neighborlist(species, coordinates, cell=cell, pbc=pbc)
File "/blue/program/miniconda3/envs/rapids-23.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/blue/program/miniconda3/envs/rapids-23.10/lib/python3.10/site-packages/torchani-2.3.dev211+gb682b46c-py3.10-linux-x86_64.egg/torchani/neighbors.py", line 510, in forward
atom_pairs, shift_indices = self._calculate_cell_list(coordinates_displaced.detach(), pbc)
File "/blue/program/miniconda3/envs/rapids-23.10/lib/python3.10/site-packages/torchani-2.3.dev211+gb682b46c-py3.10-linux-x86_64.egg/torchani/neighbors.py", line 595, in _calculate_cell_list
lower, between_pairs_translation_types = self._get_lower_between_image_pairs(neighbor_count,
File "/blue/program/miniconda3/envs/rapids-23.10/lib/python3.10/site-packages/torchani-2.3.dev211+gb682b46c-py3.10-linux-x86_64.egg/torchani/neighbors.py", line 912, in _get_lower_between_image_pairs
-1).repeat(1, 1, 1, padded_atom_neighbors.shape[-1])
SystemError: <method 'repeat' of 'torch._C._TensorBase' objects> returned a result with an exception set
13%|█▎ | 517/4000 [1:50:49<9:23:59, 9.72s/it] Traceback (most recent call last):
File "/blue/roitberg/apps/lammps-ani/cumolfind/cumolfind/molfind.py", line 106, in analyze_all_frames
df_formula, df_molecule = analyze_a_frame(
File "/blue/roitberg/apps/lammps-ani/cumolfind/cumolfind/fragment.py", line 223, in analyze_a_frame
torch.tensor(mdtraj_frame.xyz, device=device).float().view(1, -1, 3) * 10.0
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Steps/Code to reproduce bug
Relevant code snippet
import torch
import rmm
from rmm.allocators.torch import rmm_torch_allocator
# rmm resource logging
rmm.reinitialize(
    pool_allocator=True,
    managed_memory=True,
    maximum_pool_size=300 * 1024 * 1024 * 1024,  # 300 GiB
    logging=True,
    log_file_name="logging_resource.csv",
)
# Configure PyTorch to use RAPIDS Memory Manager (RMM) for GPU memory management.
torch.cuda.memory.change_current_allocator(rmm_torch_allocator)
A log recorded with the logging_resource_adaptor is attached: logging_resource.dev0.csv.zip (Google Drive link).
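To see whether outstanding allocations (as opposed to pool growth alone) are increasing over the run, the logging CSV can be replayed. A minimal sketch, assuming the log uses the usual Thread,Time,Action,Pointer,Size,Stream columns with allocate/free actions (adjust the column names if your RMM version writes a different header):

```python
import csv
import io

def outstanding_bytes(log_text):
    """Replay an RMM logging_resource_adaptor CSV and return
    (peak, final) outstanding allocation in bytes."""
    live = peak = 0
    for row in csv.DictReader(io.StringIO(log_text)):
        size = int(row["Size"])
        if row["Action"] == "allocate":
            live += size
            peak = max(peak, live)
        elif row["Action"] == "free":
            live -= size
    return peak, live

# Synthetic example log (two allocations, one free):
sample = """Thread,Time,Action,Pointer,Size,Stream
1,0.0,allocate,0x1,1024,0
1,0.1,allocate,0x2,2048,0
1,0.2,free,0x1,1024,0
"""
print(outstanding_bytes(sample))   # (3072, 2048)
```

If the final outstanding count climbs frame after frame, something is holding references to per-frame tensors; if it stays flat while the pool still grows, fragmentation in the pool is the more likely culprit.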
CPU memory usage was recorded every 10 seconds (the beginning of the run is missing): ram_log.txt
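Tracking host RAM alongside the device log is relevant here because managed (unified) memory can spill to host memory under oversubscription. A minimal sketch of extracting the available-memory figure from a Linux /proc/meminfo-style snapshot (the 10-second sampling loop behind ram_log.txt is an assumption; the parser below only reads the text):

```python
def mem_available_kib(meminfo_text):
    """Return the MemAvailable value (in KiB) from a
    /proc/meminfo-style snapshot."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Format: "MemAvailable:   4194304 kB"
            return int(line.split()[1])
    raise ValueError("MemAvailable not found")

# Synthetic snapshot for illustration:
sample_meminfo = """MemTotal:       528280588 kB
MemFree:          1048576 kB
MemAvailable:     4194304 kB
"""
print(mem_available_kib(sample_meminfo))   # 4194304
```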
Environment details
Environment was created using
mamba create -n rapids-23.10 -c rapidsai -c conda-forge -c nvidia cudf=23.10 cugraph=23.10 python=3.10 cuda-version=11.8
The analysis was run on an A100 GPU with 80 GB (81920 MiB) of memory. The environment is also attached: env.txt