
how to avoid CUDA out of memory?

Open timotheecour4 opened this issue 2 years ago • 3 comments

All of the training scripts specified in the README give errors like the following:

RuntimeError: CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 15.78 GiB total capacity; 10.52 GiB already allocated; 3.86 GiB free; 10.77 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

CUDA Version: 11.7 using 8x Tesla V100-SXM2 (with 16GB memory)
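The error message itself suggests one knob worth trying first. A minimal sketch of setting `PYTORCH_CUDA_ALLOC_CONF` before PyTorch initializes CUDA — the `128` MB split size is an assumed starting value to tune, not a verified fix, and it only helps when reserved memory far exceeds allocated memory (i.e., fragmentation):

```python
import os

# Must be set before the first CUDA allocation (in practice, before importing torch).
# max_split_size_mb:128 is an assumption to experiment with, not a known-good value.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```

Equivalently, `export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128` in the shell before launching the training script.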

Reducing `--batch_size` to 32 didn't help; passing `--microbatch 1` didn't help either.
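This may be expected: in the upstream guided-diffusion trainer these scripts appear to derive from, `--microbatch` splits each batch into gradient-accumulation chunks, so it only shrinks activation memory — weights, gradients, and optimizer state are per-model and stay fixed. A sketch of the arithmetic (the flag semantics here are my assumption from the upstream codebase, not confirmed for this repo):

```python
# Assumed semantics: a batch of --batch_size is processed in chunks of
# --microbatch, with gradients accumulated before each optimizer step.
batch_size = 32
microbatch = 1
accumulation_steps = batch_size // microbatch  # 32 forward/backward passes per step

# Activation memory scales with the microbatch (so ~32x smaller here), but
# fp32 weights, gradients, and Adam moments are unaffected -- which is why
# --microbatch 1 alone can still OOM on a 16 GB card.
print(accumulation_steps)
```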

timotheecour4 avatar Oct 19 '22 23:10 timotheecour4

Lowest I managed to get was 32GB with batch size 1... but that was ages ago

chavinlo avatar Oct 20 '22 22:10 chavinlo

Same, I was hoping there would be some flag configuration that would allow using a GPU (or multiple GPUs) with less memory

timotheecour4 avatar Oct 28 '22 19:10 timotheecour4

> Same, I was hoping there would be some flag configuration that would allow using a GPU (or multiple GPUs) with less memory

Since it's based on PyTorch, you might be able to apply an optimization framework like HF's Accelerate or DeepSpeed for RAM offloading
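For the DeepSpeed route, a minimal ZeRO stage 2 config that offloads optimizer state to CPU RAM might look like the sketch below — untested against this repo, and the batch-size numbers are placeholders to match whatever the training script uses:

```json
{
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 32,
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": { "device": "cpu", "pin_memory": true }
  }
}
```

Wiring DeepSpeed into the training loop would still require code changes (`deepspeed.initialize(...)` in place of the plain optimizer), so this is a starting point rather than a drop-in fix.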

chavinlo avatar Nov 02 '22 02:11 chavinlo