glid-3-xl-stable
how to avoid CUDA out of memory?
All of the training scripts specified in the README give errors like the following:
RuntimeError: CUDA out of memory. Tried to allocate 4.00 GiB (GPU 0; 15.78 GiB total capacity; 10.52 GiB already allocated; 3.86 GiB free; 10.77 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
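The error message itself suggests one mitigation: since reserved memory (10.77 GiB) is close to allocated memory (10.52 GiB) here fragmentation may not be the main problem, but setting `max_split_size_mb` is cheap to try. A minimal sketch of how to set it from Python (the value 128 is an arbitrary starting point, and the variable must be set before CUDA is initialized):

```python
import os

# Must be set before the first CUDA allocation; the caching allocator
# reads PYTORCH_CUDA_ALLOC_CONF once at startup.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```

Equivalently, export it in the shell before launching the training script: `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python train.py ...`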
CUDA Version: 11.7 using 8x Tesla V100-SXM2 (with 16GB memory)
Reducing --batch_size to 32 didn't help, and neither did passing --microbatch 1.
Lowest I managed to get was 32GB with batch size 1... but that was ages ago
Same, I was hoping there would be some flag configuration that would allow using a GPU (or multiple GPUs) with less memory
Since it's based on PyTorch, you might be able to apply an optimization framework like HF's Accelerate or DeepSpeed for CPU RAM offloading.
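To make the DeepSpeed suggestion concrete, here is a sketch of a minimal ZeRO stage 2 config with optimizer-state offloading to CPU RAM, written out as JSON from Python. The keys follow DeepSpeed's documented config schema; the specific values (micro-batch size 1, fp16) are assumptions you'd tune for the V100 setup described above, and wiring this into the repo's training scripts would still require a `deepspeed.initialize(...)` integration that glid-3-xl-stable does not ship with:

```python
import json

# A minimal DeepSpeed ZeRO-2 config sketch: optimizer states are
# offloaded to pinned CPU memory to reduce per-GPU footprint.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "fp16": {"enabled": True},
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```

You would then pass the file to the launcher, e.g. `deepspeed train.py --deepspeed_config ds_config.json`, assuming the training entry point has been adapted for DeepSpeed.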