
CoDi: CUDA ran out of memory while trying to run inference tasks

Open PHOENIXFURY007 opened this issue 1 year ago • 2 comments

I was trying to run the demo notebook on an Nvidia A100 80 GB. While trying to load the model from the checkpoints, I am facing this issue:

####################### Running in eps mode #######################

making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Load pretrained weight from ['CoDi_encoders.pth', 'CoDi_text_diffuser.pth', 'CoDi_audio_diffuser_m.pth', 'CoDi_video_diffuser_8frames.pth']

RuntimeError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 23.70 GiB total capacity; 17.10 GiB already allocated; 3.56 MiB free; 17.49 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Can you let me know how to solve this issue?
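The error message itself points at one mitigation: setting `max_split_size_mb` through the `PYTORCH_CUDA_ALLOC_CONF` environment variable to reduce allocator fragmentation. A minimal sketch (the value 128 is just an example, not a recommendation for CoDi specifically):

```python
import os

# Must be set before PyTorch makes its first CUDA allocation,
# i.e. before `import torch` runs in the notebook.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```

This only helps when reserved memory greatly exceeds allocated memory (fragmentation); it cannot conjure capacity the model genuinely needs.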

I checked with nvidia-smi to see if there were any other running processes, but there was nothing:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.76       Driver Version: 515.76       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100 80G...  On   | 00000000:9E:00.0 Off |                    0 |
| N/A   33C    P0    46W / 300W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

PHOENIXFURY007 avatar Jul 07 '23 09:07 PHOENIXFURY007

I was able to load the checkpoints, but when I tried to do Text to Video + Audio, it failed with the same error as before:

RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 23.70 GiB total capacity; 21.25 GiB already allocated; 416.56 MiB free; 21.66 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Is there any way I can run all of the inference tasks on a single A100 80 GB?
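For a sense of scale, some back-of-envelope weight-memory arithmetic (the 5 B parameter count below is a placeholder, not CoDi's actual size): fp32 weights take 4 bytes per parameter, so casting to fp16 roughly halves the weight footprint, which is why half-precision inference is a common first fix for OOM errors like the ones above.

```python
# Back-of-envelope weight-memory math (pure Python, no GPU needed).
def weight_mem_gib(n_params: int, bytes_per_param: int) -> float:
    """Approximate memory for model weights, in GiB."""
    return n_params * bytes_per_param / 2**30

# Hypothetical combined size for the loaded encoder + diffusers.
n_params = 5_000_000_000
fp32_gib = weight_mem_gib(n_params, 4)  # ~18.6 GiB
fp16_gib = weight_mem_gib(n_params, 2)  # ~9.3 GiB
print(f"fp32: {fp32_gib:.1f} GiB, fp16: {fp16_gib:.1f} GiB")
```

Activations, attention buffers, and the CUDA context come on top of the weights, which is consistent with the ~21 GiB "already allocated" figure in the traceback overrunning a ~24 GiB budget.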

PHOENIXFURY007 avatar Jul 08 '23 09:07 PHOENIXFURY007

Did you manage to get it working?

aajinkya1203 avatar Aug 03 '24 14:08 aajinkya1203