
CUDA out of memory despite available memory

Open oo92 opened this issue 2 years ago • 20 comments

The following is my hardware makeup:

!nvidia-smi
Tue Nov 15 08:49:04 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.60.02    Driver Version: 510.60.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 4000     On   | 00000000:81:00.0 Off |                  N/A |
| 44%   32C    P8     9W / 125W |    159MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2063      G                                      63MiB |
|    0   N/A  N/A   1849271      C                                      91MiB |
+-----------------------------------------------------------------------------+

!free -h
              total        used        free      shared  buff/cache   available
Mem:            64G        677M         31G         10M         32G         63G
Swap:            0B          0B          0B

As you can see, I have plenty of CUDA memory and hardly any of it is in use. This is the error I am getting:

Traceback (most recent call last):
  File "main.py", line 834, in <module>
    raise err
  File "main.py", line 816, in <module>
    trainer.fit(model, data)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 771, in fit
    self._call_and_handle_interrupt(
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 722, in _call_and_handle_interrupt
    return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
    return function(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 812, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1218, in _run
    self.strategy.setup(self)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 162, in setup
    self.model_to_device()
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 324, in model_to_device
    self.model.to(self.root_device)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/core/mixins/device_dtype_mixin.py", line 121, in to
    return super().to(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 927, in to
    return self._apply(convert)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 579, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 579, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 579, in _apply
    module._apply(fn)
  [Previous line repeated 4 more times]
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 602, in _apply
    param_applied = fn(param)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 925, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 7.80 GiB total capacity; 6.70 GiB already allocated; 12.44 MiB free; 6.80 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Using the following code reduced the "Tried to allocate" figure from 146 MiB to 20 MiB:

import torch
from GPUtil import showUtilization as gpu_usage
from numba import cuda

def free_gpu_cache():
    print("Initial GPU Usage")
    gpu_usage()

    # Release cached blocks held by PyTorch's CUDA caching allocator
    torch.cuda.empty_cache()

    # Reset the CUDA context on device 0 via numba, then reattach to it
    cuda.select_device(0)
    cuda.close()
    cuda.select_device(0)

    print("GPU Usage after emptying the cache")
    gpu_usage()

free_gpu_cache()

Still doesn't work. Where am I going wrong?

oo92 avatar Nov 15 '22 09:11 oo92

You don't have enough GPU RAM. Always look at the last line of the error: RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 7.80 GiB total capacity; 6.70 GiB already allocated; 12.44 MiB free; 6.80 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

It has already allocated 6.7 GiB of the RAM, so it only has 12.44 MiB left.
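
As a quick sanity check (a minimal sketch, assuming a reasonably recent PyTorch), you can ask the CUDA driver how much memory is actually free on the device this process sees, and compare it with what PyTorch has allocated and reserved:

import torch

# Free and total device memory (in bytes) for GPU 0, as reported by the driver
free_bytes, total_bytes = torch.cuda.mem_get_info(0)
print(f"free: {free_bytes / 1024**2:.0f} MiB of {total_bytes / 1024**2:.0f} MiB")

# What PyTorch itself has allocated and what it is holding in its cache
print(f"allocated: {torch.cuda.memory_allocated(0) / 1024**2:.0f} MiB")
print(f"reserved:  {torch.cuda.memory_reserved(0) / 1024**2:.0f} MiB")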

8bignic8 avatar Nov 15 '22 09:11 8bignic8

Maybe you are running too many instances or other programs at the same time. Or it breaks while loading the tensors onto the GPU. If you have nvtop installed, look at the graph when starting the process :)

8bignic8 avatar Nov 15 '22 09:11 8bignic8

I have 40GB of CUDA memory. Look at my nvidia-smi output.

This is in a Determined.ai notebook instance.

oo92 avatar Nov 15 '22 09:11 oo92

OK, but maybe it is split in some way. Look at the error: GPU 0; 7.80 GiB total capacity

8bignic8 avatar Nov 15 '22 09:11 8bignic8

@8bignic8 How can I find the source of the split? This is my nvidia-smi output:

Tue Nov 15 09:36:39 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.60.02    Driver Version: 510.60.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 4000     On   | 00000000:81:00.0 Off |                  N/A |
| 40%   30C    P8     9W / 125W |    159MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2063      G                                      63MiB |
|    0   N/A  N/A   1849271      C                                      91MiB |
+-----------------------------------------------------------------------------+

oo92 avatar Nov 15 '22 10:11 oo92

You have 8192 MiB of RAM, i.e. 8 GB, and your system is running 2 programs on the GPU, as listed below.

8bignic8 avatar Nov 15 '22 11:11 8bignic8

It could be a driver issue, or your graphics card only has 8 GB of RAM.

8bignic8 avatar Nov 15 '22 11:11 8bignic8

@8bignic8 My graphics card has 82gb of RAM

oo92 avatar Nov 15 '22 17:11 oo92

OK, but look:

Tue Nov 15 09:36:39 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.60.02    Driver Version: 510.60.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 4000     On   | 00000000:81:00.0 Off |                  N/A |
| 40%   30C    P8     9W / 125W |    159MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2063      G                                      63MiB |
|    0   N/A  N/A   1849271      C                                      91MiB |
+-----------------------------------------------------------------------------+

8192 MiB = 8.192 GB; if you don't believe me, look it up on Google.

In the end I just want to help you, and the error says that you don't have enough GPU RAM. You can install the program nvtop with apt-get install nvtop, and there you can see the current load on your GPU, similar to htop or top :). Wishing you good luck.

8bignic8 avatar Nov 15 '22 18:11 8bignic8

Same issue on Win10 with 12 GB of graphics RAM.

RuntimeError: CUDA out of memory. Tried to allocate 1.50 GiB (GPU 0; 12.00 GiB total capacity; 8.62 GiB already allocated; 967.06 MiB free; 8.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Looks like the GPU memory reserved by PyTorch (for its cache? whatever) is not being made available for the new allocation. The error message says to use the PYTORCH_CUDA_ALLOC_CONF environment variable. Does anyone have an example of using this environment variable?
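
As far as I understand, PYTORCH_CUDA_ALLOC_CONF is read by PyTorch's own CUDA caching allocator, not by the project's code, and max_split_size_mb only limits the size of cached blocks the allocator is willing to split up; it does not cap how much a single layer tries to allocate. A minimal sketch of setting it from Python (the value 128 is just an illustrative choice), making sure it is set before the first CUDA allocation:

import os

# Must be set before PyTorch initializes its CUDA allocator,
# so set it before importing torch (or at least before any CUDA call).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch

x = torch.zeros(1024, 1024, device="cuda")  # first CUDA allocation picks up the setting
print(torch.cuda.memory_reserved(0))

Alternatively, set it in the shell before launching the script (set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 on Windows, export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 on Linux).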

GLKSA avatar Nov 16 '22 05:11 GLKSA

The README says at least 10 GiB of VRAM is required, and I have 12 GiB. I set the system environment variable PYTORCH_CUDA_ALLOC_CONF to 100 MB, but it was not honoured: the code still tried to allocate 1.5 GiB. Is anyone having success with the default project on an RTX 3060?

RuntimeError: CUDA out of memory. Tried to allocate 1.50 GiB (GPU 0; 12.00 GiB total capacity; 8.63 GiB already allocated; 967.06 MiB free; 8.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

(ldm) C:\data\stable\stable-diffusion>echo %PYTORCH_CUDA_ALLOC_CONF%
max_split_size_mb:100

GLKSA avatar Nov 16 '22 06:11 GLKSA

OK, success on an RTX 3060 on Win10. Thanks to everyone who contributed to the project!! Solution: I reduced --n_samples to 1 (still got 2 samples). Here is the successful command line:

python scripts/txt2img.py --prompt "a photograph of an ballet dancer riding a horse" --plms --n_samples 1

Also, this video shows how to work with 8 GB of VRAM using --W 448 --H 448 --n_samples 1.

GLKSA avatar Nov 16 '22 07:11 GLKSA

https://www.youtube.com/watch?v=z99WBrs1D3g Note that the program is very heavy on system RAM and my spinning disk.

GLKSA avatar Nov 16 '22 07:11 GLKSA

Nah, same here. I tried everything.

RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 6.00 GiB total capacity; 4.54 GiB already allocated; 0 bytes free; 4.66 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF (ldm1)

:(

MartinAbilev avatar Nov 20 '22 16:11 MartinAbilev

Hi MartinAbilev, just cut down the image size until you get a clean run, then build it back up to find the sweet spot. E.g. my 12 GB card can run --W 512 --H 512, and an 8 GB card can run --W 448 --H 448. Your 6 GB card might be able to run --W 256 --H 256; then build it back up. My runtime is about 15 minutes with 8 GB of system RAM and a spinning disk. Try this:

python scripts/txt2img.py --prompt "a photograph of an ballet dancer riding a horse" --W 256 --H 256 --plms --n_samples 1

GLKSA avatar Nov 20 '22 22:11 GLKSA

BTW, the environment variable does not seem to work on Windows 10. You are allocating in 1 GB chunks; it would be nice to reduce this, but I can't find anywhere that PYTORCH_CUDA_ALLOC_CONF is used in the current project.

GLKSA avatar Nov 20 '22 22:11 GLKSA

Yeah, it looks like env vars are not taken into account.

To set environment variables, run conda env config vars set my_var=value. Once you have set an environment variable, you have to reactivate your environment: conda activate test-env. To check if the environment variable has been set, run echo $my_var (echo %my_var% on Windows) or conda env config vars list.

I used --W 64 --H 64; it does not work anyway :D

RuntimeError: CUDA out of memory. Tried to allocate 58.00 MiB (GPU 0; 6.00 GiB total capacity; 5.10 GiB already allocated; 0 bytes free; 5.24 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF (ldm1)

Looks like something (torch) is reserving all the free memory and not caring about your image size...

MartinAbilev avatar Nov 21 '22 07:11 MartinAbilev

Looks like torch eats approx. 6-7 GB. This error is from https://replicate.com/stability-ai/stable-diffusion:

CUDA out of memory. Tried to allocate 50.08 GiB (GPU 0; 39.59 GiB total capacity; 5.80 GiB already allocated; 31.83 GiB free; 6.05 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

MartinAbilev avatar Nov 21 '22 11:11 MartinAbilev

Apparently the Windows driver for NVIDIA GPUs limits them to 85-92% of the total VRAM for 'secondary' GPUs, i.e. for laptops and desktops that have both integrated and dedicated GPUs.

See

https://stackoverflow.com/questions/47855185/how-can-i-use-100-of-vram-on-a-secondary-gpu-from-a-single-process-on-windows-1

You might be able to access all of your VRAM if you boot into Linux (going to try that once I find a USB stick).

Thomas-MMJ avatar Dec 13 '22 12:12 Thomas-MMJ

I have the same issue on an RTX 4080.

enmanuelmag avatar Mar 05 '23 04:03 enmanuelmag

How do you use this "max_split_size_mb"?

ffdown avatar Mar 30 '23 07:03 ffdown

4080????

waikoreaweatherpjt avatar May 18 '23 01:05 waikoreaweatherpjt