stable-diffusion
CUDA out of memory despite available memory
The following is my hardware makeup:
!nvidia-smi
Tue Nov 15 08:49:04 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.60.02 Driver Version: 510.60.02 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Quadro RTX 4000 On | 00000000:81:00.0 Off | N/A |
| 44% 32C P8 9W / 125W | 159MiB / 8192MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2063 G 63MiB |
| 0 N/A N/A 1849271 C 91MiB |
+-----------------------------------------------------------------------------+
!free -h
total used free shared buff/cache available
Mem: 64G 677M 31G 10M 32G 63G
Swap: 0B 0B 0B
As you can see, I have plenty of CUDA memory and hardly any of it is used. This is the error that I am getting:
Traceback (most recent call last):
File "main.py", line 834, in <module>
raise err
File "main.py", line 816, in <module>
trainer.fit(model, data)
File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 771, in fit
self._call_and_handle_interrupt(
File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 722, in _call_and_handle_interrupt
return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
return function(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 812, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1218, in _run
self.strategy.setup(self)
File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 162, in setup
self.model_to_device()
File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 324, in model_to_device
self.model.to(self.root_device)
File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/core/mixins/device_dtype_mixin.py", line 121, in to
return super().to(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 927, in to
return self._apply(convert)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 579, in _apply
module._apply(fn)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 579, in _apply
module._apply(fn)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 579, in _apply
module._apply(fn)
[Previous line repeated 4 more times]
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 602, in _apply
param_applied = fn(param)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 925, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 7.80 GiB total capacity; 6.70 GiB already allocated; 12.44 MiB free; 6.80 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Using the following code reduced the "Tried to allocate" amount from 146 MB to 20 MB:
import torch
from GPUtil import showUtilization as gpu_usage
from numba import cuda

def free_gpu_cache():
    print("Initial GPU Usage")
    gpu_usage()
    # Release cached blocks held by PyTorch's caching allocator
    torch.cuda.empty_cache()
    # Reset device 0 via numba to force-release leftover allocations
    # (note: this can invalidate an already-initialized PyTorch CUDA context)
    cuda.select_device(0)
    cuda.close()
    cuda.select_device(0)
    print("GPU Usage after emptying the cache")
    gpu_usage()

free_gpu_cache()
Still doesn't work. Where am I going wrong?
You don't have enough GPU RAM; always look at the last line of the error:
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 7.80 GiB total capacity; 6.70 GiB already allocated; 12.44 MiB free; 6.80 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
It already uses 6.7 GiB of the RAM, so it only has 12.44 MiB left.
Maybe you are running too many instances or other programs at the same time, or it breaks while it loads the tensors into the GPU. If you have nvtop installed, look at the graph when starting the process :)
I have 40GB of CUDA memory. Look at my nvidia-smi output.
This is in a Determined.ai notebook instance.
OK, but maybe it is split in some way; look at the error:
GPU 0; 7.80 GiB total capacity
@8bignic8 How can I find the source of the split? This is my nvidia-smi
Tue Nov 15 09:36:39 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.60.02 Driver Version: 510.60.02 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Quadro RTX 4000 On | 00000000:81:00.0 Off | N/A |
| 40% 30C P8 9W / 125W | 159MiB / 8192MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2063 G 63MiB |
| 0 N/A N/A 1849271 C 91MiB |
+-----------------------------------------------------------------------------+
You have 8192 MiB of RAM, i.e. 8 GB, and your system is running 2 programs, as listed below.
It could be a driver issue, or your graphics card only has 8 GB of RAM.
@8bignic8 My graphics card has 82 GB of RAM.
OK, but look:
Tue Nov 15 09:36:39 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.60.02 Driver Version: 510.60.02 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Quadro RTX 4000 On | 00000000:81:00.0 Off | N/A |
| 40% 30C P8 9W / 125W | 159MiB / 8192MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2063 G 63MiB |
| 0 N/A N/A 1849271 C 91MiB |
+-----------------------------------------------------------------------------+
8192 MiB = 8 GiB; if you don't believe me, look it up on Google.
In the end, I just want to help you, and the error says that you do not have enough GPU RAM. You can install the program nvtop with apt-get install nvtop, and there you see the current load on your GPU, like with htop or top. :) Wishing you good luck.
Same issue in Win10 with 12 GB of graphics RAM:
RuntimeError: CUDA out of memory. Tried to allocate 1.50 GiB (GPU 0; 12.00 GiB total capacity; 8.62 GiB already allocated; 967.06 MiB free; 8.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Looks like the GPU memory reserved by PyTorch (for its cache?) is not available for the new allocation. The error message says to use the PYTORCH_CUDA_ALLOC_CONF environment variable. Does anyone have an example of using this environment variable?
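For what it's worth, the variable only takes effect if it is set in the process environment before PyTorch initializes CUDA; a minimal sketch, assuming an illustrative split size of 128 MiB (not a verified setting for this repo):

import os
# Must be set before the first CUDA allocation (illustrative value, tune as needed)
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
import torch

# Shell alternative: export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
# Windows cmd alternative: set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128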
The README says at least 10 GiB of VRAM is required, and I have 12 GiB. I set the system environment variable PYTORCH_CUDA_ALLOC_CONF to 100 MB, but it was not honoured: the code still tried to allocate 1.5 GiB. Is anyone having success with the default project on an RTX 3060?
RuntimeError: CUDA out of memory. Tried to allocate 1.50 GiB (GPU 0; 12.00 GiB total capacity; 8.63 GiB already allocated; 967.06 MiB free; 8.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
(ldm) C:\data\stable\stable-diffusion>echo %PYTORCH_CUDA_ALLOC_CONF%
max_split_size_mb:100
OK, success on an RTX 3060, Win10. Thanks to everyone who contributed to the project!! Solution: I reduced n_samples to 1 (still got 2 samples). Here is the successful command line: python scripts/txt2img.py --prompt "a photograph of an ballet dancer riding a horse" --plms --n_samples 1
Also, this video shows how to work with 8 GB of VRAM using --W 448 --H 448 --n_samples 1:
https://www.youtube.com/watch?v=z99WBrs1D3g Note that the program is very heavy on system RAM and my spinning disk.
Nah, it's the same; I tried everything.
RuntimeError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 6.00 GiB total capacity; 4.54 GiB already allocated; 0 bytes free; 4.66 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF (ldm1)
:(
Hi MartinAbilev, just cut down the image size until you get a clean run, then build it back up to find the sweet spot. E.g. my 12 GB card can run --W 512 --H 512, and an 8 GB card can run --W 448 --H 448. Your 6 GB card might be able to run --W 256 --H 256; then build it back up. My runtime is about 15 minutes with 8 GB of system RAM and a spinning disk. Try this: python scripts/txt2img.py --prompt "a photograph of an ballet dancer riding a horse" --W 256 --H 256 --plms --n_samples 1
BTW, the environment variable does not seem to work on Windows 10. You are allocating in 1 GB chunks; it would be nice to reduce this, but I can't find anywhere that PYTORCH_CUDA_ALLOC_CONF is used in the current project.
Yeah, looks like env vars are not taken into account.
To set environment variables, run conda env config vars set my_var=value . Once you have set an environment variable, you have to reactivate your environment: conda activate test-env . To check if the environment variable has been set, run echo $my_var ( echo %my_var% on Windows) or conda env config vars list .
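Put together, a sequence like the following should persist the setting for the conda environment (assuming the environment from this repo is named ldm; adjust the name and the value to your setup):

conda env config vars set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
conda activate ldm
conda env config vars list
# Verify from inside Python too, since that is what the allocator actually sees:
python -c "import os; print(os.environ.get('PYTORCH_CUDA_ALLOC_CONF'))"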
I used W and H of 64 pixels; it does not work anyway :D
RuntimeError: CUDA out of memory. Tried to allocate 58.00 MiB (GPU 0; 6.00 GiB total capacity; 5.10 GiB already allocated; 0 bytes free; 5.24 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF (ldm1)
Looks like something (torch) is reserving all the free memory and does not care about your image size...
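One way to check is to print what the allocator reports right before the failing step; a small sketch, assuming a CUDA-enabled torch build and device 0:

import torch
dev = torch.device("cuda:0")
print(torch.cuda.get_device_name(dev))
# Memory actually held by tensors vs. memory reserved (cached) by the allocator
print(f"allocated: {torch.cuda.memory_allocated(dev) / 2**30:.2f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved(dev) / 2**30:.2f} GiB")
# torch.cuda.memory_summary(dev) gives a more detailed breakdown

If reserved is far above allocated, fragmentation is the likely culprit and max_split_size_mb is worth tuning; if allocated itself is already near the card's capacity, the model simply does not fit at that resolution.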
Looks like torch eats approx. 6-7 GB. This error is from https://replicate.com/stability-ai/stable-diffusion:
CUDA out of memory. Tried to allocate 50.08 GiB (GPU 0; 39.59 GiB total capacity; 5.80 GiB already allocated; 31.83 GiB free; 6.05 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Apparently the Windows driver for NVIDIA GPUs limits them to 85-92% of the total VRAM for 'secondary' GPUs, i.e. for laptops and desktops that have both integrated and dedicated GPUs.
See
https://stackoverflow.com/questions/47855185/how-can-i-use-100-of-vram-on-a-secondary-gpu-from-a-single-process-on-windows-1
You might be able to access all of your VRAM if you boot into Linux (going to try that once I find a USB stick lying around).
I have the same issue on RTX 4080
How do I use this "max_split_size_mb"?
4080????