diffusers
[OOM] Memory blows out when trying to upscale images larger than 128x128 using StableDiffusionUpscalePipeline
Describe the bug
When trying to upscale images larger than 128x128 the progress goes to 100% and then crashes with CUDA OOM.
With 512x512 images it's trying to allocate 256.00 GiB!
Reproduction
import requests
from PIL import Image
from io import BytesIO
from diffusers import StableDiffusionUpscalePipeline
import torch
model_id = "stabilityai/stable-diffusion-x4-upscaler"
pipeline = StableDiffusionUpscalePipeline.from_pretrained(model_id, revision="fp16", torch_dtype=torch.float16)
pipeline = pipeline.to("cuda")
url = "https://www.freepnglogos.com/uploads/512x512-logo/512x512-transparent-circle-instagram-media-network-social-logo-new-16.png"
response = requests.get(url)
low_res_img = Image.open(BytesIO(response.content)).convert("RGB")
prompt=""
upscaled_image = pipeline(prompt=prompt, image=low_res_img).images[0]
display(upscaled_image)
Logs
RuntimeError: CUDA out of memory. Tried to allocate 256.00 GiB (GPU 0; 14.76 GiB total capacity; 4.77 GiB already allocated; 8.28 GiB free; 5.18 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
System Info
- diffusers version: 0.9.0
- Platform: Linux-5.10.133+-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.7.15
- PyTorch version (GPU?): 1.12.1+cu113 (True)
- Huggingface_hub version: 0.11.0
- Transformers version: 4.24.0
I can reproduce this on MPS as well. It didn’t crash but created a ton of swap.
Thank you for reporting this. This happens because, as your initial image gets bigger (e.g. 512x512), the latent representation in the decoding attention block ends up being 512 (latent dim) x 512 (H) x 512 (W). It gets reshaped into batch = 1, seq = 512 x 512 (H x W) and channel = 512. That will not work, since vanilla attention is quadratic in memory and compute with respect to the sequence length. So the larger your initial image, the more memory it will use.
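To make the scaling concrete, here is a rough back-of-the-envelope calculation of the attention score matrix size for the shapes described above. The helper name and the 4-bytes-per-element default are assumptions for illustration; 4 bytes happens to reproduce the 256.00 GiB figure from the log, but that is not confirmed as the dtype of the failing allocation.

```python
GIB = 1024 ** 3


def attention_scores_bytes(height, width, bytes_per_element=4, batch=1):
    """Size of a (batch, seq, seq) attention score matrix, with seq = H * W.

    Hypothetical helper: it only counts the score matrix, not activations,
    weights, or any other buffers.
    """
    seq_len = height * width
    return batch * seq_len * seq_len * bytes_per_element


for side in (128, 256, 512):
    size_gib = attention_scores_bytes(side, side) / GIB
    print(f"{side}x{side} input -> seq = {side * side}, scores = {size_gib:g} GiB")
```

Doubling the side length multiplies the sequence length by 4 and the score matrix by 16, which is why 128x128 inputs fit while 512x512 inputs do not.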
A solution for now in the above code is to downsize the initial image to something manageable e.g.:
low_res_img = low_res_img.resize((128, 128))
As noted below you can also install xformers or try with attention slicing:
pipeline.enable_attention_slicing()
# pipeline.enable_xformers_memory_efficient_attention()
Thank you for the explanation, I thought this may be the case. Resizing the image to 128x128 would produce a 512x512 image, correct?
@carson-katri Yes correct!
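Note that resizing to a fixed (128, 128) distorts non-square inputs. A minimal sketch of picking a downsized size that keeps the aspect ratio, assuming the 4x factor of this upscaler; `fit_within` is a hypothetical helper, not part of diffusers, and it does not round to any size multiple the model might require.

```python
def fit_within(width, height, max_side=128, scale=4):
    """Return ((new_w, new_h), (out_w, out_h)) for an aspect-preserving resize.

    The longer side is shrunk to at most `max_side`; images that are already
    small enough are left unchanged. `scale` is the assumed upscaling factor.
    """
    ratio = min(1.0, max_side / max(width, height))
    new_w = max(1, round(width * ratio))
    new_h = max(1, round(height * ratio))
    return (new_w, new_h), (new_w * scale, new_h * scale)


print(fit_within(512, 512))   # square input
print(fit_within(1024, 512))  # wide input
```

The first tuple can be passed to `low_res_img.resize(...)` in place of the hard-coded `(128, 128)`.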
Any way to adapt the attention code to support larger images? The most common use case for this model would be upscaling outputs from SD.
The model may support xformers and attention slicing, which could help I assume.
@carson-katri Correct, you can try pipeline.enable_attention_slicing(), which should reduce memory use in exchange for a small speed decrease and enable larger inputs. With xformers installed, memory use should be lower still, as you point out!
I'm working on a tile-based solution that runs the upscale model on small, overlapping patches of a larger source image and then merges them back into the full sized result. Much of the code is borrowed from realesrgan upscaler which supports this. Will try and publish code as soon as it's working
I implemented probably the most simplistic form of tiling possible here: https://github.com/carson-katri/dream-textures/blob/aa0132b42dd14ddbf9491c13a7a46a01da2c880a/generator_process/actions/upscale.py
I’m sure there are much better approaches that would limit seams. Perhaps just tiling the latent decoding process? Not entirely sure. Looking forward to seeing the improvements that will be made in this pipeline!
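For readers who want to try the tiling idea discussed above, here is a minimal sketch of the first step: splitting an image into overlapping tile boxes. It is not the code from the linked implementation or from realesrgan; the function name, the default tile size of 128 and the default overlap of 16 are all assumptions for illustration.

```python
def tile_boxes(width, height, tile=128, overlap=16):
    """Return (left, top, right, bottom) boxes covering a width x height image.

    Neighbouring boxes share at least `overlap` pixels. The last box in each
    row and column is shifted inwards so that it sits flush with the edge.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than tile")
    stride = tile - overlap

    def starts(size):
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, stride))
        positions.append(size - tile)  # final tile ends exactly at the edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


boxes = tile_boxes(512, 512)
print(len(boxes), boxes[0], boxes[-1])
```

Each box could then be cropped with `Image.crop(box)`, run through the pipeline, and pasted into a 4x-sized canvas at `(left * 4, top * 4)`. Pasting this way simply overwrites the overlapping regions, so seams are likely unless the overlaps are blended.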
This might be related / interesting: https://github.com/huggingface/diffusers/pull/1454
Did anybody try using xformers? E.g. see: https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion#diffusers.StableDiffusionUpscalePipeline.enable_xformers_memory_efficient_attention
I believe I tried it with xformers and got the same issue; I can re-run it to confirm. The failure occurs when this empty tensor is created in the default attention block:
https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention.py#L331-L337
You can upscale a 512x image with a ~20GB GPU (I didn't try with less) using the linked PR together with xformers in the VAE's attention layers (once the enablement picks them up properly, hence another PR). I have this running just fine on a private fork. It looks like all the missing pieces are arriving here (see this PR); otherwise I can open a PR with the required missing bits.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
BTW, we should have better support for upscaling SD-1... once: https://github.com/huggingface/diffusers/pull/1321 is merged :-)
I know this is closed and these things are in the docs, but if you're running into this issue, install the following:
pip install xformers
pip install triton==2.0.0.dev20221120
And to add this to your pipeline:
import torch
from diffusers import DiffusionPipeline
from xformers.ops import MemoryEfficientAttentionFlashAttentionOp
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
pipe.enable_xformers_memory_efficient_attention(attention_op=MemoryEfficientAttentionFlashAttentionOp)
# Workaround for not accepting attention shape using VAE for Flash Attention
pipe.vae.enable_xformers_memory_efficient_attention(attention_op=None)