
[OOM] Memory blows out when trying to upscale images larger than 128x128 using StableDiffusionUpscalePipeline

qunash opened this issue 2 years ago • 12 comments

Describe the bug

When trying to upscale images larger than 128x128, the progress bar reaches 100% and then the pipeline crashes with a CUDA out-of-memory error.

With 512x512 images it tries to allocate 256.00 GiB!

Reproduction

import requests
from PIL import Image
from io import BytesIO
from diffusers import StableDiffusionUpscalePipeline
import torch

# Load the fp16 weights of the x4 upscaler and move the pipeline to the GPU
model_id = "stabilityai/stable-diffusion-x4-upscaler"
pipeline = StableDiffusionUpscalePipeline.from_pretrained(model_id, revision="fp16", torch_dtype=torch.float16)
pipeline = pipeline.to("cuda")

# Fetch a 512x512 test image
url = "https://www.freepnglogos.com/uploads/512x512-logo/512x512-transparent-circle-instagram-media-network-social-logo-new-16.png"
response = requests.get(url)
low_res_img = Image.open(BytesIO(response.content)).convert("RGB")

prompt = ""
upscaled_image = pipeline(prompt=prompt, image=low_res_img).images[0]
display(upscaled_image)  # notebook-only; use upscaled_image.save("out.png") in a plain script

Logs

RuntimeError: CUDA out of memory. Tried to allocate 256.00 GiB (GPU 0; 14.76 GiB total capacity; 4.77 GiB already allocated; 8.28 GiB free; 5.18 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

System Info

  • diffusers version: 0.9.0
  • Platform: Linux-5.10.133+-x86_64-with-Ubuntu-18.04-bionic
  • Python version: 3.7.15
  • PyTorch version (GPU?): 1.12.1+cu113 (True)
  • Huggingface_hub version: 0.11.0
  • Transformers version: 4.24.0

qunash avatar Nov 26 '22 10:11 qunash

I can reproduce this on MPS as well. It didn’t crash but created a ton of swap.

carson-katri avatar Nov 26 '22 20:11 carson-katri

Thank you for reporting this. The reason this happens is that as your initial image gets bigger (e.g. 512x512), the latent representation in the decoder's attention block ends up being 512 (latent dim) x 512 (H) x 512 (W), which gets reshaped to batch = 1, seq = 512 x 512 (H x W), and channels = 512. That obviously will not work, since vanilla attention is quadratic in memory/compute with respect to the sequence length. So the larger your initial image dimensions, the more memory it will use.
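
To put concrete numbers on it, here is a back-of-the-envelope sketch; it assumes the attention scores are materialized as a single float32 seq x seq tensor, which is exactly what the 256.00 GiB in the traceback corresponds to:

def attn_scores_gib(side, bytes_per_el=4):
    # Memory for one full self-attention score matrix at latent side length `side`
    seq = side * side  # H x W positions flattened into the sequence dimension
    return seq * seq * bytes_per_el / 2**30

print(attn_scores_gib(128))  # 128x128 input ->   1.0 GiB, fits comfortably
print(attn_scores_gib(512))  # 512x512 input -> 256.0 GiB, the exact figure from the log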

A solution for now in the above code is to downsize the initial image to something manageable, e.g.:

low_res_img = low_res_img.resize((128, 128))

As noted below, you can also install xformers or try attention slicing:

pipeline.enable_attention_slicing()
# pipeline.enable_xformers_memory_efficient_attention()
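
For example, applied to the reproduction above (a sketch; the try/except just makes xformers optional, since enable_xformers_memory_efficient_attention raises if xformers is not installed):

low_res_img = low_res_img.resize((128, 128))  # keep the latent sequence length manageable

pipeline.enable_attention_slicing()  # compute attention in slices: lower peak memory, slightly slower
try:
    pipeline.enable_xformers_memory_efficient_attention()  # further savings if xformers is available
except Exception:
    pass  # xformers not installed; attention slicing alone still helps

upscaled_image = pipeline(prompt="", image=low_res_img).images[0]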

kashif avatar Nov 26 '22 20:11 kashif

Thank you for the explanation, I thought this may be the case. Resizing the image to 128x128 would produce a 512x512 image, correct?

carson-katri avatar Nov 26 '22 21:11 carson-katri

@carson-katri Yes correct!

kashif avatar Nov 26 '22 21:11 kashif

Any way to adapt the attention code to support larger images? The most common use case for this model would be upscaling outputs from SD.

monophthongal avatar Nov 26 '22 22:11 monophthongal

The model may support xformers and attention slicing, which could help I assume.

carson-katri avatar Nov 26 '22 22:11 carson-katri

@carson-katri correct you can try

pipeline.enable_attention_slicing()

and that should reduce memory usage in exchange for a small speed decrease, enabling larger inputs. With xformers installed, memory usage should be even lower, as you point out!

kashif avatar Nov 26 '22 22:11 kashif

I'm working on a tile-based solution that runs the upscale model on small, overlapping patches of a larger source image and then merges them back into the full-sized result. Much of the code is borrowed from the Real-ESRGAN upscaler, which supports this. I'll try to publish the code as soon as it's working.
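
The basic idea, as a minimal untested sketch (the hypothetical upscale_tiled helper below pastes tiles back directly, so hard seams are still possible; a real implementation feathers/blends the overlap regions the way Real-ESRGAN does):

from PIL import Image

def upscale_tiled(pipeline, img, prompt="", tile=128, overlap=16, scale=4):
    # Run the x4 upscaler on overlapping `tile`-sized patches and paste the
    # results into the full-sized output. Assumes the model accepts the edge
    # tile sizes; a robust version would pad edge tiles and blend overlaps.
    w, h = img.size
    out = Image.new("RGB", (w * scale, h * scale))
    step = tile - overlap
    for top in range(0, h, step):
        for left in range(0, w, step):
            box = (left, top, min(left + tile, w), min(top + tile, h))
            patch = pipeline(prompt=prompt, image=img.crop(box)).images[0]
            out.paste(patch, (left * scale, top * scale))
    return out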

un1tz3r0 avatar Nov 27 '22 15:11 un1tz3r0

I implemented probably the most simplistic form of tiling possible here: https://github.com/carson-katri/dream-textures/blob/aa0132b42dd14ddbf9491c13a7a46a01da2c880a/generator_process/actions/upscale.py

I’m sure there are much better approaches that would limit seams. Perhaps just tiling the latent decoding process? Not entirely sure. Looking forward to seeing the improvements that will be made in this pipeline!

carson-katri avatar Nov 28 '22 03:11 carson-katri

This might be related / interesting: https://github.com/huggingface/diffusers/pull/1454

patrickvonplaten avatar Nov 30 '22 12:11 patrickvonplaten

Did anybody try using xformers? E.g. see: https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion#diffusers.StableDiffusionUpscalePipeline.enable_xformers_memory_efficient_attention

patrickvonplaten avatar Nov 30 '22 13:11 patrickvonplaten

I tried it with xformers, I believe, and I think I got the same issue... I can re-run it... But the issue occurs when creating this empty tensor in the default attention block:

https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention.py#L331-L337

kashif avatar Nov 30 '22 13:11 kashif

You can upscale a 512x image with a ~20GB GPU (I didn't try with less) using the linked PR and xformers in the VAE's attention blocks (when properly picked up by the enablement, hence another PR). I have this running just fine on a private fork; it looks like all the missing pieces are arriving here (see this PR), else I can PR the required missing bits.

blefaudeux avatar Dec 01 '22 20:12 blefaudeux

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Dec 26 '22 15:12 github-actions[bot]

BTW, we should have better support for upscaling SD-1 once https://github.com/huggingface/diffusers/pull/1321 is merged :-)

patrickvonplaten avatar Jan 03 '23 12:01 patrickvonplaten

I know this is closed and these things are in the docs, but just wanted to say: if you're running into this issue, install the following:

pip install xformers
pip install triton==2.0.0.dev20221120

And add this to your pipeline:

import torch
from diffusers import DiffusionPipeline
from xformers.ops import MemoryEfficientAttentionFlashAttentionOp

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
pipe.enable_xformers_memory_efficient_attention(attention_op=MemoryEfficientAttentionFlashAttentionOp)
# Workaround: Flash Attention does not accept the VAE's attention shape, so use the default memory-efficient op for the VAE
pipe.vae.enable_xformers_memory_efficient_attention(attention_op=None)
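
For the upscale pipeline this issue is about, the same pattern would look like this, reusing the imports from the snippet above (a sketch; I haven't verified this exact combination on the x4 upscaler):

from diffusers import StableDiffusionUpscalePipeline

upscaler = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
).to("cuda")
upscaler.enable_xformers_memory_efficient_attention(attention_op=MemoryEfficientAttentionFlashAttentionOp)
upscaler.vae.enable_xformers_memory_efficient_attention(attention_op=None)  # same VAE workaround as above
upscaler.enable_attention_slicing()  # optional: trade a little speed for extra memory headroom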

eolszewski avatar Feb 08 '23 23:02 eolszewski