
In-painting & Img-to-Img seemingly do not abide by `enable_attention_slicing()`


Describe the bug

I'm able to run (seemingly) all text-to-image examples without any sort of issue, including the one in the README. Memory gets a bit tight, but stays under my 8 GiB limit.

However, when I switch to trying out the in-painting example, I immediately hit an out-of-memory error. Reading the docs, pipe.enable_attention_slicing() should help with this, but it doesn't seem to have any noticeable effect for this pipeline. Image-to-image seemingly suffers from the same issue. It does work well for text-to-image, however, keeping me well below the limit.

Apologies if that is not meant to have any effect for pipelines other than those created from StableDiffusionPipeline. It's a bit unclear to me whether those instructions for reducing memory usage are meant solely for text-to-image or for the other pipelines as well.

Reproduction

import torch
from PIL import Image

from diffusers import StableDiffusionInpaintPipeline

img = Image.open("<filepath>").convert("RGB")
mask = Image.open("<filepath>").convert("RGB")

# Load the fp16 weights to halve model memory
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",
    torch_dtype=torch.float16,
    use_auth_token=True,
)
pipe = pipe.to("cuda")
# Should reduce peak memory during the attention computation
pipe.enable_attention_slicing()

prompt = "A starry sky"
with torch.autocast("cuda"):
    image = pipe(prompt=prompt, init_image=img, mask_image=mask, strength=0.75, guidance_scale=7.5).images[0]
    image.save("example.png")

The image and mask are 960x960, if that's relevant.

Logs

(env) PS C:\Users\Ryan\Projects\Diffusion> py foo.py
Fetching 16 files: 100%|████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 10658.97it/s]
ftfy or spacy is not installed using BERT BasicTokenizer instead of ftfy.
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "C:\Users\Ryan\Projects\Diffusion\foo.py", line 24, in <module>
    image = pipe(prompt=prompt, init_image=img, mask_image=mask, strength=0.75, guidance_scale=7.5).images[0]
  File "C:\Users\Ryan\Projects\Diffusion\env\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\Ryan\Projects\Diffusion\env\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_stable_diffusion_inpaint.py", line 361, in __call__
    noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=text_embeddings).sample
  File "C:\Users\Ryan\Projects\Diffusion\env\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Ryan\Projects\Diffusion\env\lib\site-packages\diffusers\models\unet_2d_condition.py", line 283, in forward
    sample, res_samples = downsample_block(
  File "C:\Users\Ryan\Projects\Diffusion\env\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Ryan\Projects\Diffusion\env\lib\site-packages\diffusers\models\unet_blocks.py", line 565, in forward
    hidden_states = attn(hidden_states, context=encoder_hidden_states)
  File "C:\Users\Ryan\Projects\Diffusion\env\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Ryan\Projects\Diffusion\env\lib\site-packages\diffusers\models\attention.py", line 149, in forward
    hidden_states = block(hidden_states, context=context)
  File "C:\Users\Ryan\Projects\Diffusion\env\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Ryan\Projects\Diffusion\env\lib\site-packages\diffusers\models\attention.py", line 198, in forward
    hidden_states = self.attn1(self.norm1(hidden_states)) + hidden_states
  File "C:\Users\Ryan\Projects\Diffusion\env\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Ryan\Projects\Diffusion\env\lib\site-packages\diffusers\models\attention.py", line 272, in forward
    hidden_states = self._sliced_attention(query, key, value, sequence_length, dim)
  File "C:\Users\Ryan\Projects\Diffusion\env\lib\site-packages\diffusers\models\attention.py", line 298, in _sliced_attention
    attn_slice = attn_slice.softmax(dim=-1)
RuntimeError: CUDA out of memory. Tried to allocate 4.53 GiB (GPU 0; 8.00 GiB total capacity; 4.49 GiB already allocated; 1.06 GiB free; 4.59 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Is this log implying that 2.35 GiB is used elsewhere (8 GiB total - 1.06 GiB free - 4.59 GiB reserved)? That seems incorrect; nvidia-smi shows ~200 MiB used passively.
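
For reference, PyTorch's own counters break this down; note that they do not include the CUDA context itself, which nvidia-smi counts but the allocator stats do not. A minimal sketch, not taken from the traceback above:

import torch

# Memory actually handed out to tensors vs. memory the caching
# allocator is holding on to (both exclude the CUDA context).
print(f"allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1024**3:.2f} GiB")

# Full breakdown of the caching allocator's state.
print(torch.cuda.memory_summary())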



System Info

- `diffusers` version: 0.4.1
- Platform: Windows-10-10.0.19044-SP0
- Python version: 3.10.7
- PyTorch version (GPU?): 1.12.1+cu116 (True)
- Huggingface_hub version: 0.10.0
- Transformers version: 4.23.0
- Using GPU in script?:  Yes
  - GPU: 3070 Ti, 8 GiB
- Using distributed or parallel set-up in script?: no

rschristian avatar Oct 11 '22 04:10 rschristian

Hey @rschristian,

Note that img2img and in-painting do require a bit more memory because we are dealing with both a random noise input and an init_image input. It doesn't surprise me very much that img2img requires slightly more memory than standard stable diffusion.

In my tests, enable_attention_slicing() works just fine. @rschristian, when running:

import torch
from PIL import Image

from diffusers import StableDiffusionInpaintPipeline

img = Image.open("<filepath>").convert("RGB")
mask = Image.open("<filepath>").convert("RGB")

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16", 
    torch_dtype=torch.float16,
    use_auth_token=True,
)
pipe = pipe.to("cuda")
pipe.enable_attention_slicing()

prompt = "A starry sky"
with torch.autocast("cuda"):
    image = pipe(prompt=prompt, init_image=img, mask_image=mask, strength=0.75, guidance_scale=7.5).images[0]
    image.save("example.png")

vs.

import torch
from PIL import Image

from diffusers import StableDiffusionInpaintPipeline

img = Image.open("<filepath>").convert("RGB")
mask = Image.open("<filepath>").convert("RGB")

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16", 
    torch_dtype=torch.float16,
    use_auth_token=True,
)
pipe = pipe.to("cuda")

prompt = "A starry sky"
with torch.autocast("cuda"):
    image = pipe(prompt=prompt, init_image=img, mask_image=mask, strength=0.75, guidance_scale=7.5).images[0]
    image.save("example.png")

You should see a difference in the maximum GPU memory used - is this the case for you or not?
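
One way to measure this, using the standard torch.cuda counters (a minimal sketch, reusing pipe, img, mask, and prompt from the snippets above):

import torch

# Clear the peak counter, run the pipeline once, then read back the
# highest amount of memory PyTorch allocated during the call.
torch.cuda.reset_peak_memory_stats()
with torch.autocast("cuda"):
    image = pipe(prompt=prompt, init_image=img, mask_image=mask, strength=0.75, guidance_scale=7.5).images[0]
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GiB")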

patrickvonplaten avatar Oct 11 '22 18:10 patrickvonplaten

Yep, makes perfect sense that it'll use more memory. I had assumed as much and have no issue with that.

You should see a difference in max used GPU memory - is this the case for you or not?

From my (limited) view, there's no discernible difference. I run out of memory instantly either way.

If pipe.enable_attention_slicing() is capping usage, but not below 8 GiB, then there's no way for me to tell whether it does anything at all at the moment.

I'm happy to close this out if that is the case; I just can't really tell. Using pipe.enable_attention_slicing() with text-to-image keeps me at around 4.5 GiB (~7.5 GiB without), so I'd have thought that, even with the extra overhead of img2img/in-painting, I'd stay below 8 GiB, but perhaps that's a poor assumption.
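
For what it's worth, enable_attention_slicing also appears to accept a slice_size argument that can be dialed down further at a speed cost; a minimal sketch, assuming it behaves the same on the in-painting pipeline as it does for text-to-image:

# slice_size=1 computes attention for one slice at a time: lowest peak
# memory, slowest; the no-argument call above uses the "auto" default.
pipe.enable_attention_slicing(slice_size=1)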

rschristian avatar Oct 11 '22 18:10 rschristian

On a quest of trial and error, I decided to cut my images down from 960x960 to 480x480, which succeeded and confirmed that pipe.enable_attention_slicing() is in fact working. I hit ~5.4 GiB with it enabled, ~7 GiB without.

With attention slicing enabled, I found the max image & mask size I can handle is 840x840, or thereabouts, and that's cutting it pretty close.
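
Downscaling is just a plain PIL resize on both inputs before handing them to the pipeline; a minimal sketch with placeholder paths, matching the repro above:

from PIL import Image

# Resize the input image and its mask to the same, smaller resolution
# before passing them to the pipeline.
img = Image.open("<filepath>").convert("RGB").resize((480, 480))
mask = Image.open("<filepath>").convert("RGB").resize((480, 480))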

The additional overhead of img2img/in-painting was just much more than I was expecting, requiring reduced image resolutions. Therefore I'm closing this out, as it's not actually an issue.

Thanks!

rschristian avatar Oct 12 '22 01:10 rschristian