diffusers
In-painting & Img-to-Img seemingly do not abide by `enable_attention_slicing()`
Describe the bug
I'm able to run (seemingly) all of the text-to-image examples without any issue, including the one in the README. Memory gets a bit tight, but stays under my 8 GiB limit.
However, when I switch to the in-painting example, I immediately hit an out-of-memory error. According to the docs, pipe.enable_attention_slicing() should help with this, but it doesn't seem to have any noticeable effect for this pipeline. Image-to-image seemingly suffers from the same issue. It does work well for text-to-image, however, keeping me well below the limit.
Apologies if it is not meant to have any effect for pipelines other than those created from StableDiffusionPipeline. It's a bit unclear to me whether the instructions for reducing memory usage are meant solely for text-to-image or for the other pipelines as well.
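For context, my understanding of what attention slicing is supposed to do is roughly the following. This is only a toy sketch of the idea (the dimension names and slicing strategy here are my own), not the actual diffusers implementation:

```python
import torch

def sliced_attention(query, key, value, slice_size):
    # query/key/value: (batch * heads, seq_len, head_dim).
    # Only `slice_size` attention matrices of shape (seq_len, seq_len)
    # are materialized at a time, trading speed for lower peak memory.
    scale = query.shape[-1] ** -0.5
    out = torch.empty_like(query)
    for start in range(0, query.shape[0], slice_size):
        end = start + slice_size
        scores = torch.bmm(query[start:end], key[start:end].transpose(1, 2)) * scale
        out[start:end] = torch.bmm(scores.softmax(dim=-1), value[start:end])
    return out
```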
Reproduction
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline
img = Image.open("<filepath>").convert("RGB")
mask = Image.open("<filepath>").convert("RGB")
pipe = StableDiffusionInpaintPipeline.from_pretrained(
"CompVis/stable-diffusion-v1-4",
revision="fp16",
torch_dtype=torch.float16,
use_auth_token=True,
)
pipe = pipe.to("cuda")
pipe.enable_attention_slicing()
prompt = "A starry sky"
with torch.autocast("cuda"):
    image = pipe(prompt=prompt, init_image=img, mask_image=mask, strength=0.75, guidance_scale=7.5).images[0]
image.save("example.png")
The image and mask are 960x960, if that's relevant.
Logs
(env) PS C:\Users\Ryan\Projects\Diffusion> py foo.py
Fetching 16 files: 100%|████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 10658.97it/s]
ftfy or spacy is not installed using BERT BasicTokenizer instead of ftfy.
0it [00:00, ?it/s]
Traceback (most recent call last):
File "C:\Users\Ryan\Projects\Diffusion\foo.py", line 24, in <module>
image = pipe(prompt=prompt, init_image=img, mask_image=mask, strength=0.75, guidance_scale=7.5).images[0]
File "C:\Users\Ryan\Projects\Diffusion\env\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "C:\Users\Ryan\Projects\Diffusion\env\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_stable_diffusion_inpaint.py", line 361, in __call__
noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=text_embeddings).sample
File "C:\Users\Ryan\Projects\Diffusion\env\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\Ryan\Projects\Diffusion\env\lib\site-packages\diffusers\models\unet_2d_condition.py", line 283, in forward
sample, res_samples = downsample_block(
File "C:\Users\Ryan\Projects\Diffusion\env\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\Ryan\Projects\Diffusion\env\lib\site-packages\diffusers\models\unet_blocks.py", line 565, in forward
hidden_states = attn(hidden_states, context=encoder_hidden_states)
File "C:\Users\Ryan\Projects\Diffusion\env\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\Ryan\Projects\Diffusion\env\lib\site-packages\diffusers\models\attention.py", line 149, in forward
hidden_states = block(hidden_states, context=context)
File "C:\Users\Ryan\Projects\Diffusion\env\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\Ryan\Projects\Diffusion\env\lib\site-packages\diffusers\models\attention.py", line 198, in forward
hidden_states = self.attn1(self.norm1(hidden_states)) + hidden_states
File "C:\Users\Ryan\Projects\Diffusion\env\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\Ryan\Projects\Diffusion\env\lib\site-packages\diffusers\models\attention.py", line 272, in forward
hidden_states = self._sliced_attention(query, key, value, sequence_length, dim)
File "C:\Users\Ryan\Projects\Diffusion\env\lib\site-packages\diffusers\models\attention.py", line 298, in _sliced_attention
attn_slice = attn_slice.softmax(dim=-1)
RuntimeError: CUDA out of memory. Tried to allocate 4.53 GiB (GPU 0; 8.00 GiB total capacity; 4.49 GiB already allocated; 1.06 GiB free; 4.59 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Is this log implying that 2.35 GiB is used elsewhere (8 GiB total - 1.06 GiB free - 4.59 GiB reserved)? That seems incorrect; nvidia-smi shows only ~200 MiB used passively.
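For what it's worth, the allocator hint at the end of that error can be tried by setting PYTORCH_CUDA_ALLOC_CONF before the first CUDA allocation. A minimal sketch; the 512 MiB split size is only an illustrative guess, not a recommendation:

```python
import os

# Must be set before torch initializes its CUDA caching allocator
# (easiest: before importing torch). The value below is just a guess.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

import torch  # noqa: E402
```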
### System Info
- `diffusers` version: 0.4.1
- Platform: Windows-10-10.0.19044-SP0
- Python version: 3.10.7
- PyTorch version (GPU?): 1.12.1+cu116 (True)
- Huggingface_hub version: 0.10.0
- Transformers version: 4.23.0
- Using GPU in script?: Yes
- GPU: 3070 Ti, 8 GiB
- Using distributed or parallel set-up in script?: no
Hey @rschristian,
Note that img2img and in-painting do require a bit more memory because we are dealing with both a random noise input and an init_image input. It doesn't surprise me very much that img2img requires slightly more memory than standard stable diffusion.
In my tests, enable_attention_slicing() works just fine. @rschristian, when running:
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline
img = Image.open("<filepath>").convert("RGB")
mask = Image.open("<filepath>").convert("RGB")
pipe = StableDiffusionInpaintPipeline.from_pretrained(
"CompVis/stable-diffusion-v1-4",
revision="fp16",
torch_dtype=torch.float16,
use_auth_token=True,
)
pipe = pipe.to("cuda")
pipe.enable_attention_slicing()
prompt = "A starry sky"
with torch.autocast("cuda"):
    image = pipe(prompt=prompt, init_image=img, mask_image=mask, strength=0.75, guidance_scale=7.5).images[0]
image.save("example.png")
vs.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline
img = Image.open("<filepath>").convert("RGB")
mask = Image.open("<filepath>").convert("RGB")
pipe = StableDiffusionInpaintPipeline.from_pretrained(
"CompVis/stable-diffusion-v1-4",
revision="fp16",
torch_dtype=torch.float16,
use_auth_token=True,
)
pipe = pipe.to("cuda")
prompt = "A starry sky"
with torch.autocast("cuda"):
    image = pipe(prompt=prompt, init_image=img, mask_image=mask, strength=0.75, guidance_scale=7.5).images[0]
image.save("example.png")
You should see a difference in max used GPU memory - is this the case for you or not?
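One way to compare the two runs is with PyTorch's built-in memory statistics. A minimal sketch, meant to be appended to either script above (note this reports the process's own allocations, not what nvidia-smi shows):

```python
import torch

torch.cuda.reset_peak_memory_stats()
with torch.autocast("cuda"):
    image = pipe(prompt=prompt, init_image=img, mask_image=mask, strength=0.75, guidance_scale=7.5).images[0]
# Peak GPU memory allocated by this process during the pipeline call.
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GiB")
```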
Yep, makes perfect sense that it'll use more memory. I had assumed as much and have no issue with that.
You should see a difference in max used GPU memory - is this the case for you or not?
From my (limited) view, there's no discernible difference. I run out of memory instantly either way.
If pipe.enable_attention_slicing() is capping usage but not bringing it below 8 GiB, then I currently have no way of telling whether it does anything at all.
I'm happy to close this out if that's the case; I just can't really tell. Using pipe.enable_attention_slicing() with text-to-image keeps me at around 4.5 GiB (~7.5 GiB without), so I'd have thought that even with the extra overhead of img2img/in-painting I'd stay below 8 GiB, but perhaps that's a poor assumption.
Through some trial and error, I cut my images down from 960x960 to 480x480, which succeeded and confirmed that pipe.enable_attention_slicing() is in fact working: I hit ~5.4 GiB with it enabled, ~7 GiB without.
With attention slicing enabled, I found the max image & mask size I can handle is about 840x840, and that's cutting it pretty close.
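For reference, a rough sketch of how the inputs can be downscaled before the pipeline call, using PIL's Image.resize (NEAREST for the mask, to keep it hard-edged, is my own choice; the dimensions should stay multiples of 8):

```python
from PIL import Image

target = (840, 840)  # the rough upper bound found above
img = Image.open("<filepath>").convert("RGB").resize(target, Image.LANCZOS)
mask = Image.open("<filepath>").convert("RGB").resize(target, Image.NEAREST)
```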
The additional overhead of img2img/in-painting was simply much more than I was expecting, requiring reduced image resolutions. Therefore, I'm closing this out as invalid rather than an actual issue.
Thanks!