Ilmari Heikkinen
Thanks @patrickvonplaten! I don't think there's a reference paper or implementation for this; it's based on experimentation. I might be wrong though, and maybe there's a paper out there...
[Going on a tangent.] Profiling the memory use a bit further, running the non-tiled decoder under limited memory seems tricky. The decoder's intermediate feature maps have channel counts ranging from 512 to...
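The kind of profiling I mean is roughly the following; this is only a sketch, assuming `pipe` is an fp16 pipeline already loaded on CUDA and using a made-up latent batch (how the decode output is unwrapped may differ between diffusers versions):

```python
import torch

# Stand-in latent batch: 8 samples of SD's 4x64x64 latents.
latents = torch.randn(8, 4, 64, 64, dtype=torch.float16, device="cuda")

torch.cuda.empty_cache()
torch.cuda.reset_peak_memory_stats()

with torch.no_grad():
    # 0.18215 is the SD v1 latent scaling factor applied before decoding.
    decoded = pipe.vae.decode(latents / 0.18215)

print(f"peak VAE decode memory: {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")
```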
> I have tried to split the VAE decoder's upsampling part. I confirm that the seams are from global-aware operators, specifically the attention, most of which can be safely removed...
Hi @patrickvonplaten! Yeah, I agree on the `enable_vae_slicing` approach. Here's a small snippet to test:

```python
from diffusers import StableDiffusionPipeline
import torch
import os

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16
)
pipe.enable_attention_slicing()
...
```
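Once the PR is in, the new call is meant to slot in next to the existing memory helpers. A hypothetical usage sketch (the prompt and batch size are placeholders, not part of the snippet above):

```python
pipe = pipe.to("cuda")
pipe.enable_attention_slicing()
pipe.enable_vae_slicing()  # decode the final latents one image at a time

# Use a batch > 1 so the sliced VAE decode actually comes into play.
images = pipe(["a photo of an astronaut riding a horse"] * 8).images
```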
Testing on a 24GB card, the VAE decode time scales linearly, but it runs into an issue at 32 samples. This is with the "full batch at a time" approach.

```bash
#...
```
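The timing itself is nothing fancy; roughly this kind of loop, again assuming `pipe` is already loaded in fp16 on the GPU (a sketch, not the exact script behind the numbers above):

```python
import time
import torch

# Time a "full batch at a time" VAE decode for a few batch sizes.
for batch in (1, 2, 4, 8, 16, 32):
    latents = torch.randn(batch, 4, 64, 64, dtype=torch.float16, device="cuda")
    torch.cuda.synchronize()
    start = time.perf_counter()
    with torch.no_grad():
        pipe.vae.decode(latents / 0.18215)
    torch.cuda.synchronize()
    print(f"batch {batch}: {time.perf_counter() - start:.2f}s")
```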
Doing the VAE decode one image at a time seems to be 15% faster at batch size 8 and 2% slower at batch size 1. It's a small enough difference that it...
Sorry for the late reply; it's been hectic. I added an `enable_vae_slicing()` function to PipelineStableDiffusion and moved the slicing implementation to AutoencoderKL. Let me know if you prefer it in...
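Conceptually the slicing is just a loop over the batch dimension inside the decode call. A minimal sketch of the idea, not the literal AutoencoderKL code (the `.sample` unwrapping may differ between diffusers versions):

```python
import torch

def decode_sliced(vae, latents):
    # Decode one latent at a time so peak memory is bounded by a
    # single-image decode instead of the whole batch.
    decoded = [vae.decode(z).sample for z in latents.split(1)]
    return torch.cat(decoded)
```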
> yes, it'd be possible to implement sliced attention too, if that's a limiting factor.

Thanks! I tried making the VAE use xformers attention and that did help with memory...
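For reference, the xformers experiment was along these lines; whether the pipeline-level switch reaches the VAE's attention blocks depends on the diffusers/xformers versions, so treat this as an assumption rather than what this PR does:

```python
# Requires the xformers package to be installed.
pipe.enable_xformers_memory_efficient_attention()
images = pipe(["a photo of an astronaut riding a horse"] * 8).images
```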
@patrickvonplaten here we go, I added tests and docs. Let me know how they look.