Ilmari Heikkinen

Results: 28 comments by Ilmari Heikkinen

Thanks @patrickvonplaten! I don't think there's a reference paper / implementation for this; it's based on experimentation. I might be wrong though and maybe there's a paper out there...

[Going on a tangent.] Profiling the memory use a bit further, running the non-tiled decoder with limited memory seems tricky. The decoder's intermediate feature maps have channel counts ranging from 512 to...
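For context, a minimal profiling sketch along these lines (my setup assumptions, not the original profiling code: a CUDA device, fp16 weights, and the runwayml/stable-diffusion-v1-5 VAE):

```python
# Hedged sketch: measure peak memory of one non-tiled VAE decode so the
# activation cost of the wide decoder feature maps becomes visible.
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae", torch_dtype=torch.float16
).to("cuda")

# A 64x64 latent decodes to a 512x512 image; the decoder upsamples 8x,
# so the intermediate feature maps are large despite the small latent.
latents = torch.randn(1, 4, 64, 64, dtype=torch.float16, device="cuda")

torch.cuda.reset_peak_memory_stats()
with torch.no_grad():
    image = vae.decode(latents).sample
print(f"peak decode memory: {torch.cuda.max_memory_allocated() / 2**20:.0f} MiB")
```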

> I have tried to split the vae decoder's upsampling part. I confirm that the seams are from global-aware operators, specifically the attention, most of which can be safely removed...

> > > I have tried to split the vae decoder's upsampling part. I confirm that the seams are from global-aware operators, specifically the attention, most of which can be...
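One way to test the quoted claim is to bypass the decoder's attention and check whether the tile seams change. A hedged sketch: the module path `decoder.mid_block.attentions` is my assumption and varies across diffusers versions.

```python
# Experiment sketch: replace the VAE decoder's global attention with an
# identity module, so only local (convolutional) operators remain.
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae", torch_dtype=torch.float16
).to("cuda")

class SkipAttention(torch.nn.Module):
    """Identity stand-in: passes features through, removing global mixing."""
    def forward(self, hidden_states, *args, **kwargs):
        return hidden_states

# Assumed module path; older/newer diffusers releases name these differently.
for i in range(len(vae.decoder.mid_block.attentions)):
    vae.decoder.mid_block.attentions[i] = SkipAttention()
```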

Hi @patrickvonplaten! Yeah, I agree on the `enable_vae_slicing` approach. Here's a small snippet to test:

```python
from diffusers import StableDiffusionPipeline
import torch
import os

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16
)
pipe.enable_attention_slicing()
...
```

Testing on a 24GB card, the VAE decode time scales linearly, but it runs into an issue with 32 samples. This is with the "full batch at a time" approach.

```bash
# ...
```
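The script above is truncated; for illustration, here's a hedged Python sketch of the same full-batch-at-a-time timing loop (not the original benchmark):

```python
# Time a single whole-batch decode at growing batch sizes.
import time
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae", torch_dtype=torch.float16
).to("cuda")

for batch in (1, 2, 4, 8, 16, 32):
    latents = torch.randn(batch, 4, 64, 64, dtype=torch.float16, device="cuda")
    torch.cuda.synchronize()
    start = time.time()
    with torch.no_grad():
        vae.decode(latents)  # whole batch in one call; 32 is where a 24 GB card struggles
    torch.cuda.synchronize()
    print(f"batch {batch}: {time.time() - start:.3f}s")
```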

Doing the VAE decode one image at a time seems to be 15% faster at batch size 8 and 2% slower at batch size 1. It's a small enough difference that it...
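The one-image-at-a-time variant in that comparison looks roughly like this sketch, reusing `vae` and `latents` from the timing sketch above:

```python
import torch

# Decode each sample separately, then stitch the batch back together.
with torch.no_grad():
    images = torch.cat(
        [vae.decode(latents[i : i + 1]).sample for i in range(latents.shape[0])]
    )
```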

Sorry for the late reply, it's been hectic. I added an `enable_vae_slicing()` function to `StableDiffusionPipeline` and moved the slicing implementation to `AutoencoderKL`. Let me know if you prefer it in...
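As a rough illustration of what the moved slicing logic does (a sketch, not the PR's actual code; `decode_sliced` is a hypothetical helper):

```python
import torch

def decode_sliced(vae, z, use_slicing=True):
    """Decode latents one sample at a time when slicing is enabled,
    then concatenate the results back into one batch."""
    if use_slicing and z.shape[0] > 1:
        return torch.cat([vae.decode(s).sample for s in z.split(1)])
    return vae.decode(z).sample
```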

> yes, it'd be possible to implement sliced attention too, if that's a limiting factor.

Thanks! I tried making the VAE use xformers attention and that did help with memory...
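For reference, the pipeline-level call below routes attention through xformers, which as far as I know also covers the VAE's attention blocks (requires the xformers package to be installed):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16
).to("cuda")
# Swaps the attention implementation for xformers' memory-efficient kernel.
pipe.enable_xformers_memory_efficient_attention()
```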

@patrickvonplaten here we go, I added tests and docs. Let me know how they look.