Ilmari Heikkinen
Thanks @patrickvonplaten! I don't think there's a reference paper or implementation for this; it's based on experimentation. I might be wrong though, and maybe there's a paper out there...
[Going on a tangent.] Profiling the memory use a bit further, running the non-tiled decoder under limited memory seems tricky. The decoder's intermediate feature maps have channel counts ranging from 512 to...
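The kind of profiling I mean is roughly the following; this is only a sketch, assuming `pipe` is an fp16 pipeline already loaded on CUDA and using a made-up latent batch (how the decode output is unwrapped may differ between diffusers versions):

```python
import torch

# Stand-in latent batch: 8 samples of SD's 4x64x64 latents.
latents = torch.randn(8, 4, 64, 64, dtype=torch.float16, device="cuda")

torch.cuda.empty_cache()
torch.cuda.reset_peak_memory_stats()

with torch.no_grad():
    # 0.18215 is the SD v1 latent scaling factor applied before decoding.
    decoded = pipe.vae.decode(latents / 0.18215)

print(f"peak VAE decode memory: {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")
```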
> I have tried to split the VAE decoder's upsampling part. I confirm that the seams are from global-aware operators, specifically the attention, most of which can be safely removed...
Hi @patrickvonplaten! Yeah, I agree on the `enable_vae_slicing` approach. Here's a small snippet to test:

```python
from diffusers import StableDiffusionPipeline
import torch
import os

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16
)
pipe.enable_attention_slicing()
...
```
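Once the PR is in, the new call is meant to slot in next to the existing memory helpers. A hypothetical usage sketch (the prompt and batch size are placeholders, not part of the snippet above):

```python
pipe = pipe.to("cuda")
pipe.enable_attention_slicing()
pipe.enable_vae_slicing()  # decode the final latents one image at a time

# Use a batch > 1 so the sliced VAE decode actually comes into play.
images = pipe(["a photo of an astronaut riding a horse"] * 8).images
```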
Testing on a 24GB card, the VAE decode time scales linearly, but it runs into an issue at 32 samples. This is with the "full batch at a time" approach.

```bash
#...
```
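The timing itself is nothing fancy; roughly this kind of loop, again assuming `pipe` is already loaded in fp16 on the GPU (a sketch, not the exact script behind the numbers above):

```python
import time
import torch

# Time a "full batch at a time" VAE decode for a few batch sizes.
for batch in (1, 2, 4, 8, 16, 32):
    latents = torch.randn(batch, 4, 64, 64, dtype=torch.float16, device="cuda")
    torch.cuda.synchronize()
    start = time.perf_counter()
    with torch.no_grad():
        pipe.vae.decode(latents / 0.18215)
    torch.cuda.synchronize()
    print(f"batch {batch}: {time.perf_counter() - start:.2f}s")
```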
Doing the VAE decode one image at a time seems to be 15% faster at batch size 8 and 2% slower at batch size 1. It's a small enough difference that it...
Sorry for the late reply; it's been hectic. I added an `enable_vae_slicing()` function to PipelineStableDiffusion and moved the slicing implementation to AutoencoderKL. Let me know if you prefer it in...
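Conceptually the slicing is just a loop over the batch dimension inside the decode call. A minimal sketch of the idea, not the literal AutoencoderKL code (the `.sample` unwrapping may differ between diffusers versions):

```python
import torch

def decode_sliced(vae, latents):
    # Decode one latent at a time so peak memory is bounded by a
    # single-image decode instead of the whole batch.
    decoded = [vae.decode(z).sample for z in latents.split(1)]
    return torch.cat(decoded)
```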
> yes, it'd be possible to implement sliced attention too, if that's a limiting factor.

Thanks! I tried making the VAE use xformers attention and that did help with memory...
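For reference, the xformers experiment was along these lines; whether the pipeline-level switch reaches the VAE's attention blocks depends on the diffusers/xformers versions, so treat this as an assumption rather than what this PR does:

```python
# Requires the xformers package to be installed.
pipe.enable_xformers_memory_efficient_attention()
images = pipe(["a photo of an astronaut riding a horse"] * 8).images
```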
@patrickvonplaten here we go, I added tests and docs. Let me know how they look.