
Can we get the stable diffusion example to work with xformers?

Open batrlatom opened this issue 1 year ago • 10 comments

Hi, I was wondering ... is there any easy way to use xformers with AITemplate in stable diffusion? Since it lowers memory consumption, we could run inference at resolutions north of 1024x1024. Can we pair it with AITemplate too? Reference: https://github.com/huggingface/diffusers/pull/532

batrlatom avatar Oct 04 '22 08:10 batrlatom

For 512x512, AIT is running at 42 it/s vs 27 it/s in this PR, so I don't think we need to support xformers. Looping in @terrychenism for 1024x1024 generation.

antinucleon avatar Oct 04 '22 15:10 antinucleon

This was not about speed, but about memory consumption. But I understand that this is not a priority.

batrlatom avatar Oct 04 '22 15:10 batrlatom

1024x1024 is easy to get: you would need to compile the VAE model with 128x128 input: https://github.com/facebookincubator/AITemplate/blob/main/examples/05_stable_diffusion/compile.py#L180-L181 For memory, we don't support xformers yet, but AIT should be very efficient compared to PyTorch. Please try 1024x1024 and let us know if you run into any issue.
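For reference, the 128x128 figure comes from Stable Diffusion's VAE, which downsamples by a factor of 8, so the latent resolution the compiled UNet/VAE sees is the pixel resolution divided by 8. A quick sketch of the arithmetic (the helper name is illustrative, not part of the AIT API):

```python
# Stable Diffusion's VAE downsamples by a factor of 8, so the
# UNet/VAE latent resolution is the target image resolution // 8.
VAE_SCALE = 8

def latent_size(height, width):
    # Pixel dims must be divisible by 8 to map cleanly to latents.
    assert height % VAE_SCALE == 0 and width % VAE_SCALE == 0
    return height // VAE_SCALE, width // VAE_SCALE

print(latent_size(1024, 1024))  # -> (128, 128)
print(latent_size(512, 768))    # -> (64, 96)
```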

terrychenism avatar Oct 04 '22 17:10 terrychenism

I know; I tried it but received a memory error. I will try it again and let you know.

batrlatom avatar Oct 04 '22 17:10 batrlatom

If it is out of memory, we may consider setting the UNet batch size to 1 and running it twice in each step; this will save a lot of memory.
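The idea above can be sketched as follows. The classifier-free-guidance step normally runs the UNet once on a batch of 2 latents (unconditional + text-conditioned); running it twice with batch 1 halves peak activation memory. Here `unet` is a hypothetical stand-in for the compiled AIT UNet call, not the actual pipeline API:

```python
def unet_split_batch(unet, latents):
    """Run one batch-1 UNet call per latent instead of a single
    batched call. `unet` is a placeholder for the compiled AIT
    UNet: it maps one latent to one noise prediction. Only one
    sample's activations are live at a time, so peak memory is
    roughly halved at the cost of two kernel launches per step.
    """
    return [unet(latent) for latent in latents]

# Toy stand-in "model" that doubles every value, just to show the flow.
preds = unet_split_batch(lambda x: [2 * v for v in x], [[1, 2], [3, 4]])
print(preds)  # -> [[2, 4], [6, 8]]
```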

antinucleon avatar Oct 07 '22 02:10 antinucleon

Hi, it looks like there really is an OOM problem at higher resolutions, even on an RTX 3090.

batrlatom avatar Oct 07 '22 06:10 batrlatom

Are there any constraints on image size? I am trying to get arbitrary resolutions by compiling the UNet and VAE with resolution = image_size // 8, but it throws errors other than OOM.

batrlatom avatar Oct 07 '22 07:10 batrlatom

We tested 512x768 with the img2img model and it works well. For high-res input like 1024x1024, we suggest building the UNet with batch size 1 and updating the pipeline accordingly.

terrychenism avatar Oct 07 '22 08:10 terrychenism

@terrychenism Hi, I tried 1024x1024 and compiled the VAE and UNet models with 128x128 input, but encountered a CUDA memory error during inference (I guess OOM; 768x768 is OK).

Env:
GPU: A100, 40GB
NVIDIA-SMI 450.80.02, Driver Version: 450.80.02, CUDA Version: 11.0
Cuda compilation tools, release 11.3, V11.3.109

Error:

```
pipeline_stable_diffusion_ait.py, line 127, in unet_inference
    noise_pred = ys[0].permute((0, 3, 1, 2)).float()
RuntimeError: CUDA error: an illegal memory access was encountered
```
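For context on the line in the traceback: AITemplate kernels work in channels-last (NHWC) layout, while the PyTorch pipeline expects channels-first (NCHW), which is what the `permute((0, 3, 1, 2))` converts. A minimal numpy sketch of the same layout conversion (shapes chosen to match the 1024x1024 case, i.e. 128x128 latents with 4 channels):

```python
import numpy as np

# AIT output: channels-last (batch, H, W, C) latent tensor.
nhwc = np.zeros((1, 128, 128, 4))

# Equivalent of permute((0, 3, 1, 2)): move channels to axis 1.
nchw = nhwc.transpose(0, 3, 1, 2)
print(nchw.shape)  # -> (1, 4, 128, 128)
```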

zhuoyuan avatar Oct 21 '22 03:10 zhuoyuan

For the UNet you can split the batch of 2, then run inference twice with batch 1; that should fix the OOM issue. BTW, xformers-based attention will come soon.

terrychenism avatar Oct 21 '22 04:10 terrychenism

Done: https://github.com/facebookincubator/AITemplate/pull/74

antinucleon avatar Nov 10 '22 00:11 antinucleon