
Can we get the stable diffusion example to work with xformers?

Open batrlatom opened this issue 1 year ago • 10 comments

Hi, I was wondering ... is there any easy way to use xformers with AITemplate in stable diffusion? Since it lowers memory consumption, we could run inference at resolutions north of 1024x1024. Can we pair it with AITemplate too? Reference: https://github.com/huggingface/diffusers/pull/532

batrlatom avatar Oct 04 '22 08:10 batrlatom

For 512x512, AIT is running at 42 it/s vs 27 it/s in this PR, so I don't think we need to support xformers. Looping in @terrychenism for 1024x1024 generation.

antinucleon avatar Oct 04 '22 15:10 antinucleon

This was not about speed, but about memory consumption. But I understand that this is not a priority.

batrlatom avatar Oct 04 '22 15:10 batrlatom

1024x1024 is easy to get: you would need to compile the VAE model with 128x128 input: https://github.com/facebookincubator/AITemplate/blob/main/examples/05_stable_diffusion/compile.py#L180-L181 For memory, we don't support xformers yet, but AIT should be very efficient compared to PyTorch. Please try 1024x1024 and let us know if you run into any issue.
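For reference, the 128x128 figure comes from Stable Diffusion's VAE, which downsamples by a factor of 8, so the latent resolution the compiled UNet/VAE sees is the pixel resolution divided by 8. A quick sketch of the arithmetic (the helper name is illustrative, not part of the AIT API):

```python
# Stable Diffusion's VAE downsamples by a factor of 8, so the
# UNet/VAE latent resolution is the target image resolution // 8.
VAE_SCALE = 8

def latent_size(height, width):
    # Pixel dims must be divisible by 8 to map cleanly to latents.
    assert height % VAE_SCALE == 0 and width % VAE_SCALE == 0
    return height // VAE_SCALE, width // VAE_SCALE

print(latent_size(1024, 1024))  # -> (128, 128)
print(latent_size(512, 768))    # -> (64, 96)
```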

terrychenism avatar Oct 04 '22 17:10 terrychenism

I know; I tried it but received a memory error. I will try it again and let you know.

batrlatom avatar Oct 04 '22 17:10 batrlatom

If it is out of memory, we may consider setting the UNet batch size to 1 and running it twice in each step; this will save a lot of memory.
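The idea above can be sketched as follows. The classifier-free-guidance step normally runs the UNet once on a batch of 2 latents (unconditional + text-conditioned); running it twice with batch 1 halves peak activation memory. Here `unet` is a hypothetical stand-in for the compiled AIT UNet call, not the actual pipeline API:

```python
def unet_split_batch(unet, latents):
    """Run one batch-1 UNet call per latent instead of a single
    batched call. `unet` is a placeholder for the compiled AIT
    UNet: it maps one latent to one noise prediction. Only one
    sample's activations are live at a time, so peak memory is
    roughly halved at the cost of two kernel launches per step.
    """
    return [unet(latent) for latent in latents]

# Toy stand-in "model" that doubles every value, just to show the flow.
preds = unet_split_batch(lambda x: [2 * v for v in x], [[1, 2], [3, 4]])
print(preds)  # -> [[2, 4], [6, 8]]
```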

antinucleon avatar Oct 07 '22 02:10 antinucleon

Hi, it looks like there really is an OOM problem at higher resolutions, even on an RTX 3090.

batrlatom avatar Oct 07 '22 06:10 batrlatom

Are there any constraints on image size? I am trying to get arbitrary resolutions by compiling the UNet and VAE with resolution = image_size // 8, but it throws errors other than OOM.

batrlatom avatar Oct 07 '22 07:10 batrlatom

We tested 512x768 with the img2img model and it works well. For high-res input like 1024x1024, we suggest building the UNet with batch size 1 and updating the pipeline accordingly.

terrychenism avatar Oct 07 '22 08:10 terrychenism

@terrychenism Hi, I tried 1024x1024 and compiled the VAE and UNet models with 128x128 input, but encountered a CUDA memory error during inference (I guess OOM; 768x768 is OK).

Env:
GPU: A100, 40GB
NVIDIA-SMI 450.80.02, Driver Version: 450.80.02, CUDA Version: 11.0
Cuda compilation tools, release 11.3, V11.3.109

Error:

```
pipeline_stable_diffusion_ait.py, line 127, in unet_inference
    noise_pred = ys[0].permute((0, 3, 1, 2)).float()
RuntimeError: CUDA error: an illegal memory access was encountered
```
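For context on the line in the traceback: AITemplate kernels work in channels-last (NHWC) layout, while the PyTorch pipeline expects channels-first (NCHW), which is what the `permute((0, 3, 1, 2))` converts. A minimal numpy sketch of the same layout conversion (shapes chosen to match the 1024x1024 case, i.e. 128x128 latents with 4 channels):

```python
import numpy as np

# AIT output: channels-last (batch, H, W, C) latent tensor.
nhwc = np.zeros((1, 128, 128, 4))

# Equivalent of permute((0, 3, 1, 2)): move channels to axis 1.
nchw = nhwc.transpose(0, 3, 1, 2)
print(nchw.shape)  # -> (1, 4, 128, 128)
```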

zhuoyuan avatar Oct 21 '22 03:10 zhuoyuan

For the UNet you can split the batch of 2, then run inference twice with batch 1; that should fix the OOM issue. BTW, xformers-based attention will come soon.

terrychenism avatar Oct 21 '22 04:10 terrychenism

Done: https://github.com/facebookincubator/AITemplate/pull/74

antinucleon avatar Nov 10 '22 00:11 antinucleon