AITemplate Stable Diffusion SD XL support

Hello!

It would be super cool to accelerate the Stable Diffusion XL models, as they are pretty slow at their 1024x1024 resolution.

I think it could work pretty easily, the only problem seems to be the second text encoder.

What do you guys think? Has anyone successfully compiled the XL models?

Jul 27 '23 11:07 CyberTimon

UNet2DConditionModel and the UNet blocks need updating for the architecture changes and for the extra conditionings.
CLIP needs to output the correct hidden state layer; bigG also needs text projection and pooled output.
The official VAE overflows in fp16: either use the community fp16-fixed version or modify the VAE to run in fp32.
Pipeline changes are needed (extra conditioning, concat CLIP, denoise for refiner, etc.).
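
To illustrate the "correct hidden state layer" and "concat CLIP" items above, here is a minimal shape-only sketch. It assumes the Diffusers SDXL convention, where the second-to-last hidden state of each text encoder is taken and the two are concatenated along the feature dimension (768 + 1280 = 2048). Plain Python lists stand in for tensors, the layer count is a toy value, and `fake_encoder` and `concat_penultimate` are hypothetical names, not AITemplate or Diffusers APIs.

```python
# Shape-only sketch: nested lists stand in for tensors.
# fake_encoder and concat_penultimate are hypothetical helpers for illustration.

def fake_encoder(num_layers, seq_len, hidden_dim, fill):
    """Return per-layer hidden states, each shaped [seq_len][hidden_dim]."""
    return [
        [[fill + layer] * hidden_dim for _ in range(seq_len)]
        for layer in range(num_layers)
    ]


def concat_penultimate(hidden_states_1, hidden_states_2):
    """Take the second-to-last layer of each encoder and join them per token."""
    h1 = hidden_states_1[-2]
    h2 = hidden_states_2[-2]
    if len(h1) != len(h2):
        raise ValueError("both encoders must produce the same sequence length")
    # Concatenate along the feature dimension, token by token.
    return [tok1 + tok2 for tok1, tok2 in zip(h1, h2)]


SEQ_LEN = 77  # CLIP context length
# Toy layer count (4); the hidden sizes are assumed to match the two SDXL encoders.
enc1 = fake_encoder(num_layers=4, seq_len=SEQ_LEN, hidden_dim=768, fill=0.0)
enc2 = fake_encoder(num_layers=4, seq_len=SEQ_LEN, hidden_dim=1280, fill=100.0)

prompt_embeds = concat_penultimate(enc1, enc2)
print(len(prompt_embeds), len(prompt_embeds[0]))
```

In the real pipeline the pooled output from the second encoder is passed separately as part of the extra conditioning; that part is left out of this sketch.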

Jul 28 '23 19:07 hlky

@hlky is there any plan to support sdxl? If yes, when will it be finished?

Jul 31 '23 03:07 JJHu1993

@JJHu1993 I have sdxl support in my private projects. When I have time I will bring those changes here.

Jul 31 '23 07:07 hlky

@hlky I'm looking forward to your sdxl changes with great excitement. I believe that sdxl support will significantly increase the influence of the AITemplate project.

Jul 31 '23 12:07 JJHu1993

@JJHu1993 I have sdxl support in my private projects. When I have time I will bring those changes here.

Yes it would be great to see these changes. I can't wait to have a faster SDXL. Please share it!

Thank you very much!

Aug 01 '23 08:08 CyberTimon

@JJHu1993 I have sdxl support in my private projects. When I have time I will bring those changes here.

Thanks a lot, this will be great. I have a question: is the AMD RX Vega 56 supported by AITemplate?

Aug 02 '23 03:08 BahzBeih

Any news? @hlky

Aug 06 '23 19:08 CyberTimon

@CyberTimon I'll do it today.

CLIP changes are done but need testing for numerical accuracy. There is also a workaround for the limitations of dynamic_slice with regard to the pooled output.
For UNet, the crop conditioning is done outside of the UNet. This isn't 100% faithful to the Diffusers implementation, but it is slightly faster, by ~0.1it/s in my testing (2.1it/s vs 2.2it/s, 3060 12GB, 1024x1024). The crop conditioning is the same every iteration, so there's no need to do it inside the UNet.
VAE is compatible if you use the fp16-fixed version. I'll include changes that pass dtype everywhere so it can be run in fp32 if desired, and an option that overrides the pipeline's VAE with the fixed version.
Bringing over the XL pipeline and modifying it to work with AIT is the main thing at this point. PR will be limited to XL text-to-image pipeline for simplicity.
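
The point about crop conditioning being the same every iteration can be sketched as a loop-invariant hoist. This is a minimal illustration, assuming the crop conditioning refers to SDXL's size/crop values (original size, crop top-left, target size), which Diffusers joins into one list. `fake_unet_step` and the `calls` counter are hypothetical stand-ins, not AITemplate code.

```python
# Sketch: build the size/crop conditioning once, then reuse it in every step.
# All names here are hypothetical stand-ins for illustration.

calls = {"conditioning": 0, "unet": 0}


def make_add_time_ids(original_size, crop_top_left, target_size):
    """Join the three (h, w) / (top, left) tuples into one flat list of six values."""
    calls["conditioning"] += 1
    return list(original_size + crop_top_left + target_size)


def fake_unet_step(latents, step, add_time_ids):
    """Stand-in for a compiled UNet call; it only counts invocations."""
    calls["unet"] += 1
    return latents + 1.0


def denoise(num_steps, add_time_ids):
    latents = 0.0
    for step in range(num_steps):
        # The conditioning is passed in unchanged; nothing is recomputed here.
        latents = fake_unet_step(latents, step, add_time_ids)
    return latents


# Computed once, outside the denoising loop.
add_time_ids = make_add_time_ids((1024, 1024), (0, 0), (1024, 1024))
result = denoise(20, add_time_ids)
print(add_time_ids, calls)
```

Whether the embedding of these values also happens outside the compiled module is a detail of the private implementation and is not shown here.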

Aug 07 '23 08:08 hlky

Thank you very much! @hlky

Aug 07 '23 09:08 CyberTimon

Is there an update on this? The comments are deleted.

Nov 13 '23 05:11 aycaecemgul
