
Stable Diffusion SD XL support

Open CyberTimon opened this issue 2 years ago • 7 comments

Hello!

It would be super cool to accelerate the Stable Diffusion XL models, as they are pretty slow because of the 1024x1024 resolution.

I think it could work fairly easily; the only problem seems to be the second text encoder.

What do you think? Has anyone successfully compiled the XL models?

CyberTimon avatar Jul 27 '23 11:07 CyberTimon

  • UNet2DConditionModel and UNet blocks need updating for architecture changes and for extra conditionings
  • CLIP needs to output the correct hidden state layer; bigG also needs the text projection and pooled output
  • Official VAE overflows in fp16; either use the community fp16-fixed version or modify the VAE to run in fp32
  • Pipeline changes (extra conditioning, concat CLIP, denoise for refiner, etc.)
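
A rough NumPy sketch of two points from this list, for illustration only; it is not code from AITemplate or from any private branch. It assumes the usual SDXL text-encoder shapes (77 tokens, hidden size 768 for CLIP ViT-L and 1280 for bigG), which the thread does not state, and it assumes the pooled output is read at the end-of-text token and then projected. The token ids, random weights, and helper names are made up. The last part shows why an fp16 VAE can overflow: float16 cannot represent values above 65504.

```python
import numpy as np

# Assumed SDXL text-encoder shapes (not stated in the thread).
SEQ_LEN, DIM_L, DIM_G = 77, 768, 1280


def concat_text_embeddings(hidden_l, hidden_g):
    """Join per-token hidden states of both encoders on the feature axis."""
    return np.concatenate([hidden_l, hidden_g], axis=-1)


def pooled_from_eos(hidden, input_ids, text_projection):
    """Take the hidden state at the end-of-text token, then project it.

    Assumption: the end-of-text token has the highest id in the vocabulary,
    so argmax over the ids finds its (first) position.
    """
    eos_pos = input_ids.argmax(axis=-1)
    pooled = hidden[np.arange(hidden.shape[0]), eos_pos]
    return pooled @ text_projection


rng = np.random.default_rng(0)
hidden_l = rng.standard_normal((1, SEQ_LEN, DIM_L)).astype(np.float32)
hidden_g = rng.standard_normal((1, SEQ_LEN, DIM_G)).astype(np.float32)

# Per-token conditioning: 768 + 1280 = 2048 features per token.
prompt_embeds = concat_text_embeddings(hidden_l, hidden_g)

# Made-up prompt: start token, two word tokens, end-of-text, zero padding.
input_ids = np.zeros((1, SEQ_LEN), dtype=np.int64)
input_ids[0, :4] = [49406, 320, 2368, 49407]
text_projection = rng.standard_normal((DIM_G, DIM_G)).astype(np.float32)
pooled = pooled_from_eos(hidden_g, input_ids, text_projection)

# fp16 overflow: a value that is fine in fp32 becomes inf in fp16.
FP16_MAX = float(np.finfo(np.float16).max)  # 65504.0
activation = np.array([70000.0], dtype=np.float32)
with np.errstate(over="ignore"):
    as_fp16 = activation.astype(np.float16)
```

Here `prompt_embeds` has shape (1, 77, 2048) and `pooled` has shape (1, 1280), while `as_fp16` holds `inf`. Running the VAE in fp32, or using weights that keep activations in range, avoids that last problem.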

hlky avatar Jul 28 '23 19:07 hlky

@hlky is there any plan to support sdxl? If yes, when will it be finished?

JJHu1993 avatar Jul 31 '23 03:07 JJHu1993

@JJHu1993 I have sdxl support in my private projects. When I have time I will bring those changes here.

hlky avatar Jul 31 '23 07:07 hlky

@hlky Anticipating your sdxl changes with great excitement, I believe that the integration of sdxl support will significantly enhance the influence of the AITemplate project.

JJHu1993 avatar Jul 31 '23 12:07 JJHu1993

> @JJHu1993 I have sdxl support in my private projects. When I have time I will bring those changes here.

Yes it would be great to see these changes. I can't wait to have a faster SDXL. Please share it!

Thank you very much!

CyberTimon avatar Aug 01 '23 08:08 CyberTimon

> @JJHu1993 I have sdxl support in my private projects. When I have time I will bring those changes here.

Thanks a lot, this will be great. I have a question: is the AMD RX Vega 56 supported by AITemplate?

BahzBeih avatar Aug 02 '23 03:08 BahzBeih

Any news? @hlky

CyberTimon avatar Aug 06 '23 19:08 CyberTimon

@CyberTimon I'll do it today.

  • CLIP changes are done but need testing for numerical accuracy. There is also a workaround for the limitations of dynamic_slice with regard to pooled output.
  • For UNet, the crop conditioning is done outside of the UNet. This isn't 100% faithful to the Diffusers implementation, but it is slightly faster, by ~0.1it/s in my testing (2.1it/s vs 2.2it/s, 3060 12GB, 1024x1024). The crop conditioning is the same every iteration, so there's no need to do it inside the UNet.
  • VAE is compatible if you use the fp16-fixed version. I'll include changes that pass dtype everywhere so it can be run in fp32 if desired, and an option that overrides the pipeline's VAE with the fixed version.
  • Bringing over the XL pipeline and modifying it to work with AIT is the main thing at this point. The PR will be limited to the XL text-to-image pipeline for simplicity.
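
To illustrate the crop conditioning point, here is a minimal sketch of hoisting a loop-invariant conditioning out of the denoising loop. It is not the actual AITemplate change. It assumes the conditioning is built from six values (original size, crop top-left, target size), each embedded sinusoidally with width 256; the embedding formula, `run_steps`, and `toy_step` are hypothetical stand-ins rather than the Diffusers or AIT code.

```python
import math

import numpy as np

EMBED_DIM = 256  # assumed embedding width per conditioning value


def sinusoidal_embedding(values, dim=EMBED_DIM):
    """Generic sinusoidal embedding, one row of `dim` features per value."""
    half = dim // 2
    freqs = np.exp(-math.log(10000.0) * np.arange(half) / half)
    args = np.asarray(values, dtype=np.float32)[:, None] * freqs[None, :]
    return np.concatenate([np.cos(args), np.sin(args)], axis=-1)


def make_size_crop_embedding(original_size, crop_top_left, target_size):
    """Embed the six size/crop values and flatten them into one vector."""
    ids = list(original_size) + list(crop_top_left) + list(target_size)
    return sinusoidal_embedding(ids).reshape(1, -1)


def run_steps(num_steps, step_fn, latents, cond):
    """Denoising loop that reuses the same precomputed conditioning."""
    for step in range(num_steps):
        latents = step_fn(latents, step, cond)
    return latents


# Computed once, before the loop, because it never changes between steps.
cond = make_size_crop_embedding((1024, 1024), (0, 0), (1024, 1024))

calls = []


def toy_step(latents, step, cond):
    """Stand-in for a UNet call; records the conditioning it was given."""
    calls.append(cond)
    return latents * 0.5


latents = np.ones((1, 4, 128, 128), dtype=np.float32)
out = run_steps(4, toy_step, latents, cond)
```

Every step receives the same `cond` array of shape (1, 1536), so the embedding work is paid once per image instead of once per iteration.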

hlky avatar Aug 07 '23 08:08 hlky

Thank you very much! @hlky

CyberTimon avatar Aug 07 '23 09:08 CyberTimon

Is there an update on this? The comments are deleted.

aycaecemgul avatar Nov 13 '23 05:11 aycaecemgul