AITemplate
Stable Diffusion SD XL support
Hello!
It would be super cool to accelerate the stable diffusion xl models, as they are pretty slow because of the 1024x1024 res.
I think it could work pretty easily; the only problem seems to be the second text encoder.
What do you guys think? Has anyone successfully compiled the XL models?
- UNet2DConditionModel and UNet blocks need updating for architecture changes and for extra conditionings
- CLIP needs to output correct hidden state layer, also for bigG, text projection and pooled output
- Official VAE overflows in fp16, either use community fp16 fixed version or modify VAE to run in fp32
- Pipeline changes (extra conditioning, concat clip, denoise for refiner etc)
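To make the "extra conditioning" and "concat clip" items concrete, here is a rough NumPy sketch of the tensors an SDXL pipeline hands to the UNet. It assumes the usual SDXL base shapes (768-wide hidden states from the first text encoder, 1280-wide from bigG, six size/crop values); `build_sdxl_conditioning` is a hypothetical helper, not an AITemplate or Diffusers function, and random arrays stand in for real encoder outputs.

```python
import numpy as np

# Assumed SDXL base shapes, for illustration only.
BATCH, TOKENS = 1, 77
DIM_CLIP_L, DIM_BIG_G = 768, 1280


def build_sdxl_conditioning(hidden_l, hidden_g, pooled_g,
                            original_size, crop_top_left, target_size):
    """Hypothetical helper: assemble the UNet conditioning inputs."""
    # Per-token hidden states of both text encoders are joined on the feature axis.
    context = np.concatenate([hidden_l, hidden_g], axis=-1)
    # Six scalars: original (h, w), crop (top, left), target (h, w).
    ids = list(original_size) + list(crop_top_left) + list(target_size)
    time_ids = np.array([ids], dtype=np.float32)
    time_ids = np.repeat(time_ids, context.shape[0], axis=0)
    # The pooled bigG output is passed through unchanged as the text embedding.
    return context, pooled_g, time_ids


rng = np.random.default_rng(0)
hidden_l = rng.standard_normal((BATCH, TOKENS, DIM_CLIP_L)).astype(np.float32)
hidden_g = rng.standard_normal((BATCH, TOKENS, DIM_BIG_G)).astype(np.float32)
pooled_g = rng.standard_normal((BATCH, DIM_BIG_G)).astype(np.float32)

context, pooled, time_ids = build_sdxl_conditioning(
    hidden_l, hidden_g, pooled_g,
    original_size=(1024, 1024), crop_top_left=(0, 0), target_size=(1024, 1024),
)
print(context.shape, pooled.shape, time_ids.shape)
```

The point of the sketch is only the shapes: the cross-attention context widens to 768 + 1280 = 2048, and the pooled output plus the six size/crop values travel alongside it as additional conditioning.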
@hlky is there any plan to support sdxl? If yes, when will it be finished?
@JJHu1993 I have sdxl support in my private projects. When I have time I will bring those changes here.
@hlky Anticipating your sdxl changes with great excitement, I believe that the integration of sdxl support will significantly enhance the influence of the AITemplate project.
Yes it would be great to see these changes. I can't wait to have a faster SDXL. Please share it!
Thank you very much!
Thanks a lot, this will be great. I have a question: is the AMD RX Vega 56 supported by AITemplate?
Any news? @hlky
@CyberTimon I'll do it today.
- CLIP changes are done, but need testing for numerical accuracy. Also, there is a workaround for the limitations of dynamic_slice with regard to pooled output.
- For UNet the crop conditioning is done outside of UNet. This isn't 100% faithful to the Diffusers implementation, but it is slightly faster by ~0.1it/s in my testing (2.1it/s vs 2.2it/s, 3060 12GB, 1024x1024). The crop conditioning is the same every iteration, so there's no need to do it inside UNet.
- VAE is compatible if you use the fp16 fixed version. I'll include changes that pass dtype everywhere so it can be run in fp32 if desired, and an option that overrides the pipeline's VAE with the fixed version.
- Bringing over the XL pipeline and modifying it to work with AIT is the main thing at this point. PR will be limited to XL text-to-image pipeline for simplicity.
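The crop-conditioning point above can be sketched as follows: since the size/crop values do not change between denoising steps, their embedding can be built once before the loop and passed in. This is a NumPy illustration under assumptions, not the actual PR code: `sinusoidal_embedding`, `precompute_added_cond` and `fake_unet` are hypothetical names, the sinusoidal formula is a generic one that may differ in detail from Diffusers, and the widths (256 per value, 1280 pooled) are the usual SDXL base values.

```python
import numpy as np


def sinusoidal_embedding(values, dim=256, max_period=10000.0):
    """Generic sinusoidal embedding (assumed form, not copied from any library)."""
    half = dim // 2
    freqs = np.exp(-np.log(max_period) * np.arange(half, dtype=np.float32) / half)
    args = values[..., None].astype(np.float32) * freqs      # (..., half)
    return np.concatenate([np.sin(args), np.cos(args)], axis=-1)  # (..., dim)


def precompute_added_cond(time_ids, pooled):
    """Embed the six size/crop values once and append them to the pooled text output."""
    emb = sinusoidal_embedding(time_ids).reshape(time_ids.shape[0], -1)  # (B, 6 * 256)
    return np.concatenate([pooled, emb], axis=-1)                         # (B, 1280 + 1536)


def fake_unet(latents, t, added_cond):
    """Stand-in for the compiled UNet; it only needs to accept the precomputed tensor."""
    return latents * 0.99 + 0.0 * added_cond.mean()


time_ids = np.array([[1024, 1024, 0, 0, 1024, 1024]], dtype=np.float32)
pooled = np.zeros((1, 1280), dtype=np.float32)

# Computed once, outside the denoising loop.
added_cond = precompute_added_cond(time_ids, pooled)

latents = np.ones((1, 4, 128, 128), dtype=np.float32)
for t in range(30, 0, -1):
    latents = fake_unet(latents, t, added_cond)

print(added_cond.shape)
```

The trade-off is the one described above: the compiled module no longer matches the Diffusers UNet input signature exactly, in exchange for skipping the same small computation on every step.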
Thank you very much! @hlky
is there an update on this? the comments are deleted