Add INT8 Stable Diffusion through Optimum
8-bit quantization improves inference performance. This PR adds INT8 quantization for Stable Diffusion through the Optimum-Intel quantization API, built on top of Intel Neural Compressor. The sample code is implemented in Optimum-Intel.
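For readers unfamiliar with why INT8 helps, the sketch below shows the core arithmetic behind post-training quantization: floats are mapped to signed 8-bit codes plus a scale factor, so weights shrink 4x versus FP32 and integer kernels can be used at inference time. The function names and the symmetric per-tensor scheme here are illustrative only, not the Optimum-Intel or Neural Compressor API.

```python
def quantize_int8(values):
    """Map a list of floats to signed 8-bit codes plus a scale factor
    (symmetric per-tensor quantization, illustrative only)."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return [0] * len(values), 1.0
    scale = max_abs / 127.0  # 127 is the largest magnitude an int8 holds
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the int8 codes and scale."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.003, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
```

Each restored value differs from the original by at most one quantization step (the scale), which is why INT8 usually preserves accuracy well for inference while cutting memory and compute cost.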
@patrickvonplaten please review this one. Thanks.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
cc @echarlaix @michaelbenayoun
Discussed with @echarlaix offline; it seems the neural-compressor + optimum integration will refactor its API quite soon. Should we hold off on the promotion until then?
Hi @hshen14,
Let's wait for neural-compressor and optimum-intel refactorization before increasing visibility !
Thanks @anton-l @echarlaix. Sure, let's do that.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Optimum-Intel is currently being upgraded to the INC v2.0 API. We will revisit this PR after the upgrade is done.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
bump to keep issue open
@echarlaix , do you think it's good time to revisit this? Thanks.
Sure, I will work on it and open a PR on diffusers once everything is finalized, does that work for you @hshen14 ?
Great job!
That would work perfectly! Thanks @echarlaix
Is INT8 quantization still in the works? I would find it extremely helpful on some of the devices I'm trying to use, especially when running on CPU.
cc @yiyixuxu @sayakpaul @DN6 here
I think the better person to tag here would be @echarlaix.