
Add INT8 Stable Diffusion through Optimum

Open hshen14 opened this issue 3 years ago • 6 comments

8-bit quantization is useful for improving inference performance. This PR adds INT8 quantization for Stable Diffusion through the Optimum-Intel quantization API, built on top of Intel Neural Compressor. The sample code is implemented in Optimum-Intel.
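For readers unfamiliar with the mechanics, here is a minimal plain-Python sketch of the core idea behind post-training INT8 quantization: map float weights to signed 8-bit integers with a per-tensor scale, then dequantize at compute time. The function names are illustrative only and are not the Optimum-Intel or Neural Compressor API.

```python
# Illustrative sketch of symmetric per-tensor INT8 quantization.
# Not the Optimum-Intel API; function names are hypothetical.

def quantize_int8(values):
    """Quantize floats to signed 8-bit integers with one shared scale."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float values from the INT8 representation."""
    return [v * scale for v in q]

weights = [0.5, -1.2, 0.03, 0.99, -0.77]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Each INT8 weight takes 1 byte instead of 4 (fp32), at the cost of a
# rounding error bounded by half the quantization step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(weights, restored))
assert max_err <= scale / 2 + 1e-9
```

The real integration additionally handles per-channel scales, calibration data, and operator coverage, but the accuracy/size trade-off it manages is the one shown above.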

hshen14 avatar Nov 17 '22 08:11 hshen14

@patrickvonplaten please review this one. Thanks.

hshen14 avatar Nov 17 '22 08:11 hshen14

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

cc @echarlaix @michaelbenayoun

anton-l avatar Nov 17 '22 15:11 anton-l

Discussed this with @echarlaix offline: it seems the neural-compressor + optimum integration will refactor its API quite soon. Should we hold off on promoting this until then?

anton-l avatar Nov 17 '22 16:11 anton-l

Hi @hshen14,

Let's wait for the neural-compressor and optimum-intel refactoring before increasing visibility!

echarlaix avatar Nov 17 '22 17:11 echarlaix

> Hi @hshen14,
>
> Let's wait for the neural-compressor and optimum-intel refactoring before increasing visibility!

Thanks @anton-l @echarlaix. Sure, let's do that.

hshen14 avatar Nov 18 '22 01:11 hshen14

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Dec 24 '22 15:12 github-actions[bot]

Optimum-Intel is currently being upgraded to the INC v2.0 API. We will revisit this PR after the upgrade is done.

hshen14 avatar Dec 24 '22 23:12 hshen14

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Jan 19 '23 15:01 github-actions[bot]

bump to keep issue open

Thomas-MMJ avatar Jan 19 '23 19:01 Thomas-MMJ

@echarlaix, do you think it's a good time to revisit this? Thanks.

hshen14 avatar Jan 19 '23 23:01 hshen14

Sure, I will work on it and open a PR on diffusers once everything is finalized. Does that work for you, @hshen14?

echarlaix avatar Feb 07 '23 13:02 echarlaix

great job.

CrazyBoyM avatar May 26 '23 07:05 CrazyBoyM

> Sure, I will work on it and open a PR on diffusers once everything is finalized, does that work for you @hshen14 ?

That would work perfectly! Thanks @echarlaix

hshen14 avatar May 26 '23 08:05 hshen14

Is INT8 quantization still in the works? I would find this extremely helpful on some of the devices I'm trying to use, especially when running on CPU.
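A quick back-of-the-envelope calculation shows why INT8 matters on memory-constrained CPU devices. The 860M parameter count used below (roughly the Stable Diffusion v1 UNet) is an illustrative assumption, not an exact figure:

```python
# Rough weight-memory comparison at different precisions.
# The 860M parameter count is an illustrative assumption.

def weight_bytes(num_params, bits_per_param):
    """Total bytes needed to store the weights at a given precision."""
    return num_params * bits_per_param // 8

params = 860_000_000
fp32_bytes = weight_bytes(params, 32)  # ~3.44 GB
int8_bytes = weight_bytes(params, 8)   # ~0.86 GB

# INT8 weights are 4x smaller than fp32, which also shrinks the
# memory bandwidth needed per inference step on CPU.
assert fp32_bytes // int8_bytes == 4
```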

Ender436 avatar Feb 18 '24 20:02 Ender436

cc @yiyixuxu @sayakpaul @DN6 here

patrickvonplaten avatar Feb 19 '24 12:02 patrickvonplaten

I think a better person to tag here would be @echarlaix.

sayakpaul avatar Feb 19 '24 12:02 sayakpaul