Hi, when will ZeroQuant inference be released?

xk503775229 opened this issue.

Hi,

The ZeroQuant inference engine has not been released yet. The code example in DeepSpeed-Example is only meant to help verify ZeroQuant's accuracy.

The kernel/engine release is on our calendar, and we are actively working on making it compatible with various models. Please stay tuned.

We will also release LKD (layer-by-layer knowledge distillation) soon.

As for your last question: the code for training and accuracy testing is different from the final inference engine. Here, everything is simulated, which lets us do quantization-aware training and similar experiments.

Originally posted by @yaozhewei in https://github.com/microsoft/DeepSpeed/issues/2207#issuecomment-1212355792
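The "simulated" quantization mentioned above (often called fake quantization) can be sketched in a few lines: values are rounded onto a low-bit integer grid and immediately mapped back to floating point, so the model still runs on ordinary float kernels while seeing the rounding error that low-bit storage would introduce. This is a minimal illustration assuming a simple symmetric per-tensor scheme; the function name `fake_quantize` is hypothetical, and this is not DeepSpeed's or ZeroQuant's actual implementation.

```python
def fake_quantize(values, num_bits=8):
    """Hypothetical helper: symmetric per-tensor quantize-then-dequantize.

    The output stays in floating point, so it can flow through ordinary
    float kernels while carrying the error of a num_bits integer grid.
    """
    qmax = 2 ** (num_bits - 1) - 1  # e.g. 127 for 8 bits
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0:
        # All-zero (or empty) input: nothing to scale.
        return list(values)
    scale = max_abs / qmax  # one float step per integer level
    # Round to the nearest integer level and clamp to [-qmax, qmax].
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    # Dequantize: map the integers back to floats.
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
print(fake_quantize(weights, num_bits=2))
```

A real inference engine would instead keep the integer values and run integer kernels on them, which is why simulation code like this is useful for accuracy testing and quantization-aware training but does not by itself deliver a speedup.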

Hi, when will ZeroQuant inference (for GPT models) be released?

xk503775229 commented on Sep 15 '22

Any updates on this? Thanks.

david-macleod commented on Oct 20 '22

Reza wrapped up https://github.com/microsoft/DeepSpeed/pull/2217, which answers part of your questions, such as the model size reduction. Regarding the kernels, we are working on a plan to release them soon so that you can give them a try. Thanks!

yaozhewei commented on Nov 02 '22

Any updates on this? Thanks, @yaozhewei.

shhn1 commented on Apr 17 '23

The related PRs have been merged, so I'm closing this for now.

loadams commented on Aug 14 '23