Zhewei Yao

13 comments by Zhewei Yao

Hi, we recently refactored the MoQ part (DeepSpeed version >= 0.7.0). Please try the newest version and let us know if it works. Here is the new tutorial link: https://www.deepspeed.ai/tutorials/model-compression/

Hi Xuezhe, Please let me know if the version in our branch can solve your problem.

Hi, this will be released later as part of MII-Azure: https://github.com/microsoft/DeepSpeed-MII

Hi, the ZeroQuant inference engine is not released yet. The code example in DeepSpeedExamples is only meant to help verify ZeroQuant's accuracy. The kernel/engine release is on our...

@david-macleod The LKD example was just released (not merged yet): https://github.com/microsoft/DeepSpeedExamples/pull/214. For the kernel, please stay tuned.

Reza wrapped up https://github.com/microsoft/DeepSpeed/pull/2217, which answers some of your questions, such as the model size reduction. Regarding the kernels, we are working on a plan to release them...

Have there been any updates on this feature?

Thanks for the great proposal; we appreciate your contribution here :). We will discuss it internally and get back to you soon. Best,

Hi there, the proposal looks great to us. For the pruning/sparsification proposal, we wonder whether the callback's return value is needed, or whether we could just use something like `deepspeed.sparse_callback_step()`...
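Either API shape, a user-supplied callback or a single explicit step call, would ultimately run a pruning step like the one below. This is a minimal, dependency-free sketch of magnitude pruning; `magnitude_prune` and the plain-list weights are illustrative, not part of any DeepSpeed API.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.

    weights:  flat list of floats (a real implementation would
              operate on parameter tensors in place)
    sparsity: fraction in [0, 1] of entries to set to zero
    """
    flat = sorted(abs(w) for w in weights)
    k = int(len(flat) * sparsity)          # number of entries to drop
    if k == 0:
        return list(weights)               # nothing to prune
    threshold = flat[k - 1]                # k-th smallest magnitude
    return [0.0 if abs(w) <= threshold else w for w in weights]
```

With either API, the framework (or the callback) would invoke this after each optimizer step; the open question above is only where that invocation lives.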

@ftian1 We did not provide a calibration-based PTQ, but we provided a ZeroQuant (PTQ without calibration) example here: https://github.com/microsoft/DeepSpeedExamples/tree/master/model_compression/bert/bash_script/ZeroQuant. It is always good to have more examples, and we would appreciate it if...
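To illustrate the "PTQ without calibration" distinction: the quantization scale is derived from the weights themselves, so no calibration dataset is needed. The sketch below shows per-row symmetric quantization in this spirit; the function name and list-based rows are illustrative, not the released ZeroQuant implementation.

```python
def symmetric_quantize(row, num_bits=8):
    """Calibration-free symmetric quantization of one weight row.

    The scale comes from the row's own max magnitude, so no
    calibration data is required. Returns the integer codes and
    the dequantized (reconstructed) values.
    """
    qmax = 2 ** (num_bits - 1) - 1         # 127 for int8
    scale = max(abs(w) for w in row) / qmax
    if scale == 0.0:
        scale = 1.0                        # all-zero row: any scale works
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in row]
    deq = [v * scale for v in q]
    return q, deq
```

A calibration-based PTQ would instead run sample inputs through the model to pick scales (e.g. from activation ranges), which is exactly the extra machinery the calibration-free approach avoids.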