DeepSpeedExamples
The model size does not change
When I follow this tutorial https://www.deepspeed.ai/tutorials/model-compression/#2-tutorial-for-zeroquant-efficient-and-affordable-post-training-quantization and run zero_quant.sh (or quant_activation.sh and quant_weight.sh), the model size is still 418 MB, the same as bert-base.
Are the clean_model weights still saved as float32? Can you help me? Thanks.
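One way to confirm this is to inspect the dtypes and byte sizes of the tensors in the saved checkpoint. Below is a minimal sketch; it uses NumPy arrays with illustrative sizes as stand-ins for real checkpoint tensors (for a PyTorch state dict, `arr.nbytes` corresponds to `tensor.numel() * tensor.element_size()`):

```python
import numpy as np

def state_dict_report(state_dict):
    """Return total byte size and a per-dtype byte breakdown of a dict of arrays."""
    total = 0
    by_dtype = {}
    for name, arr in state_dict.items():
        total += arr.nbytes
        by_dtype[str(arr.dtype)] = by_dtype.get(str(arr.dtype), 0) + arr.nbytes
    return total, by_dtype

# Illustrative tensor sizes, not the real bert-base checkpoint.
sd_fp32 = {"w": np.zeros(1_000_000, dtype=np.float32)}
# Casting to fp16 halves the on-disk size; simulated (fake) quantization
# that keeps fp32 storage does not.
sd_fp16 = {k: v.astype(np.float16) for k, v in sd_fp32.items()}

print(state_dict_report(sd_fp32))  # total bytes + {'float32': ...}
print(state_dict_report(sd_fp16))  # half the bytes + {'float16': ...}
```

If the breakdown shows everything is still `float32`, the quantization run left the stored weights in full precision, which would explain the unchanged 418 MB file.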

It is not just model size: inference time does not improve either, nor does peak memory. I did not follow this exact tutorial, but I am seeing the same results after applying ZeroQuant to a BERT-uncased model fine-tuned on MRPC. Is there a tutorial that demonstrates actual improvements? The same can be said for XTC...
Same for me...
Same for me too.