DeepSpeed
[BUG] run_zero_quant.sh seems not working
Describe the bug
I am trying to run run_zero_quant.sh for the GPT-J model, but the output model is not compressed: the output file size is the same as the original model's, and after loading the model, GPU memory usage is also unchanged from the original.
To Reproduce
Steps to reproduce the behavior:
- Go to DeepSpeedExamples/model_compression/bert
- Run: pip install -r requirements.txt
- Run: bash bash_script/run_zero_quant.sh
If I don't change the script at all, it uses the original GPT-2 model; I also tried the GPT-J model. In neither case is the output model file size compressed.
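As a sanity check, here is a minimal sketch of how I compare on-disk checkpoint sizes before and after quantization (the paths in the comments are hypothetical placeholders, not the script's actual output locations):

```python
import os

def file_size_mib(path: str) -> float:
    """Return the on-disk size of a checkpoint file in MiB."""
    return os.path.getsize(path) / (1024 ** 2)

# Hypothetical paths -- substitute the run's actual output directories:
# original = file_size_mib("output/original/pytorch_model.bin")
# quantized = file_size_mib("output/quantized/pytorch_model.bin")
# print(f"original: {original:.1f} MiB, quantized: {quantized:.1f} MiB")
#
# For INT8 weight quantization of an FP16 model one would expect roughly
# a 2x reduction; identical sizes suggest the weights were not packed.
```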
Is this a bug? Could you please check whether this issue can be reproduced?