DeepSpeed
modify the quantize.py file for efficiency
- Update the min/max calculation from torch.split to torch.amin/torch.amax for faster computation (see the first sketch below)
- Update the stochastic rounding computation logic (faster and cleaner); see the second sketch below:
  a. support both symmetric and asymmetric stochastic rounding at the PyTorch level
  b. reduce new tensor creation from 2 to 1
  c. support CPU tensors as well
- Change fp16 to fp32 to avoid overflow issues (see the last sketch below)
- Simplify some other logic for easier understanding
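
A minimal sketch of the first change, assuming values are grouped contiguously for per-group quantization; `group_min_max` and its arguments are illustrative names, not the actual quantize.py API:

```python
import torch

def group_min_max(x: torch.Tensor, num_groups: int):
    # Illustrative helper, not the real quantize.py API: reshape into
    # (num_groups, group_size) and reduce each row in a single kernel call,
    # replacing torch.split plus per-chunk .min()/.max() calls.
    groups = x.reshape(num_groups, -1)
    return torch.amin(groups, dim=1), torch.amax(groups, dim=1)
```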
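A sketch of the stochastic rounding change under the same caveat: `stochastic_round` is a hypothetical helper, and "2 to 1 tensor creations" is read here as drawing a single uniform noise tensor via torch.rand_like, which works on CPU as well as CUDA tensors:

```python
import torch

def stochastic_round(x: torch.Tensor, scale: torch.Tensor,
                     zero_point: torch.Tensor = None) -> torch.Tensor:
    # Hypothetical helper: symmetric SR passes zero_point=None, asymmetric
    # SR shifts by a zero point; both are handled at the PyTorch level.
    q = x * scale if zero_point is None else x * scale + zero_point
    # Adding uniform noise in [0, 1) before flooring rounds each value up
    # with probability equal to its fractional part; torch.rand_like is the
    # single new tensor allocated and supports CPU inputs.
    return torch.floor(q + torch.rand_like(q))
```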
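One way the fp16 overflow can appear, as a hedged example: the dynamic range max - min can exceed the fp16 limit (~65504) even when every element is individually representable, which is why intermediates are computed in fp32:

```python
import torch

x = torch.tensor([-60000.0, 60000.0], dtype=torch.float16)
print(x.max() - x.min())    # tensor(inf, dtype=torch.float16): 120000 overflows fp16
xf = x.float()              # cast to fp32 before computing the range
print(xf.max() - xf.min())  # tensor(120000.)
```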
Can one of the admins verify this patch?
Stale PR. quantize.py is quite different now and these changes are no longer relevant, so closing the PR.