MS-AMP
Remove model_state.use_fp8_ddp and optimizer.all_reduce_grads
Description
The argument model_state.use_fp8_ddp is deprecated.
In the MS-AMP examples, model_state.use_fp8_ddp is always set to True. In addition, the function optimizer.all_reduce_grads is not used.
Major Revision
- Remove model_state.use_fp8_ddp
- Remove optimizer.all_reduce_grads
- Remove the related unittests
- Update the unittest test_fp8linear_backward, since the type of the weight gradient is torch.Tensor when model_state.use_fp8_ddp is True.
MS-AMP-Examples still calls optimizer.all_reduce_grads, so those calls must also be removed from the examples.
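To help with that cleanup, a small standard-library scan can list every remaining reference to the two removed names in an example script. This is a hypothetical helper, not part of MS-AMP or MS-AMP-Examples: the function name find_deprecated_usages and the sample snippet are illustrative assumptions, and the matching is purely textual, so an optimizer bound to a differently named variable would be missed.

```python
import re

# Names removed by this PR (assumed exact spellings, taken from the PR text).
DEPRECATED = ("model_state.use_fp8_ddp", "optimizer.all_reduce_grads")


def find_deprecated_usages(source: str) -> list[tuple[int, str]]:
    """Return (1-based line number, name) pairs for each deprecated reference.

    Hypothetical migration helper; matching is textual only.
    """
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Naively drop trailing comments; a '#' inside a string literal
        # would also cut the line here.
        code = line.split("#", 1)[0]
        for name in DEPRECATED:
            # Word boundaries keep e.g. "optimizer.all_reduce_grads_v2"
            # or "my_optimizer.all_reduce_grads" from matching.
            if re.search(r"\b" + re.escape(name) + r"\b", code):
                hits.append((lineno, name))
    return hits


if __name__ == "__main__":
    # Hypothetical training-script fragment, not taken from MS-AMP-Examples.
    example = """\
model = build_model()
model_state.use_fp8_ddp = True
loss.backward()
optimizer.all_reduce_grads(model)
optimizer.step()
"""
    print(find_deprecated_usages(example))
    # [(2, 'model_state.use_fp8_ddp'), (4, 'optimizer.all_reduce_grads')]
```

Each reported line is a candidate for deletion: the flag assignment can simply go, and the manual all-reduce call can be dropped before optimizer.step().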