MS-AMP
Remove model_state.use_fp8_ddp and optimizer.all_reduce_grads
Description
The argument model_state.use_fp8_ddp is deprecated.
In the MS-AMP examples, model_state.use_fp8_ddp is always set to True. In addition, the function optimizer.all_reduce_grads is not used.
Major Revision
- Remove model_state.use_fp8_ddp
- Remove optimizer.all_reduce_grads
- Remove the related unit tests
- Update the unit test test_fp8linear_backward, since the type of the weight gradient is torch.Tensor when model_state.use_fp8_ddp is True.
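
The kind of type check the updated test relies on can be sketched as follows. This is only an illustration under assumptions: it uses a plain torch.nn.Linear as a stand-in for MS-AMP's FP8 linear layer, and the helper name check_weight_grad_type is hypothetical and not taken from the MS-AMP test suite.

```python
import torch


def check_weight_grad_type(in_features: int = 4, out_features: int = 3) -> torch.Tensor:
    """Run one backward pass and return the weight gradient after a type check."""
    # Stand-in layer (assumption): plain torch.nn.Linear, not MS-AMP's FP8 linear.
    linear = torch.nn.Linear(in_features, out_features)
    x = torch.randn(2, in_features)

    # Reduce the output to a scalar so backward() needs no explicit gradient argument.
    linear(x).sum().backward()

    grad = linear.weight.grad
    # The description above states that the weight gradient is a torch.Tensor.
    assert isinstance(grad, torch.Tensor)
    # The gradient has the same shape as the weight: (out_features, in_features).
    assert grad.shape == linear.weight.shape
    return grad


grad = check_weight_grad_type()
```

The actual test_fp8linear_backward may compare values as well as types; this sketch covers only the type and shape of the weight gradient.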
MS-AMP-Examples still calls optimizer.all_reduce_grads, so those calls also need to be removed from the examples.