DeepSpeed
stage_1_and_2.py: apply gradient scaling only for fp16
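For bf16, gradient scaling is not needed: bf16 has the same exponent range as fp32, so small gradients do not underflow the way they can in fp16.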
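
The idea can be illustrated with a minimal sketch. This is not the code in stage_1_and_2.py; the names `Precision`, `to_fp16`, `scale_loss`, and `unscale_grads` are hypothetical and exist only for this example. It uses the standard library's half-precision `struct` format to show why fp16 needs a loss scale, and gates the scaling on the dtype.

```python
import struct
from enum import Enum


class Precision(Enum):
    # Hypothetical enum for this sketch, not a DeepSpeed type.
    FP16 = "fp16"
    BF16 = "bf16"
    FP32 = "fp32"


def to_fp16(x: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def scale_loss(loss: float, precision: Precision, loss_scale: float) -> float:
    """Multiply the loss by loss_scale for fp16 only; pass it through otherwise."""
    if precision is Precision.FP16:
        return loss * loss_scale
    return loss


def unscale_grads(grads, precision: Precision, loss_scale: float):
    """Divide gradients by loss_scale for fp16 only; copy them unchanged otherwise."""
    if precision is Precision.FP16:
        return [g / loss_scale for g in grads]
    return list(grads)


if __name__ == "__main__":
    tiny_grad = 1e-8
    # Without scaling, this value is flushed to zero in half precision.
    print(to_fp16(tiny_grad))
    # With scaling, it survives the fp16 round trip.
    print(to_fp16(tiny_grad * 1024.0))
    # bf16 path: the loss and gradients are left untouched.
    print(scale_loss(0.5, Precision.BF16, 1024.0))
```

Gating both the scale and the unscale step on the same condition keeps the two in step, so bf16 and fp32 gradients are never divided by a scale that was never applied.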