
Load AMP checkpoint to FP32; save gradient scaler state

gaganbahga opened this pull request 3 years ago • 0 comments

Before submitting

  • [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
  • [x] Did you read the contributor guideline?
  • [ ] Did you make sure to update the docs?
  • [ ] Did you write any new necessary tests?

What does this PR do?

  • Enables loading a model checkpoint trained with AMP into FP32, and vice versa.
  • Saves the state of the AMP gradient scaler with the checkpoint.

The first change makes it possible to stop training in AMP if the loss explodes, continue training in FP32, and switch back to AMP if needed. The second change lets training resume with the same gradient-scale value rather than starting from amp-init-scale every time.
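
To illustrate the two ideas, here is a minimal PyTorch sketch (not the actual fairseq implementation): it persists the `GradScaler` state alongside the model so a resumed AMP run keeps its current loss scale, and it casts parameters to FP32 when an AMP-trained checkpoint is loaded for non-AMP training. The helper names `save_checkpoint`/`load_checkpoint` and the `"amp_scaler"` key are hypothetical.

```python
# Hypothetical sketch of the two changes; not fairseq's checkpoint code.
import torch
from torch.cuda.amp import GradScaler

def save_checkpoint(path, model, optimizer, scaler=None):
    state = {
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
    }
    if scaler is not None:
        # GradScaler.state_dict() captures the current loss scale and growth tracker.
        state["amp_scaler"] = scaler.state_dict()
    torch.save(state, path)

def load_checkpoint(path, model, optimizer, scaler=None, force_fp32=False):
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state["model"])
    if force_fp32:
        # Continue training in FP32 from an AMP-trained checkpoint.
        model.float()
    optimizer.load_state_dict(state["optimizer"])
    if scaler is not None and "amp_scaler" in state:
        # Resume with the previously reached loss scale instead of amp-init-scale.
        scaler.load_state_dict(state["amp_scaler"])
    return model, optimizer, scaler
```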

PR review

Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

gaganbahga Jun 28 '22 21:06