fix loss scaling only when compute_loss_func is used
What does this PR do?
In #34198, the line `loss *= self.args.gradient_accumulation_steps` was introduced to negate Accelerate's gradient accumulation division ("Negate accelerate grad accum div"). That change corrected errors encountered during gradient accumulation, but the scaling should only be applied when `compute_loss_func` is used, so this PR makes the scaling conditional on that.
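A minimal sketch of the intended behavior (not the actual `Trainer` code; the helper and its parameters are hypothetical and only illustrate the condition described above):

```python
def scale_loss_for_grad_accum(loss, gradient_accumulation_steps, compute_loss_func=None):
    """Hypothetical helper: undo Accelerate's gradient-accumulation division
    only when a custom compute_loss_func is in use."""
    if compute_loss_func is not None:
        # A custom loss function returns an already-averaged loss, so the
        # earlier division by gradient_accumulation_steps is negated here.
        loss = loss * gradient_accumulation_steps
    # Otherwise the loss is left untouched and Accelerate's own scaling applies.
    return loss
```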
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [ ] Did you write any new necessary tests?
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR. @muellerzr @ArthurZucker