XGLM - Fix Softmax NaNs when using FP16
What does this PR do?
Fixes #18049 following the exact same procedure used in #17437. Besides the added test, I also evaluated the fix on my personal use case and found the behavior of the fixed model to be consistent between single and batched generation.
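For context, here is a minimal sketch of the upcast-before-softmax pattern that #17437 introduced and this PR follows (the helper name and shapes are illustrative, not the actual modeling code from the diff):

```python
# Sketch, not the PR diff: when attention scores are fp16, compute the softmax
# in fp32 to avoid NaNs from overflow, then cast the result back to fp16.
import torch
import torch.nn.functional as F

def stable_attn_softmax(attn_weights: torch.Tensor) -> torch.Tensor:
    if attn_weights.dtype == torch.float16:
        return F.softmax(attn_weights, dim=-1, dtype=torch.float32).to(torch.float16)
    return F.softmax(attn_weights, dim=-1)

# Usage: output has the same shape and dtype as the input scores.
scores = torch.randn(1, 4, 4, dtype=torch.float16)
probs = stable_attn_softmax(scores)
```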
Before submitting
- [x] Did you read the contributor guideline, Pull Request section?
- [x] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- [x] Did you write any new necessary tests?
Who can review?
@patil-suraj @ydshieh @patrickvonplaten
The documentation is not available anymore as the PR was closed or merged.
@patil-suraj I think only your check is missing!
Sorry for being so late here @gsarti! Merged master into it to ping CircleCI here.
Hey @gsarti - it seems like a test is failing now:
tests/models/xglm/test_modeling_xglm.py::XGLMModelTest::test_xglm_model_past
with
UnboundLocalError: local variable 'dtype_attn_weights' referenced before assignment
I noticed this when running the code. My understanding is that setting dtype_attn_weights to torch.float32 as a default beforehand would fix the issue and maintain the expected behavior, could you double-check?
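A sketch of how that suggestion could look (names and the simplified 4-D attention-weight shape are assumptions for illustration, not the actual XGLM attention code):

```python
# Hypothetical sketch of the suggested fix: dtype_attn_weights was only assigned
# inside the fp16 branch, so the fp32 path hit UnboundLocalError. Defaulting it
# to torch.float32 before the conditional keeps it always defined.
import torch
import torch.nn.functional as F

def attn_softmax_with_head_mask(attn_weights, layer_head_mask=None):
    dtype_attn_weights = torch.float32  # default up front so it is always bound
    if attn_weights.dtype == torch.float16:
        dtype_attn_weights = attn_weights.dtype
        attn_weights = F.softmax(attn_weights, dim=-1, dtype=torch.float32)
    else:
        attn_weights = F.softmax(attn_weights, dim=-1)
    if layer_head_mask is not None:
        # (num_heads,) mask broadcast over (bsz, num_heads, tgt_len, src_len)
        attn_weights = layer_head_mask.view(1, -1, 1, 1) * attn_weights
    # Cast back to the original dtype; a no-op for fp32 inputs.
    return attn_weights.to(dtype_attn_weights)
```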
Hi @gsarti, sorry for being late on this PR. I re-opened it and gave a suggestion for a fix to the failing test. Would you like to update this PR after rebasing your working branch on an updated main branch?
Hi @gsarti, I made the necessary change to pass the tests and pushed to your branch directly. The remaining failing test is unrelated to this PR, but I will wait until tomorrow to check again, then I will merge.
cc @patrickvonplaten and @younesbelkada
Thanks a lot for the fix @ydshieh! For consistency I think we should apply the same change to OPT too; I will take care of that first thing tomorrow morning 💪