wip: Add TF implementation of LongT5
Fixes #18063
Add:
- [x] Local attention model
  - need to fix tests
- [x] Transient global attention model
- [x] Add tests for EncoderOnly model
- [ ] Add integration tests
- [ ] Fix PT-TF equivalence (Local) - the encoder seems fine, but some discrepancy occurs during the pass through the decoder (see the layer-by-layer comparison sketch after this list)
- [ ] Fix PT-TF equivalence (TGlobal)
- [ ] Run all slow tests
- [ ] Prepare TF checkpoints
Before submitting
- [x] Did you read the contributor guidelines, Pull Request section?
- [x] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- [x] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [ ] Did you write any new necessary tests?
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Hi @patrickvonplaten and @gante, FYI I've been gradually fixing some PT-TF discrepancies -- I should have some spare time again next weekend, so hopefully it will be ready for review then :]
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@stancld should I reopen the PR? :)
Hi @gante, yes, I'd open that again.
I apologize for being so slow here, but I've been pretty busy lately. I'll try to finish this.
No worries @stancld, take your time 🤗 And thank you for working on it!
Hi @gante, I managed to fix some bugs. There are still some minor discrepancies between PT and TF implementations. Would you mind having a first look if you spot any obvious differences, please? :]
Otherwise, TF-only tests seem to be passing 🐰
(Btw, CI is passing, but the tests are failing locally, so I'm not really sure what's going on :D )
@stancld Will have a look 👍 Can I have a copy of the error(s) you see locally? (I'm assuming on the slow tests)
Also cc @ArthurZucker here
Sorry for the late reply. The failures are on the PT-TF equivalence tests, which basically report that the difference between the outputs is too high.
Hey @stancld! Thanks for the addition! There are a few approaches we can take here. Sometimes the tolerance is a bit too strict: part of the hidden states don't match, but the final output does. In that case, we can relax the tolerance (maybe to around 4e-2). Otherwise, I will have a look!
Almost there I think :-)