wip: Add TF implementation of LongT5
Fixes #18063
Add:
- [x] Local attention model
  - need to fix tests
- [x] Transient global attention model
- [x] Add tests for EncoderOnly model
- [ ] Add integration tests
- [ ] Fix PT-TF equivalence (Local) - the encoder seems fine, but some discrepancy occurs during the pass through the decoder (see the layer-by-layer comparison sketch after this list)
- [ ] Fix PT-TF equivalence (TGlobal)
- [ ] Run all slow tests
- [ ] Prepare TF checkpoints
Before submitting
- [x] Did you read the contributor guidelines, Pull Request section?
- [x] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- [x] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [ ] Did you write any new necessary tests?
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Hi @patrickvonplaten and @gante, FYI I've been gradually fixing some PT-TF discrepancies -- I should have some spare time again next weekend, so hopefully it will be ready for review then :]
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@stancld should I reopen the PR? :)
Hi @gante, yes, I'd open that again.
I apologize for being so slow here, but I've been pretty busy lately. I'll try to finish this.
No worries @stancld, take your time 🤗 And thank you for working on it!
Hi @gante, I managed to fix some bugs. There are still some minor discrepancies between PT and TF implementations. Would you mind having a first look if you spot any obvious differences, please? :]
Otherwise, TF-only tests seem to be passing 🐰
(Btw, CI is passing, but the tests are failing locally, so I'm not really sure what's going on :D )
@stancld Will have a look 👍 Can I have a copy of the error(s) you see locally? (I'm assuming on the slow tests)
Also cc @ArthurZucker here
Sorry for the late reply. The failures are on the PT-TF equivalence tests, which basically report that the difference between the outputs is too high.
Hey @stancld! Thanks for the addition! There are a few approaches we can take here. Sometimes the tolerance is a bit too strict: part of the hidden states don't match, but the final output does. In that case, we can relax the tolerance (maybe to around 4e-2). Otherwise, I will have a look!
Almost there I think :-)