
wip: Add TF implementation of LongT5

stancld opened this pull request 3 years ago

Fixes #18063

Add:

  • [x] Local attention model
    • need to fix tests
  • [x] Transient global attention model
  • [x] Add tests for EncoderOnly model
  • [ ] Add integration tests
  • [ ] Fix PT-TF equivalence (Local) - encoder seems fine, but some discrepancy occurs within the pass through the decoder
  • [ ] Fix PT-TF equivalence (TGlobal)
  • [ ] Run all slow tests
  • [ ] Prepare TF checkpoints
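
The PT-TF equivalence items above boil down to comparing the two implementations' outputs element-wise. A minimal sketch of such a check, using NumPy stand-ins for the real model outputs (in practice these would come from the PyTorch and TensorFlow LongT5 models; the helper name here is my own, not from the PR):

```python
import numpy as np

def max_abs_diff(pt_out, tf_out):
    """Largest element-wise discrepancy between two output tensors."""
    return float(np.max(np.abs(np.asarray(pt_out) - np.asarray(tf_out))))

# Stand-ins for model outputs; the real tensors would come from e.g.
# pt_model(**inputs).last_hidden_state and tf_model(**inputs).last_hidden_state.
pt_out = np.array([[0.10, -0.20], [0.30, 0.40]])
tf_out = np.array([[0.10, -0.20], [0.30, 0.41]])

diff = max_abs_diff(pt_out, tf_out)
assert diff < 4e-2, f"PT-TF outputs diverge: max abs diff {diff}"
```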

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

stancld avatar Aug 30 '22 09:08 stancld

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

Hi @patrickvonplaten and @gante, FYI I've been gradually fixing some PT-TF discrepancies -- I should have some spare time again next weekend, so hopefully it should then be ready for review :]

stancld avatar Sep 18 '22 16:09 stancld

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Oct 13 '22 15:10 github-actions[bot]

@stancld should I reopen the PR? :)

gante avatar Oct 23 '22 10:10 gante

Hi @gante, yes, I'd open it again.

I apologize for being so slow here, but I've been pretty busy lately. I'll try to finish this.

stancld avatar Oct 26 '22 07:10 stancld

No worries @stancld, take your time 🤗 And thank you for working on it!

gante avatar Oct 26 '22 11:10 gante

Hi @gante, I managed to fix some bugs. There are still some minor discrepancies between the PT and TF implementations. Would you mind having a first look to see if you spot any obvious differences, please? :]

Otherwise, TF-only tests seem to be passing 🐰

(Btw, CI is passing, but the tests are failing locally, so I'm not really sure :D )
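
One common way to localize such discrepancies (a hedged sketch of the general technique, not the actual debugging code from this PR) is to compare the hidden states layer by layer and report the first layer where the two implementations diverge:

```python
import numpy as np

def first_divergent_layer(pt_states, tf_states, atol=1e-5):
    """Return the index of the first layer whose hidden states diverge,
    or None if all layers match within `atol`."""
    for i, (pt, tf) in enumerate(zip(pt_states, tf_states)):
        if not np.allclose(pt, tf, atol=atol):
            return i
    return None

# Toy per-layer hidden states; the real ones would come from running both
# models with output_hidden_states=True on identical inputs.
pt_states = [np.zeros((2, 4)), np.ones((2, 4))]
tf_states = [np.zeros((2, 4)), np.ones((2, 4)) + 0.1]

print(first_divergent_layer(pt_states, tf_states))  # 1
```

If the encoder layers all match but a decoder layer diverges, that narrows the bug to the decoder, consistent with the Local-attention observation in the checklist above.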

stancld avatar Oct 30 '22 12:10 stancld

@stancld Will have a look 👍 Can I have a copy of the error(s) you see locally? (I'm assuming on the slow tests)

gante avatar Oct 30 '22 16:10 gante

Also cc @ArthurZucker here

patrickvonplaten avatar Nov 02 '22 12:11 patrickvonplaten

> @stancld Will have a look 👍 Can I have a copy of the error(s) you see locally? (I'm assuming on the slow tests)

Sorry for the late reply. The PT-TF equivalence tests are failing for me, basically saying the difference between the outputs is too high.

stancld avatar Nov 13 '22 15:11 stancld

Hey @stancld! Thanks for the addition! There are a few approaches we can take here. Sometimes the tolerance is a bit too strict and part of the hidden states don't match even though the final output does; in that case, we can relax the tolerance (maybe to around 4e-2). Otherwise, I will have a look!
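
For reference, the tolerance relaxation described above amounts to something like the following (the values here are illustrative only, not taken from the actual test suite):

```python
import numpy as np

pt_hidden = np.array([0.500, 1.250, -0.750])
tf_hidden = np.array([0.512, 1.240, -0.760])  # small numerical drift

# A strict tolerance flags the drift above as a mismatch...
strict = np.allclose(pt_hidden, tf_hidden, atol=1e-5)
# ...while relaxing it to ~4e-2 accepts outputs that agree closely enough.
relaxed = np.allclose(pt_hidden, tf_hidden, atol=4e-2)
print(strict, relaxed)  # False True
```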

ArthurZucker avatar Nov 14 '22 10:11 ArthurZucker

Almost there I think :-)

patrickvonplaten avatar Dec 30 '22 15:12 patrickvonplaten

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Jan 31 '23 15:01 github-actions[bot]