transformers

Fix for the bug of slow tokenizers adding spaces to single id decodes

Open DuyguA opened this issue 1 year ago • 2 comments

What does this PR do?

Quick fix for a tokenizer bug: slow tokenizers insert spaces into the decoded output when the input is a single id.

Fixes #29489
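To make the symptom concrete, here is a minimal, self-contained sketch of one way this kind of bug can arise. It is a toy model, not the actual transformers code: `VOCAB`, `convert_ids_to_tokens`, `decode_buggy`, and `decode_fixed` are all hypothetical names for illustration. The assumption is that the id-to-token step returns a plain string for a single id, and the decode step then joins over it as if it were a list of tokens.

```python
# Toy vocabulary; hypothetical, not taken from any real tokenizer.
VOCAB = {0: "hello", 1: "world"}


def convert_ids_to_tokens(ids):
    """Return a str for a single int id, or a list of str for a list of ids."""
    if isinstance(ids, int):
        return VOCAB[ids]
    return [VOCAB[i] for i in ids]


def decode_buggy(ids):
    tokens = convert_ids_to_tokens(ids)
    # If `tokens` is a str, joining iterates over its characters,
    # so a space ends up between every character.
    return " ".join(tokens)


def decode_fixed(ids):
    tokens = convert_ids_to_tokens(ids)
    # Assumed fix for this sketch: wrap a lone token in a list before joining.
    if isinstance(tokens, str):
        tokens = [tokens]
    return " ".join(tokens)


print(decode_buggy(0))       # h e l l o
print(decode_fixed(0))       # hello
print(decode_fixed([0, 1]))  # hello world
```

In this sketch the list path is unaffected; only the single-id path changes, which matches the scope described for the fix.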

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline, Pull Request section?
  • [ ] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • [x] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [x] Did you write any new necessary tests?

Who can review?

@ArthurZucker

DuyguA • Aug 09 '24 11:08

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

cc @itazap as well!

LysandreJik • Aug 27 '24 11:08

Thanks for the quick update 🤗 Thanks for merging the tests! Left a few comments about the single special token case, let me know what you think!

No worries, I'll do the changes 😉

DuyguA • Aug 29 '24 13:08

@ArthurZucker and @LysandreJik, merge time please 😉

DuyguA • Sep 09 '24 11:09

Gentle ping @itazap, can we do the merge? Some commits from main were failing on this branch, but it looks like everything is fixed now. Can we merge before any more breaking changes come in? 😁 😁 😬

DuyguA • Sep 17 '24 08:09

@DuyguA Sorry for the delay! Merged!! 🚀 Thanks for working on this 🤗

itazap • Sep 18 '24 10:09

🎉 Congrats @DuyguA! This issue has really been a long journey.

Ki-Seki • Sep 18 '24 10:09