Fix for slow tokenizer bug: extra spaces added when decoding a single id
What does this PR do?
Quick fix for a bug in the slow tokenizers: they insert extra spaces when the input to decode is a single token id.
Fixes #29489
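The symptom can be sketched with a toy model of the slow decode path (the vocab, the word-start marker handling, and the function names below are illustrative assumptions, not the library's actual code):

```python
# Toy sentencepiece-style vocab: "\u2581" marks the start of a word.
VOCAB = {1: "\u2581hello", 2: "\u2581world", 3: "!"}


def buggy_convert_tokens_to_string(tokens):
    # Turns the word-start marker into a space for EVERY token,
    # so a single token decodes with a spurious leading space.
    return "".join(t.replace("\u2581", " ") for t in tokens)


def fixed_convert_tokens_to_string(tokens):
    # Strip the marker from the very first token instead of
    # converting it into a space; later markers still become spaces.
    if tokens and tokens[0].startswith("\u2581"):
        tokens = [tokens[0][1:]] + tokens[1:]
    return "".join(t.replace("\u2581", " ") for t in tokens)


def decode(ids, convert_tokens_to_string):
    # Minimal stand-in for tokenizer.decode on a list of ids.
    return convert_tokens_to_string([VOCAB[i] for i in ids])
```

Under this model, `decode([1], buggy_convert_tokens_to_string)` yields `" hello"` with a leading space, while the fixed path yields `"hello"`; multi-token inputs decode identically either way.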
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- [x] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [x] Did you write any new necessary tests?
Who can review?
@ArthurZucker
cc @itazap as well!
Thanks for the quick update 🤗 and for merging the tests! I left a few comments about the single special token case; let me know what you think!
No worries, I'll do the changes :wink:
@ArthurZucker and @LysandreJik merge time please :wink:
Gentle ping @itazap, can we do the merge? Some commits from main were failing this branch, but it looks like everything is fixed now. Can we merge before any more breaking changes come in? :grin: :grin: :grimacing:
@DuyguA Sorry for the delay! Merged !! 🚀 Thanks for working on this 🤗
🎉 Congrats @DuyguA! This issue has really been a long journey.