transformers
Add missing lang tokens in M2M100Tokenizer.get_vocab
What does this PR do?
The language tokens were missing from the vocabulary returned by M2M100Tokenizer.get_vocab. This PR updates get_vocab so that it includes them, matching the behavior of other multilingual tokenizers such as NllbTokenizer and MBart50Tokenizer.
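To illustrate the behavior being fixed, here is a minimal sketch using a toy class. It is not the actual M2M100Tokenizer code: the class name, its attributes (encoder, lang_token_to_id), and the helper get_vocab_without_lang_tokens are hypothetical, and assigning language token ids directly after the base vocabulary is an assumption made for the example.

```python
from typing import Dict, List


class ToyMultilingualTokenizer:
    """Hypothetical stand-in for a multilingual tokenizer (not the real M2M100Tokenizer)."""

    def __init__(self, base_vocab: Dict[str, int], lang_codes: List[str]) -> None:
        self.encoder = dict(base_vocab)
        offset = len(self.encoder)
        # Assumption: language tokens take the ids right after the base vocabulary.
        self.lang_token_to_id = {
            f"__{code}__": offset + i for i, code in enumerate(lang_codes)
        }

    def get_vocab_without_lang_tokens(self) -> Dict[str, int]:
        # The kind of result described as the problem: base vocabulary only.
        return dict(self.encoder)

    def get_vocab(self) -> Dict[str, int]:
        # The intended result: base vocabulary merged with the language tokens.
        vocab = dict(self.encoder)
        vocab.update(self.lang_token_to_id)
        return vocab


base = {"<s>": 0, "<pad>": 1, "</s>": 2, "<unk>": 3, "▁hello": 4}
tokenizer = ToyMultilingualTokenizer(base, ["en", "fr"])
vocab = tokenizer.get_vocab()
missing = [t for t in tokenizer.lang_token_to_id if t not in vocab]
print(missing)
```

With the merged vocabulary, every language token can be looked up through get_vocab, which is the property a regression test for this change would check.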
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you read the contributor guideline, Pull Request section?
- [ ] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [x] Did you write any new necessary tests?
Who can review?
@n1t0, @LysandreJik, @SaulLu
The documentation is not available anymore as the PR was closed or merged.
A friendly re-ping to @patil-suraj :hugs:
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Maybe of interest to @ArthurZucker :)
Re-ping of @ArthurZucker