
Add missing lang tokens in M2M100Tokenizer.get_vocab

Open · guillaumekln opened this pull request 3 years ago • 1 comment

What does this PR do?

The lang tokens were missing from M2M100Tokenizer.get_vocab. This PR updates the get_vocab method to include them, matching other multilingual tokenizers such as NllbTokenizer and MBart50Tokenizer.
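The fix follows the pattern used by the other multilingual tokenizers: get_vocab returns the base subword vocabulary merged with the separately stored language-code tokens. A minimal sketch of that pattern, using illustrative names (encoder, lang_token_to_id) rather than the actual M2M100Tokenizer attributes:

```python
class ToyMultilingualTokenizer:
    """Toy stand-in for a multilingual tokenizer; not the real M2M100Tokenizer."""

    def __init__(self):
        # Base subword vocabulary (stand-in for the SentencePiece model's tokens).
        self.encoder = {"<pad>": 0, "<s>": 1, "hello": 2, "world": 3}
        # Language-code tokens kept in a separate mapping, appended after
        # the base vocabulary ids.
        self.lang_token_to_id = {"__en__": 4, "__fr__": 5}

    def get_vocab(self):
        # Merge both mappings so the lang tokens are no longer missing
        # from the returned vocabulary.
        vocab = dict(self.encoder)
        vocab.update(self.lang_token_to_id)
        return vocab


tok = ToyMultilingualTokenizer()
vocab = tok.get_vocab()
```

Before the fix, a call like get_vocab() would return only the base entries, so lookups for language codes such as "__en__" failed even though the tokenizer could encode them.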

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline, Pull Request section?
  • [ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [x] Did you write any new necessary tests?

Who can review?

@n1t0, @LysandreJik, @SaulLu

guillaumekln avatar Aug 02 '22 07:08 guillaumekln

The documentation is not available anymore as the PR was closed or merged.

A friendly re-ping to @patil-suraj :hugs:

SaulLu avatar Sep 01 '22 15:09 SaulLu

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Sep 26 '22 15:09 github-actions[bot]

Maybe of interest to @ArthurZucker :)

LysandreJik avatar Sep 27 '22 20:09 LysandreJik

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Oct 22 '22 15:10 github-actions[bot]

Re-ping of @ArthurZucker

sgugger avatar Oct 24 '22 13:10 sgugger