
Add missing lang tokens in M2M100Tokenizer.get_vocab

Open · guillaumekln opened this pull request 3 years ago • 1 comment

What does this PR do?

The lang tokens were missing from M2M100Tokenizer.get_vocab. This PR updates the get_vocab method to include them, matching other multilingual tokenizers such as NllbTokenizer and MBart50Tokenizer.
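The fix follows the pattern used by the other multilingual tokenizers: get_vocab returns the base subword vocabulary merged with the separately stored language-code tokens. A minimal sketch of that pattern, using illustrative names (encoder, lang_token_to_id) rather than the actual M2M100Tokenizer attributes:

```python
class ToyMultilingualTokenizer:
    """Toy stand-in for a multilingual tokenizer; not the real M2M100Tokenizer."""

    def __init__(self):
        # Base subword vocabulary (stand-in for the SentencePiece model's tokens).
        self.encoder = {"<pad>": 0, "<s>": 1, "hello": 2, "world": 3}
        # Language-code tokens kept in a separate mapping, appended after
        # the base vocabulary ids.
        self.lang_token_to_id = {"__en__": 4, "__fr__": 5}

    def get_vocab(self):
        # Merge both mappings so the lang tokens are no longer missing
        # from the returned vocabulary.
        vocab = dict(self.encoder)
        vocab.update(self.lang_token_to_id)
        return vocab


tok = ToyMultilingualTokenizer()
vocab = tok.get_vocab()
```

Before the fix, a call like get_vocab() would return only the base entries, so lookups for language codes such as "__en__" failed even though the tokenizer could encode them.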

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline, Pull Request section?
  • [ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [x] Did you write any new necessary tests?

Who can review?

@n1t0, @LysandreJik, @SaulLu

guillaumekln avatar Aug 02 '22 07:08 guillaumekln

The documentation is not available anymore as the PR was closed or merged.

A friendly re-ping to @patil-suraj :hugs:

SaulLu avatar Sep 01 '22 15:09 SaulLu

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Sep 26 '22 15:09 github-actions[bot]

Maybe of interest to @ArthurZucker :)

LysandreJik avatar Sep 27 '22 20:09 LysandreJik

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Oct 22 '22 15:10 github-actions[bot]

Re-ping of @ArthurZucker

sgugger avatar Oct 24 '22 13:10 sgugger