
Fix convert_tekken_tokenizer

Open juliendenize opened this issue 3 weeks ago • 7 comments

What does this PR do?

Right now, convert_tekken_tokenizer does not add bos_token and eos_token to the special tokens via the add_special_tokens method.

This prevents chat templates that expect eos_token and bos_token from working properly.

Previously this worked because saving the tokenizer created a special_tokens_map.json, which is no longer the case. I don't know why, but I'd assume this is due to the V5 refactoring.

This PR fixes that by explicitly adding these tokens to the tokenizer. When the tokenizer is saved, they are now stored in tokenizer_config.json.
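One way to check the outcome described above is to inspect the saved tokenizer_config.json directly. The sketch below is only an illustration, not part of the PR: the helper name missing_special_tokens is hypothetical, it assumes the saved file exposes top-level bos_token and eos_token keys, and the "<s>" / "</s>" values are placeholders.

```python
import json
import tempfile
from pathlib import Path

# Keys a chat template typically relies on (assumed to be top-level entries
# in tokenizer_config.json).
REQUIRED_KEYS = ("bos_token", "eos_token")


def missing_special_tokens(save_dir):
    """Return the required keys that are absent (or null) in tokenizer_config.json."""
    config_path = Path(save_dir) / "tokenizer_config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return [key for key in REQUIRED_KEYS if config.get(key) is None]


def demo():
    # Simulate a saved tokenizer directory; the token values are placeholders.
    with tempfile.TemporaryDirectory() as tmp:
        config_path = Path(tmp) / "tokenizer_config.json"

        # Before the fix: the special tokens are not stored.
        config_path.write_text(json.dumps({"model_max_length": 1024}))
        print("before:", missing_special_tokens(tmp))

        # After the fix: both tokens are stored alongside the other settings.
        config_path.write_text(
            json.dumps({"bos_token": "<s>", "eos_token": "</s>"})
        )
        print("after:", missing_special_tokens(tmp))


if __name__ == "__main__":
    demo()
```

Pointing missing_special_tokens at a directory written by save_pretrained after running the converter should return an empty list if the tokens were stored as this PR intends.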

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [ ] Did you read the contributor guideline, Pull Request section?
  • [ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [ ] Did you write any new necessary tests?

Who can review?

@ArthurZucker

juliendenize avatar Dec 03 '25 11:12 juliendenize

cc @itazap

Rocketknight1 avatar Dec 05 '25 14:12 Rocketknight1

run-slow: ministral3, mistral3

itazap avatar Dec 08 '25 15:12 itazap

This comment contains run-slow, running the specified jobs:

models: ["models/ministral3", "models/mistral3"] quantizations: []

github-actions[bot] avatar Dec 08 '25 15:12 github-actions[bot]

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

github-actions[bot] avatar Dec 08 '25 15:12 github-actions[bot]

Hey! Thanks for the PR. Can you please share a short reproducer of the chat template problem you mentioned? We may need to add a test!

itazap avatar Dec 08 '25 16:12 itazap

[For maintainers] Suggested jobs to run (before merge)

run-slow: ministral3, mistral3

github-actions[bot] avatar Dec 09 '25 16:12 github-actions[bot]

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.