transformers

SPLIT PR: eos bos tokens

Open itazap opened this issue 1 year ago • 1 comment

Fix for 2 issues:

  1. The add_bos_token and add_eos_token flags are ignored for PreTrainedTokenizerFast: issue discussed here and here
  2. add_special_tokens does not update bos_token or eos_token, e.g. .add_special_tokens({'bos_token': '<new_bos>'})
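
One way to picture how both issues can arise is a post-processor that is built once and never refreshed. The snippet below is a toy stand-in only, not the transformers implementation: `ToyFastTokenizer` and its methods are hypothetical names, and "tokenization" is just whitespace splitting.

```python
# Toy stand-in for a fast tokenizer; NOT the transformers implementation.
class ToyFastTokenizer:
    def __init__(self, bos_token="<s>", eos_token="</s>",
                 add_bos_token=True, add_eos_token=False):
        self.bos_token = bos_token
        self.eos_token = eos_token
        self.add_bos_token = add_bos_token
        self.add_eos_token = add_eos_token
        # Assumption: the post-processor is frozen at construction time,
        # like a template loaded once from a serialized tokenizer file.
        self._post_processor = self._make_post_processor()

    def _make_post_processor(self):
        # prefix/suffix are captured now; later attribute changes are not seen.
        prefix = [self.bos_token] if self.add_bos_token else []
        suffix = [self.eos_token] if self.add_eos_token else []
        return lambda tokens: prefix + tokens + suffix

    def encode(self, text):
        return self._post_processor(text.split())

    def add_special_tokens(self, mapping):
        # Updates the attributes only; the frozen post-processor is untouched.
        for name, value in mapping.items():
            setattr(self, name, value)


tok = ToyFastTokenizer()
before = tok.encode("hello world")

tok.add_eos_token = True                            # issue 1: flag is ignored
tok.add_special_tokens({"bos_token": "<new_bos>"})  # issue 2: token not picked up
after = tok.encode("hello world")

print(before)
print(after)
```

In this toy, `after` is identical to `before` even though the flag and the token have both changed, which is the behaviour the two issues describe.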

TASKS:

  • [x] Added an update_post_processor function to PreTrainedTokenizerFast, based on the Llama tokenizer, so that the bos / eos token flags are read
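
A minimal sketch of what such an update step might compute, assuming it builds template strings of the kind accepted by `tokenizers.processors.TemplateProcessing` (`$A` and `$B` are the two sequences, `:0` and `:1` the type ids). `build_templates` is a hypothetical helper name, and the code in the PR itself may differ.

```python
def build_templates(bos, eos, add_bos_token, add_eos_token):
    """Return (single, pair, special_tokens) for a template post-processor.

    Hypothetical helper: illustrates reading the bos/eos flags, nothing more.
    """
    if add_bos_token and bos is None:
        raise ValueError("add_bos_token=True but bos_token is None")
    if add_eos_token and eos is None:
        raise ValueError("add_eos_token=True but eos_token is None")

    # Single sequence: optional BOS, the sequence itself, optional EOS.
    single = (
        (f"{bos}:0 " if add_bos_token else "")
        + "$A:0"
        + (f" {eos}:0" if add_eos_token else "")
    )
    # Pair: the single template, then the second sequence with type id 1.
    pair = (
        single
        + (f" {bos}:1" if add_bos_token else "")
        + " $B:1"
        + (f" {eos}:1" if add_eos_token else "")
    )
    # Names of the special tokens the template refers to.
    special_tokens = []
    if add_bos_token:
        special_tokens.append(bos)
    if add_eos_token:
        special_tokens.append(eos)
    return single, pair, special_tokens


single, pair, special = build_templates("<s>", "</s>", True, True)
print(single)  # <s>:0 $A:0 </s>:0
print(pair)    # <s>:0 $A:0 </s>:0 <s>:1 $B:1 </s>:1
```

Re-running a step like this whenever a flag or a special token changes is what would keep the post-processor in sync with the tokenizer's attributes.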

**SUPPORTS FAST ONLY**: slow tokenizers would require updating the kwargs passed into sp_model so that bos/eos tokens can be added accordingly.

Reviewer: @ArthurZucker

NOTE: the Hub token does not seem to have access to Llama 3; this should pass once that is addressed.

itazap, Jun 07 '24 14:06

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.