Ita Zaporozhets
Hey @Andrei-Aksionov , thanks for the reproducer! It has to do with Phi-3 being based on the LlamaTokenizerFast and Phi-2 on CodeGen. LlamaTokenizerFast strips leading whitespace in order to manually...
Awesome, thanks @ita9naiwa ! @ArthurZucker good to merge? 🚀
Thanks @ArthurZucker ! Something like `self._additional_special_tokens.extend(new_tokens)` in the `SpecialTokensMixin.add_tokens` function?
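To make the suggestion above concrete, here is a minimal sketch of what extending `_additional_special_tokens` from inside an `add_tokens`-style method could look like. `ToySpecialTokens` is a hypothetical stand-in written for illustration, not the real `SpecialTokensMixin`, and the `special_tokens` flag and de-duplication step are assumptions rather than the library's actual behavior.

```python
class ToySpecialTokens:
    """Hypothetical stand-in for illustration; NOT transformers.SpecialTokensMixin."""

    def __init__(self):
        self._additional_special_tokens = []

    def add_tokens(self, new_tokens, special_tokens=False):
        # Accept a single string or a list of strings.
        if isinstance(new_tokens, str):
            new_tokens = [new_tokens]
        if not special_tokens:
            # This toy only tracks special tokens; regular tokens are ignored.
            return 0
        # Assumption: skip tokens already registered, keeping first-seen order.
        fresh = [
            tok
            for tok in dict.fromkeys(new_tokens)
            if tok not in self._additional_special_tokens
        ]
        self._additional_special_tokens.extend(fresh)
        return len(fresh)


mixin = ToySpecialTokens()
first = mixin.add_tokens(["<a>", "<b>"], special_tokens=True)
second = mixin.add_tokens("<a>", special_tokens=True)  # already registered
plain = mixin.add_tokens("plain")  # not special, so not tracked here
```

Filtering before `extend` keeps the list free of duplicates when the same token is added twice; whether the real method needs that guard is an open question here.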
Hey @Rocketknight1 , I tested this out with the default `chat_template` and think maybe the update to `generation_indices` was missed here https://github.com/huggingface/transformers/pull/32684 in the function below: https://github.com/huggingface/transformers/blob/9240137897096698f5292c7dd38d0651c8a33dc8/src/transformers/utils/chat_template_utils.py#L334-L349 LMK what you...
Thank you @Cyrilvallez for your thorough feedback! 🤗 Applied most of the feedback, with some comments on the remaining points! Mainly: - using new outputs - merged mllama to update modular...
Applied the 2nd round of feedback! In particular, the mask creation is cleaned up!
Thanks @Cyrilvallez for (all) the detailed rounds of feedback!! 🤗 All slow tests passed; I believe the only failure is an unrelated test failing from main, but I will do another detailed pass.
Hi @itshuey, can you please share the model and token you are attempting this with (in a short snippet would be great!) so I can take a look? 😊
Hi @itshuey, indeed this isn't fully supported with `AutoTokenizer` because it reads the `tokenizer.model` file, which can't be modified manually. However, you should be able to remove/update if you...
Sorry, I apologize, I wasn't very clear. You are correct that `.vocab` cannot be modified programmatically, and using `del` or `pop` would not work. Updating or deleting tokens would be...
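A small sketch of why `del` or `pop` on `.vocab` would have no effect, assuming the property hands back a freshly built dict on every access instead of a live view of the underlying vocabulary. `ToyTokenizer` is a hypothetical pure-Python stand-in and not a transformers class; the copy-on-access behavior is the assumption being illustrated.

```python
class ToyTokenizer:
    """Hypothetical stand-in for illustration; NOT a transformers tokenizer."""

    def __init__(self, vocab):
        # The "real" vocabulary, owned by the tokenizer.
        self._backend_vocab = dict(vocab)

    def get_vocab(self):
        # Assumption: every call builds and returns a new dict (a copy).
        return dict(self._backend_vocab)

    @property
    def vocab(self):
        return self.get_vocab()


tok = ToyTokenizer({"<s>": 0, "hello": 1, "world": 2})

# Both operations mutate a temporary copy that is discarded right away.
del tok.vocab["hello"]
tok.vocab.pop("world")

still_present = "hello" in tok.vocab and "world" in tok.vocab
```

Under that assumption, the edits land on a throwaway copy, so the tokenizer's own vocabulary is unchanged.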