Ita Zaporozhets comments

Results: 37 comments of Ita Zaporozhets

AutoTokenizer: Phi-3 drops spaces when decodes a token at a time

Hey @Andrei-Aksionov , thanks for the reproducer! It has to do with Phi-3 being based on the LlamaTokenizerFast and Phi-2 on CodeGen. LlamaTokenizerFast strips leading whitespace in order to manually...
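To make the symptom in the issue title concrete, here is a minimal toy sketch of how stripping leading whitespace on every decode call can lose spaces when tokens are decoded one at a time. It is not the real `LlamaTokenizerFast` code: `toy_decode`, `SPACE_MARKER`, and the example pieces are hypothetical, and the only assumption carried over is the SentencePiece-style convention of marking word boundaries with "▁".

```python
# Toy illustration only: this is NOT the transformers implementation.
# Assumption: pieces use the SentencePiece-style "▁" marker for a preceding space.
SPACE_MARKER = "\u2581"  # the "▁" character


def toy_decode(pieces):
    """Join pieces, turn markers into spaces, then strip one leading space."""
    text = "".join(pieces).replace(SPACE_MARKER, " ")
    return text[1:] if text.startswith(" ") else text


pieces = [SPACE_MARKER + "Hello", SPACE_MARKER + "world", "!"]

# Decoding the whole sequence strips only the very first space.
full = toy_decode(pieces)

# Decoding one piece at a time strips the leading space of every piece,
# so the spaces between words disappear.
streamed = "".join(toy_decode([piece]) for piece in pieces)

print(repr(full))
print(repr(streamed))
```

In this toy model the per-token path loses the inter-word spaces because each call sees its piece as the start of a text.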

Fix all_special_tokens to have special tokens from added_tokens

Awesome, thanks @ita9naiwa ! @ArthurZucker good to merge? 🚀

Fix all_special_tokens to have special tokens from added_tokens

Thanks @ArthurZucker ! Something like

```python
self._additional_special_tokens.extend(new_tokens)
```

in the `SpecialTokensMixin.add_tokens` function?
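As a rough sketch of what that suggestion could look like, the toy class below registers tokens added as special so that they show up in `all_special_tokens`. `ToySpecialTokens` is a hypothetical stand-in and not the real `SpecialTokensMixin`; the `special_tokens` flag and the de-duplication step are assumptions made for illustration.

```python
class ToySpecialTokens:
    """Hypothetical stand-in for illustration; not the transformers class."""

    def __init__(self):
        self._additional_special_tokens = []

    @property
    def all_special_tokens(self):
        # Return a copy so callers cannot mutate the internal list.
        return list(self._additional_special_tokens)

    def add_tokens(self, new_tokens, special_tokens=False):
        """Record tokens; ones added as special also become special tokens."""
        if special_tokens:
            # Assumption: skip tokens that are already registered.
            fresh = [
                token
                for token in new_tokens
                if token not in self._additional_special_tokens
            ]
            self._additional_special_tokens.extend(fresh)
            return len(fresh)
        return 0


tok = ToySpecialTokens()
tok.add_tokens(["<tool>"], special_tokens=True)
tok.add_tokens(["<tool>", "<end>"], special_tokens=True)
tok.add_tokens(["plain"], special_tokens=False)
print(tok.all_special_tokens)
```

The point of the sketch is only that the extension happens inside the add path, so `all_special_tokens` does not need a separate update.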

apply_chat_template return_assistant_tokens_mask not work for Qwen2.5

Hey @Rocketknight1 , I tested this out with the default `chat_template` and think maybe the update to `generation_indices` was missed here https://github.com/huggingface/transformers/pull/32684 in the function below: https://github.com/huggingface/transformers/blob/9240137897096698f5292c7dd38d0651c8a33dc8/src/transformers/utils/chat_template_utils.py#L334-L349 LMK what you...

blt wip

Thank you @Cyrilvallez for your thorough feedback! 🤗 Applied most of the feedback, with some comments for the remaining points! Mainly:
- using new outputs
- merged mllama to update modular...

blt wip

Applied the 2nd round of feedback! In particular, the mask creation is cleaned up!

blt wip

Thanks @Cyrilvallez for (all) the detailed rounds of feedback!! 🤗 All slow tests passed. I believe there is just an unrelated test failing from main, but I will do another detailed pass.

How do I replace a spare tokens?

Hi @itshuey, can you please share the model and token you are attempting this with (in a short snippet would be great!) so I can take a look? 😊

How do I replace a spare tokens?

Hi @itshuey, indeed this isn't fully supported with `AutoTokenizer` because it reads the `tokenizer.model` file, which can't be modified manually. However, you should be able to remove/update if you...

How do I replace a spare tokens?

Sorry, I apologize that I wasn't very clear. You are correct that `.vocab` cannot be modified programmatically, and using `del` or `pop` would not work. Updating or deleting tokens would be...