Arthur

Results: 795 comments by Arthur

Let's close it 😉

I am pretty sure that `32106: AddedToken(" ", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),` is an issue:

```python
tokenizer.encode("hey .")
```

will produce this issue

If I do `AutoTokenizer.from_pretrained("path-to-model", added_tokens_decoder=None)` then this is no longer the case
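To illustrate why an added token that is a single space could change how `"hey ."` is encoded, here is a toy sketch. It is not the `transformers` or `tokenizers` implementation; it only assumes that added tokens are matched and cut out of the text before the underlying model tokenizes the remaining pieces. The helper `split_on_added_tokens` is hypothetical and exists only for this example.

```python
import re


def split_on_added_tokens(text, added_tokens):
    """Hypothetical helper: cut `text` at every occurrence of an added token.

    Assumption: added tokens are isolated first, and only the pieces
    in between would be handed to the underlying tokenization model.
    """
    if not added_tokens:
        return [text]
    # Try longer tokens first so they win over shorter ones they contain.
    ordered = sorted(added_tokens, key=len, reverse=True)
    pattern = "|".join(re.escape(tok) for tok in ordered)
    # The capturing group keeps the matched added tokens in the output.
    pieces = re.split(f"({pattern})", text)
    return [piece for piece in pieces if piece]  # drop empty strings


# With a bare space registered as an added token, the text is cut at the space.
with_space = split_on_added_tokens("hey .", [" "])

# With no added tokens, the text reaches the model in one piece.
without = split_on_added_tokens("hey .", [])

print(with_space)
print(without)
```

Under that assumption, the space never reaches the model as part of the surrounding text, which would be consistent with the behavior going away once `added_tokens_decoder=None` is passed.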

Re-opening as the merge on main will be reverted for a better fix soon

Hey! It does not seem to be requested that much unfortunately and would be a loooot of effort on our side. You do have unofficial C bindings out there I...

This was fixed in `transformers`; you need to set `legacy=False` 🤗

This issue is more a feature request than a `problem`. You are doing something wrong as the error indicates: pretty sure the special tokens are missing in the `tokenizer` while...

Hey! Pretty sure it is available in `peft`, see this [notebook](https://github.com/huggingface/peft/blob/main/examples/int8_training/peft_bnb_whisper_large_v2_training.ipynb) and this [discussion](https://github.com/openai/whisper/discussions/988)