Yoni
Forgive me if I'm wrong, but the problem occurs because the default gpt2 byte encoder doesn't cover all Unicode characters. This is from the gpt2 `byte_encoder`, which maps via `bytes_to_unicode`: `list(range(ord("!"),...`
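To make the failure mode concrete, here is a minimal reconstruction of `bytes_to_unicode` (based on the well-known GPT-2/`transformers` implementation; the upstream code may differ in details). It maps each of the 256 byte values to a printable Unicode character, so the resulting `byte_decoder` only knows those 256 characters, and any other character in a token's surface form would raise a `KeyError`:

```python
def bytes_to_unicode():
    # Printable bytes keep their own code point...
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # ...all remaining bytes (control chars, space, etc.) are shifted to 256+n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))

byte_encoder = bytes_to_unicode()                      # byte -> unicode char, 256 entries
byte_decoder = {v: k for k, v in byte_encoder.items()}  # the inverse lookup

print(len(byte_encoder))       # 256
print(byte_encoder[ord(" ")])  # 'Ġ' -- space is remapped to U+0120
print("中" in byte_decoder)    # False: CJK characters aren't in the table
```

So decoding works only for tokens whose characters came out of this 256-entry table; anything else (e.g. raw CJK characters in a non-gpt2 vocab) has no entry in `byte_decoder`.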
I think this should just give a warning instead. The original issue with Mistral can already be solved by installing `sentencepiece`; the gpt2 path is already a worst case...
If you don't install `sentencepiece`, the tokenizer will fall back to the fast tokenizer, which doesn't have `sp_model`. See here: https://github.com/guidance-ai/guidance/blob/e234c565b61ffb90dbbf81cd937a00505ef79649/guidance/models/transformers/_transformers.py#L99
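The control flow I mean is roughly the following. This is only a sketch; `load_slow` and `load_fast` are hypothetical stand-ins for the real `AutoTokenizer.from_pretrained` calls, not the actual guidance code:

```python
def pick_tokenizer(load_slow, load_fast):
    try:
        tok = load_slow()  # typically raises if sentencepiece isn't installed
    except ImportError:
        return load_fast()
    # Even when a slow tokenizer loads, it may not expose sp_model at all.
    return tok if hasattr(tok, "sp_model") else load_fast()

# Dummy tokenizer classes just to exercise the branches:
class SlowTok:
    sp_model = object()

class FastTok:  # fast tokenizer: no sp_model attribute
    pass

def missing_sentencepiece():
    raise ImportError("sentencepiece is not installed")

print(type(pick_tokenizer(SlowTok, FastTok)).__name__)                # SlowTok
print(type(pick_tokenizer(missing_sentencepiece, FastTok)).__name__)  # FastTok
```

The point is that without `sentencepiece` you always end up on the `sp_model`-less fast tokenizer, which is why a hard error there seems too strict.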
I suppose so. But as I said before, I don't think it's realistic to support every model out there (for now?). I can only think of one other option instead of...