Results 14 comments of Yoni

Forgive me if I'm wrong, but the problem occurs because the default GPT-2 byte encoder doesn't cover all Unicode characters. This comes from the GPT-2 `byte_encoder`, which is mapped to `bytes_to_unicode`: `list(range(ord("!"),...
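To make the point concrete, here is a minimal sketch modeled on the widely published GPT-2 byte-to-unicode table. It assumes the `byte_encoder` in question behaves the same way, which may not hold for every version. The helper `undecodable_chars` is a hypothetical name used only for illustration.

```python
def bytes_to_unicode():
    """Map each of the 256 byte values to a printable unicode character."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point above 255.
    for b in range(2**8):
        if b not in bs:
            bs.append(b)
            cs.append(2**8 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


byte_encoder = bytes_to_unicode()
# Reverse table: only 256 unicode characters can be turned back into bytes.
byte_decoder = {ch: b for b, ch in byte_encoder.items()}


def undecodable_chars(token):
    """Hypothetical helper: characters of `token` missing from the table."""
    return [ch for ch in token if ch not in byte_decoder]
```

Under these assumptions, `undecodable_chars("▁Hello")` reports the `▁` marker used by SentencePiece-style vocabularies, since that character is not one of the 256 entries in the reverse table.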

I think this should just give a warning instead. The original issue with Mistral can already be solved by installing `sentencepiece`. GPT-2 is already a worst case...

If you don't install `sentencepiece`, the tokenizer will fall back to the fast tokenizer, which doesn't have `sp_model`. See here: https://github.com/guidance-ai/guidance/blob/e234c565b61ffb90dbbf81cd937a00505ef79649/guidance/models/transformers/_transformers.py#L99
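As a rough sketch of how such a fallback could be detected and reported with a warning rather than an error, the snippet below uses only the standard library. The helpers `sentencepiece_installed` and `check_sp_model` are hypothetical names and are not part of guidance or transformers. The only assumption about the tokenizer is that a slow tokenizer exposes an `sp_model` attribute, as described above.

```python
import importlib.util
import warnings


def sentencepiece_installed():
    """Return True if the `sentencepiece` package can be imported."""
    return importlib.util.find_spec("sentencepiece") is not None


def check_sp_model(tokenizer):
    """Hypothetical helper: warn instead of failing when `sp_model` is absent."""
    if getattr(tokenizer, "sp_model", None) is not None:
        return True
    # Only suggest installing the package when it is actually missing.
    hint = "" if sentencepiece_installed() else " Installing `sentencepiece` may help."
    warnings.warn("Tokenizer has no `sp_model`; it may be a fast tokenizer." + hint)
    return False
```

A caller could run `check_sp_model(tokenizer)` right after loading the tokenizer and choose a different code path when it returns `False`.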

I suppose so. But as I said before, I don't think it's realistic to support every model out there (for now?). I can only think of one other option instead of...