RuntimeError: Cannot convert token
Describe the issue as clearly as possible:
This issue appears when using the newer models by Mistral AI; specifically, I tried Nemo 12B and Ministral-8B.
I don't fully understand what's going on, but my guess is that these models require the Mistral-specific tokenizer (tokenizer_mode="mistral") to work.
I know these are new models and it can take time to support them here, but I didn't find any similar issue, so I wanted to at least bring attention to it.
Steps/code to reproduce the bug:
from vllm import LLM
from outlines import models, generate

model_name = "mistralai/Ministral-8B-Instruct-2410"
# Load with the Mistral-native tokenizer/config/weight formats,
# as recommended for this model
llm = LLM(model=model_name, tokenizer_mode="mistral", config_format="mistral", load_format="mistral")
model = models.VLLM(llm)
generator = generate.regex(model, r"[12345]")  # <-- raises RuntimeError here
prompt = "Reply with a number: "
outputs = generator(prompt)
print(outputs)
Expected result:
The output string from the model's generation
1
Error message:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[2], line 13
11 llm = LLM(model=model_name, tokenizer_mode="mistral", config_format="mistral", load_format="mistral")
12 model = models.VLLM(llm)
---> 13 generator = generate.regex(model, r"[12345]")
15 prompt = "Reply with a number: "
16 outputs = generator(prompt)
File ~/miniconda3/envs/farmabot/lib/python3.10/functools.py:889, in singledispatch.<locals>.wrapper(*args, **kw)
885 if not args:
886 raise TypeError(f'{funcname} requires at least '
887 '1 positional argument')
--> 889 return dispatch(args[0].__class__)(*args, **kw)
File ~/miniconda3/envs/farmabot/lib/python3.10/site-packages/outlines/generate/regex.py:73, in regex_vllm(model, regex_str, sampler)
65 @regex.register(VLLM)
66 def regex_vllm(
67 model: VLLM,
68 regex_str: str,
69 sampler: Sampler = multinomial(),
70 ):
71 from outlines.integrations.vllm import RegexLogitsProcessor
---> 73 logits_processor = RegexLogitsProcessor(regex_str, model.model)
74 return SequenceGeneratorAdapter(model, logits_processor, sampler)
File ~/miniconda3/envs/farmabot/lib/python3.10/site-packages/outlines/integrations/vllm.py:82, in RegexLogitsProcessor.__init__(self, regex_string, llm)
80 tokenizer = adapt_tokenizer(tokenizer=tokenizer)
81 self.mask_cache: Dict[int, torch.Tensor] = {}
---> 82 self.fsm = RegexGuide(regex_string, tokenizer)
83 self._fsm_state: DefaultDict[int, int] = defaultdict(int)
File ~/miniconda3/envs/farmabot/lib/python3.10/site-packages/outlines/fsm/guide.py:145, in RegexGuide.__init__(self, regex_string, tokenizer)
140 def __init__(self, regex_string: str, tokenizer):
141 (
142 self.states_to_token_maps,
143 self.empty_token_ids,
144 fsm_finals,
--> 145 ) = create_states_mapping(regex_string, tokenizer)
146 self.eos_token_id = tokenizer.eos_token_id
147 self.final_states = fsm_finals | {-1}
File ~/miniconda3/envs/farmabot/lib/python3.10/site-packages/outlines/caching.py:122, in cache.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
119 result = wrapper.__memory__.get(cache_key, default=ENOVAL, retry=True)
121 if result is ENOVAL:
--> 122 result = cached_function(*args, **kwargs)
123 wrapper.__memory__.set(cache_key, result, expire, retry=True)
125 return result
File ~/miniconda3/envs/farmabot/lib/python3.10/site-packages/outlines/fsm/guide.py:118, in create_states_mapping(regex_string, tokenizer)
116 byte_fsm = make_byte_level_fsm(regex_pattern.to_fsm().reduce(), keep_utf8=True)
117 regex_fsm, _ = make_deterministic_fsm(byte_fsm)
--> 118 states_to_token_maps, empty_token_ids = create_fsm_index_tokenizer(
119 regex_fsm, tokenizer
120 )
122 # We make sure that it is possible to generate strings in the language
123 # of the regular expression with the tokens present in the model's
124 # vocabulary.
125 if not any(
126 regex_fsm.finals.intersection(v.values()) for v in states_to_token_maps.values()
127 ):
File ~/miniconda3/envs/farmabot/lib/python3.10/site-packages/outlines/fsm/regex.py:898, in create_fsm_index_tokenizer(fsm, tokenizer)
885 def create_fsm_index_tokenizer(
886 fsm: BetterFSM,
887 tokenizer: "Tokenizer",
888 ) -> Tuple[Dict[int, Dict[int, int]], Set[int]]:
889 """Construct an FMS index from a tokenizer.
890
891 This uses the end-to-end approach of `create_fsm_index_end_to_end`.
(...)
896
897 """
--> 898 vocabulary, empty_token_ids = reduced_vocabulary(tokenizer)
900 states_to_token_subsets = create_fsm_index_end_to_end(fsm.fsm_info, vocabulary)
902 # Allow transitions to EOS from all terminals FSM states that are
903 # reachable
904 # TODO: Do we really need this anymore?
File ~/miniconda3/envs/farmabot/lib/python3.10/site-packages/outlines/fsm/regex.py:861, in reduced_vocabulary(tokenizer)
857 token_bytes = cast(
858 List[int], [gpt2_unicode_to_bytes().get(c) for c in token]
859 )
860 if None in token_bytes:
--> 861 raise RuntimeError(
862 f"Cannot convert token `{token}` ({token_idx}) to bytes: {token_str}"
863 )
864 token_str = "".join(byte_symbol(b) for b in token_bytes)
866 vocabulary.setdefault(token_str, []).append(token_idx)
RuntimeError: Cannot convert token ` �` (130971) to bytes: �
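For context, the failing check can be reproduced in isolation. The functions below are a standalone reimplementation of the standard GPT-2 byte-level mapping (not the actual outlines code, whose `gpt2_unicode_to_bytes` this sketches): `reduced_vocabulary` assumes every token string round-trips through this table, but a Tekken-style vocabulary can contain U+FFFD (the `�` in the error above), which has no entry, so `.get(c)` returns `None` and the `RuntimeError` is raised.

```python
def gpt2_bytes_to_unicode():
    """GPT-2's byte -> printable-unicode mapping (reimplemented for illustration)."""
    # Bytes that map to themselves: printable ASCII and two Latin-1 ranges.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every other byte is shifted into the U+0100.. range.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


def gpt2_unicode_to_bytes():
    """Reverse mapping, as used by outlines' reduced_vocabulary."""
    return {v: k for k, v in gpt2_bytes_to_unicode().items()}


# Characters produced by the mapping round-trip fine...
assert gpt2_unicode_to_bytes()["A"] == ord("A")

# ...but U+FFFD is never produced by it, so the lookup yields None,
# which triggers the "Cannot convert token" RuntimeError.
assert gpt2_unicode_to_bytes().get("\ufffd") is None

# A vocabulary can end up containing U+FFFD when raw token bytes that are
# not valid UTF-8 are decoded with errors="replace":
assert "\ufffd" in b"\xe2\x82".decode("utf-8", errors="replace")
```

In other words, the assumption baked into the GPT-2 path of `reduced_vocabulary` does not hold for the Tekken-style vocabularies these newer Mistral models ship with.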
Outlines/Python version information:
Context for the issue:
The issue means that outlines can't be used with the newest Mistral models.
Same on the Nemotron models.
Hey! The only resource I found about this was: https://github.com/vllm-project/vllm/issues/9359#issuecomment-2412840803. I don't know if that's always the case, though, because guided decoding actually works with Mistral-7B-Instruct.