RuntimeError: Cannot convert token
Describe the issue as clearly as possible:
This issue appears when using the newer models by Mistral AI; specifically, I tried Nemo 12B and Ministral-8B.
I don't fully understand what's going on, but my guess is that these models require the Mistral-specific tokenizer (tokenizer_mode="mistral") to work.
I know these are new models and it can take time to support them here, but I didn't find any similar issue, so I wanted to at least bring attention to it.
Steps/code to reproduce the bug:
from vllm import LLM
from outlines import models, generate

model_name = "mistralai/Ministral-8B-Instruct-2410"
# Load with the Mistral-native tokenizer/config/weight formats,
# as recommended for this model
llm = LLM(model=model_name, tokenizer_mode="mistral", config_format="mistral", load_format="mistral")
model = models.VLLM(llm)
generator = generate.regex(model, r"[12345]")  # <-- raises RuntimeError here
prompt = "Reply with a number: "
outputs = generator(prompt)
print(outputs)
Expected result:
The output string from the model's generation
1
Error message:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[2], line 13
11 llm = LLM(model=model_name, tokenizer_mode="mistral", config_format="mistral", load_format="mistral")
12 model = models.VLLM(llm)
---> 13 generator = generate.regex(model, r"[12345]")
15 prompt = "Reply with a number: "
16 outputs = generator(prompt)
File ~/miniconda3/envs/farmabot/lib/python3.10/functools.py:889, in singledispatch.<locals>.wrapper(*args, **kw)
885 if not args:
886 raise TypeError(f'{funcname} requires at least '
887 '1 positional argument')
--> 889 return dispatch(args[0].__class__)(*args, **kw)
File ~/miniconda3/envs/farmabot/lib/python3.10/site-packages/outlines/generate/regex.py:73, in regex_vllm(model, regex_str, sampler)
65 @regex.register(VLLM)
66 def regex_vllm(
67 model: VLLM,
68 regex_str: str,
69 sampler: Sampler = multinomial(),
70 ):
71 from outlines.integrations.vllm import RegexLogitsProcessor
---> 73 logits_processor = RegexLogitsProcessor(regex_str, model.model)
74 return SequenceGeneratorAdapter(model, logits_processor, sampler)
File ~/miniconda3/envs/farmabot/lib/python3.10/site-packages/outlines/integrations/vllm.py:82, in RegexLogitsProcessor.__init__(self, regex_string, llm)
80 tokenizer = adapt_tokenizer(tokenizer=tokenizer)
81 self.mask_cache: Dict[int, torch.Tensor] = {}
---> 82 self.fsm = RegexGuide(regex_string, tokenizer)
83 self._fsm_state: DefaultDict[int, int] = defaultdict(int)
File ~/miniconda3/envs/farmabot/lib/python3.10/site-packages/outlines/fsm/guide.py:145, in RegexGuide.__init__(self, regex_string, tokenizer)
140 def __init__(self, regex_string: str, tokenizer):
141 (
142 self.states_to_token_maps,
143 self.empty_token_ids,
144 fsm_finals,
--> 145 ) = create_states_mapping(regex_string, tokenizer)
146 self.eos_token_id = tokenizer.eos_token_id
147 self.final_states = fsm_finals | {-1}
File ~/miniconda3/envs/farmabot/lib/python3.10/site-packages/outlines/caching.py:122, in cache.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
119 result = wrapper.__memory__.get(cache_key, default=ENOVAL, retry=True)
121 if result is ENOVAL:
--> 122 result = cached_function(*args, **kwargs)
123 wrapper.__memory__.set(cache_key, result, expire, retry=True)
125 return result
File ~/miniconda3/envs/farmabot/lib/python3.10/site-packages/outlines/fsm/guide.py:118, in create_states_mapping(regex_string, tokenizer)
116 byte_fsm = make_byte_level_fsm(regex_pattern.to_fsm().reduce(), keep_utf8=True)
117 regex_fsm, _ = make_deterministic_fsm(byte_fsm)
--> 118 states_to_token_maps, empty_token_ids = create_fsm_index_tokenizer(
119 regex_fsm, tokenizer
120 )
122 # We make sure that it is possible to generate strings in the language
123 # of the regular expression with the tokens present in the model's
124 # vocabulary.
125 if not any(
126 regex_fsm.finals.intersection(v.values()) for v in states_to_token_maps.values()
127 ):
File ~/miniconda3/envs/farmabot/lib/python3.10/site-packages/outlines/fsm/regex.py:898, in create_fsm_index_tokenizer(fsm, tokenizer)
885 def create_fsm_index_tokenizer(
886 fsm: BetterFSM,
887 tokenizer: "Tokenizer",
888 ) -> Tuple[Dict[int, Dict[int, int]], Set[int]]:
889 """Construct an FMS index from a tokenizer.
890
891 This uses the end-to-end approach of `create_fsm_index_end_to_end`.
(...)
896
897 """
--> 898 vocabulary, empty_token_ids = reduced_vocabulary(tokenizer)
900 states_to_token_subsets = create_fsm_index_end_to_end(fsm.fsm_info, vocabulary)
902 # Allow transitions to EOS from all terminals FSM states that are
903 # reachable
904 # TODO: Do we really need this anymore?
File ~/miniconda3/envs/farmabot/lib/python3.10/site-packages/outlines/fsm/regex.py:861, in reduced_vocabulary(tokenizer)
857 token_bytes = cast(
858 List[int], [gpt2_unicode_to_bytes().get(c) for c in token]
859 )
860 if None in token_bytes:
--> 861 raise RuntimeError(
862 f"Cannot convert token `{token}` ({token_idx}) to bytes: {token_str}"
863 )
864 token_str = "".join(byte_symbol(b) for b in token_bytes)
866 vocabulary.setdefault(token_str, []).append(token_idx)
RuntimeError: Cannot convert token ` �` (130971) to bytes: �
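For context, the failing check can be reproduced in isolation. The functions below are a standalone reimplementation of the standard GPT-2 byte-level mapping (not the actual outlines code, whose `gpt2_unicode_to_bytes` this sketches): `reduced_vocabulary` assumes every token string round-trips through this table, but a Tekken-style vocabulary can contain U+FFFD (the `�` in the error above), which has no entry, so `.get(c)` returns `None` and the `RuntimeError` is raised.

```python
def gpt2_bytes_to_unicode():
    """GPT-2's byte -> printable-unicode mapping (reimplemented for illustration)."""
    # Bytes that map to themselves: printable ASCII and two Latin-1 ranges.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every other byte is shifted into the U+0100.. range.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


def gpt2_unicode_to_bytes():
    """Reverse mapping, as used by outlines' reduced_vocabulary."""
    return {v: k for k, v in gpt2_bytes_to_unicode().items()}


# Characters produced by the mapping round-trip fine...
assert gpt2_unicode_to_bytes()["A"] == ord("A")

# ...but U+FFFD is never produced by it, so the lookup yields None,
# which triggers the "Cannot convert token" RuntimeError.
assert gpt2_unicode_to_bytes().get("\ufffd") is None

# A vocabulary can end up containing U+FFFD when raw token bytes that are
# not valid UTF-8 are decoded with errors="replace":
assert "\ufffd" in b"\xe2\x82".decode("utf-8", errors="replace")
```

In other words, the assumption baked into the GPT-2 path of `reduced_vocabulary` does not hold for the Tekken-style vocabularies these newer Mistral models ship with.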
Outlines/Python version information:
Context for the issue:
The issue means that outlines can't be used with the newest Mistral models.
Same on the Nemotron models.
Hey! The only resource I found about this was: https://github.com/vllm-project/vllm/issues/9359#issuecomment-2412840803. I don't know if that's always the case, though, because guided decoding actually works with Mistral-7B-Instruct.