
AttributeError when vLLM uses guided-decoding-backend=xgrammar

Saxsgdsg opened this issue 6 months ago • 2 comments

Describe the issue as clearly as possible:

When vLLM is started with `--guided-decoding-backend=xgrammar`, a request using `guided_grammar` fails with an AttributeError and the vLLM worker processes terminate.

Steps/code to reproduce the bug:

When using `guided_grammar`:

from openai import OpenAI

# The api_key is a placeholder; vLLM's OpenAI-compatible server ignores it unless --api-key is set
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Define an EBNF grammar (Lark syntax) for arithmetic expressions
arithmetic_grammar = r"""
    ?start: expression
    ?expression: term (("+" | "-") term)*
    ?term: factor (("*" | "/") factor)*
    ?factor: NUMBER
           | "-" factor
           | "(" expression ")"
    %import common.NUMBER
"""

response = client.chat.completions.create(
    model="Qwen3-8B",
    messages=[
        {"role": "user", "content": "Alice had 4 apples and Bob ate 2. Write an expression for Alice's apples:"}
    ],
    extra_body={
        "guided_grammar": arithmetic_grammar
    }
)
print(response.choices[0].message.content)
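
As a sanity check that the failure is not caused by a malformed grammar, the same grammar can be parsed locally with the `lark` package (an extra dependency, not part of the repro above; the sample input is illustrative):

```
from lark import Lark

arithmetic_grammar = r"""
    ?start: expression
    ?expression: term (("+" | "-") term)*
    ?term: factor (("*" | "/") factor)*
    ?factor: NUMBER
           | "-" factor
           | "(" expression ")"
    %import common.NUMBER
"""

# If the grammar were malformed, Lark would raise here rather than at request time
parser = Lark(arithmetic_grammar)
# Note: the grammar declares no %ignore WS, so the input must contain no spaces
print(parser.parse("4-2").pretty())
```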

Expected result:

The request returns HTTP 200 with a completion constrained by the grammar.

Error message:

INFO 06-04 09:48:14 [engine.py:310] Added request chatcmpl-7a6aea2e8da04904b441fbec199d0380.
INFO:     10.235.189.10:57017 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR 06-04 09:48:14 [engine.py:160] AttributeError('CachedPreTrainedTokenizerFast has no attribute vocabulary')
ERROR 06-04 09:48:14 [engine.py:160] Traceback (most recent call last):
ERROR 06-04 09:48:14 [engine.py:160]   File "/home/llm_inference/vllm/vllm/engine/multiprocessing/engine.py", line 158, in start
ERROR 06-04 09:48:14 [engine.py:160]     self.run_engine_loop()
ERROR 06-04 09:48:14 [engine.py:160]   File "/home/llm_inference/vllm/vllm/engine/multiprocessing/engine.py", line 221, in run_engine_loop
ERROR 06-04 09:48:14 [engine.py:160]     request_outputs = self.engine_step()
ERROR 06-04 09:48:14 [engine.py:160]   File "/home/llm_inference/vllm/vllm/engine/multiprocessing/engine.py", line 247, in engine_step
ERROR 06-04 09:48:14 [engine.py:160]     raise e
ERROR 06-04 09:48:14 [engine.py:160]   File "/home/llm_inference/vllm/vllm/engine/multiprocessing/engine.py", line 230, in engine_step
ERROR 06-04 09:48:14 [engine.py:160]     return self.engine.step()
ERROR 06-04 09:48:14 [engine.py:160]   File "/home/llm_inference/vllm/vllm/engine/llm_engine.py", line 1412, in step
ERROR 06-04 09:48:14 [engine.py:160]     outputs = self.model_executor.execute_model(
ERROR 06-04 09:48:14 [engine.py:160]   File "/home/llm_inference/vllm/vllm/executor/executor_base.py", line 299, in execute_model
ERROR 06-04 09:48:14 [engine.py:160]     driver_outputs = self._driver_execute_model(execute_model_req)
ERROR 06-04 09:48:14 [engine.py:160]   File "/homellm_inference/vllm/vllm/executor/mp_distributed_executor.py", line 144, in _driver_execute_model
ERROR 06-04 09:48:14 [engine.py:160]     return self.driver_worker.execute_model(execute_model_req)
ERROR 06-04 09:48:14 [engine.py:160]   File "/home/llm_inference/vllm/vllm/worker/worker_base.py", line 420, in execute_model
ERROR 06-04 09:48:14 [engine.py:160]     output = self.model_runner.execute_model(
ERROR 06-04 09:48:14 [engine.py:160]   File "/home/anaconda3/envs/PyTorch-2.5.1/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 06-04 09:48:14 [engine.py:160]     return func(*args, **kwargs)
ERROR 06-04 09:48:14 [engine.py:160]   File "/home/llm_inference/vllm_ascend/worker/model_runner.py", line 1414, in execute_model
ERROR 06-04 09:48:14 [engine.py:160]     logits = self.model.compute_logits(hidden_or_intermediate_states,
ERROR 06-04 09:48:14 [engine.py:160]   File "/home/llm_inference/vllm/vllm/model_executor/models/llama.py", line 565, in compute_logits
ERROR 06-04 09:48:14 [engine.py:160]     logits = self.logits_processor(self.lm_head, hidden_states,
ERROR 06-04 09:48:14 [engine.py:160]   File "/home/anaconda3/envs/PyTorch-2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 06-04 09:48:14 [engine.py:160]     return self._call_impl(*args, **kwargs)
ERROR 06-04 09:48:14 [engine.py:160]   File "/home/anaconda3/envs/PyTorch-2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 06-04 09:48:14 [engine.py:160]     return forward_call(*args, **kwargs)
ERROR 06-04 09:48:14 [engine.py:160]   File "/home/llm_inference/vllm/vllm/model_executor/layers/logits_processor.py", line 83, in forward
ERROR 06-04 09:48:14 [engine.py:160]     logits = _apply_logits_processors(logits, sampling_metadata)
ERROR 06-04 09:48:14 [engine.py:160]   File "/home/llm_inference/vllm/vllm/model_executor/layers/logits_processor.py", line 170, in _apply_logits_processors
ERROR 06-04 09:48:14 [engine.py:160]     _apply_logits_processors_single_seq(
ERROR 06-04 09:48:14 [engine.py:160]   File "/home/llm_inference/vllm/vllm/model_executor/layers/logits_processor.py", line 195, in _apply_logits_processors_single_seq
ERROR 06-04 09:48:14 [engine.py:160]     logits_row = logits_processor(past_tokens_ids, logits_row)
ERROR 06-04 09:48:14 [engine.py:160]   File "/home/llm_inference/vllm/vllm/model_executor/guided_decoding/outlines_logits_processors.py", line 97, in __call__
ERROR 06-04 09:48:14 [engine.py:160]     instruction = self._guide.get_next_instruction(
ERROR 06-04 09:48:14 [engine.py:160]   File "/home/anaconda3/envs/PyTorch-2.5.1/lib/python3.10/site-packages/outlines/fsm/guide.py", line 153, in get_next_instruction
ERROR 06-04 09:48:14 [engine.py:160]     self.iter_valid_token_ids(state, self.tokenizer.vocabulary.values())
ERROR 06-04 09:48:14 [engine.py:160]   File "/home/anaconda3/envs/PyTorch-2.5.1/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1111, in __getattr__
ERROR 06-04 09:48:14 [engine.py:160]     raise AttributeError(f"{self.__class__.__name__} has no attribute {key}")
ERROR 06-04 09:48:14 [engine.py:160] AttributeError: CachedPreTrainedTokenizerFast has no attribute vocabulary
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [7597]
Process SpawnProcess-1:
Traceback (most recent call last):
  File "/home/anaconda3/envs/PyTorch-2.5.1/lib/python3.10/multiprocessing/process.py", line 317, in _bootstrap
    util._exit_function()
  File "/home/anaconda3/envs/PyTorch-2.5.1/lib/python3.10/multiprocessing/util.py", line 334, in _exit_function
    _run_finalizers(0)
  File "/home/anaconda3/envs/PyTorch-2.5.1/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers
    finalizer()
  File "/home/anaconda3/envs/PyTorch-2.5.1/lib/python3.10/multiprocessing/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/usr/local/latest/python/site-packages/tbe/common/repository_manager/route.py", line 54, in wrapper
    return func(cls, *args, **kwargs)
  File "/usr/local/latest/python/site-packages/tbe/common/repository_manager/route.py", line 219, in finalize
    cls.global_mgr.finalize()
  File "/usr/local/latest/python/site-packages/tbe/common/repository_manager/utils/multiprocess_util.py", line 84, in finalize
    self.mgr.shutdown()
  File "/home/anaconda3/envs/PyTorch-2.5.1/lib/python3.10/multiprocessing/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/home/anaconda3/envs/PyTorch-2.5.1/lib/python3.10/multiprocessing/managers.py", line 674, in _finalize_manager
    process.join(timeout=1.0)
  File "/home/anaconda3/envs/PyTorch-2.5.1/lib/python3.10/multiprocessing/process.py", line 149, in join
    res = self._popen.wait(timeout)
  File "/home/anaconda3/envs/PyTorch-2.5.1/lib/python3.10/multiprocessing/popen_fork.py", line 40, in wait
    if not wait([self.sentinel], timeout):
  File "/home/anaconda3/envs/PyTorch-2.5.1/lib/python3.10/multiprocessing/connection.py", line 936, in wait
    ready = selector.select(timeout)
  File "/home/anaconda3/envs/PyTorch-2.5.1/lib/python3.10/selectors.py", line 416, in select
    fd_event_list = self._selector.poll(timeout)
  File "/home/llm_inference/vllm/vllm/engine/multiprocessing/engine.py", line 426, in signal_handler
    raise KeyboardInterrupt("MQLLMEngine terminated")
KeyboardInterrupt: MQLLMEngine terminated
INFO 06-04 09:48:17 [multiproc_worker_utils.py:137] Terminating local vLLM worker processes
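
The final frames are telling: although the xgrammar backend was requested, the request appears to have been routed through Outlines' logits processor (`outlines_logits_processors.py`), and `outlines/fsm/guide.py` reads `self.tokenizer.vocabulary` on what is still a plain Hugging Face fast tokenizer (`CachedPreTrainedTokenizerFast`). HF tokenizers expose `get_vocab()` rather than a `vocabulary` attribute, which matches the AttributeError. The gap can be checked outside vLLM; a minimal sketch (the model name is illustrative and assumes the tokenizer can be downloaded):

```
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
print(hasattr(tok, "vocabulary"))  # False: HF tokenizers expose get_vocab(), not .vocabulary
print(len(tok.get_vocab()))        # the token -> id mapping Outlines expects under .vocabulary
```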

Outlines/Python version information:

```
(command output here)
```

Context for the issue:

No response

— Saxsgdsg, Jun 11 '25

This seems to be an issue with vLLM / XGrammar and not Outlines? @RobinPicard?

— rlouf, Jun 22 '25

Yes, I don't think this is an issue on Outlines' side. To be sure, could you please provide the vLLM version and the command used to launch the server?

— RobinPicard, Jul 21 '25
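
For anyone gathering the requested version information, a quick way to print it (package names assumed; xgrammar may not be installed in every setup):

```
from importlib.metadata import version, PackageNotFoundError

for pkg in ("vllm", "outlines", "transformers", "xgrammar"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```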