AttributeError when vLLM uses guided-decoding-backend=xgrammar
Describe the issue as clearly as possible:
When vLLM is started with guided-decoding-backend=xgrammar, the request fails and the vLLM worker processes terminate with the AttributeError shown below.
Steps/code to reproduce the bug:
When using guided_grammar:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1")

# Define the EBNF grammar for arithmetic expressions
arithmetic_grammar = r"""
?start: expression
?expression: term (("+" | "-") term)*
?term: factor (("*" | "/") factor)*
?factor: NUMBER
       | "-" factor
       | "(" expression ")"
%import common.NUMBER
"""

response = client.chat.completions.create(
    model="Qwen3-8B",
    messages=[
        {"role": "user", "content": "Alice had 4 apples and Bob ate 2. Write an expression for Alice's apples:"}
    ],
    extra_body={
        "guided_grammar": arithmetic_grammar
    }
)

print(response.choices[0].message.content)
```
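The grammar above appears to be Lark-style EBNF, which is the dialect outlines expects for context-free grammars. As a sanity check that the grammar itself is valid and that the failure lies in the serving backend rather than in the grammar, it can be parsed standalone, assuming the lark package is installed:

```python
# Optional sanity check (assumes the `lark` package is available and that
# `arithmetic_grammar` is the string defined above): the grammar parses on its
# own, so the error reported below comes from the serving side.
from lark import Lark

parser = Lark(arithmetic_grammar)  # the default start rule is "start"
print(parser.parse("4-2").pretty())
```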
Expected result:
The request completes successfully with an HTTP 200 response.
Error message:
INFO 06-04 09:48:14 [engine.py:310] Added request chatcmpl-7a6aea2e8da04904b441fbec199d0380.
INFO: 10.235.189.10:57017 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR 06-04 09:48:14 [engine.py:160] AttributeError('CachedPreTrainedTokenizerFast has no attribute vocabulary')
ERROR 06-04 09:48:14 [engine.py:160] Traceback (most recent call last):
ERROR 06-04 09:48:14 [engine.py:160] File "/home/llm_inference/vllm/vllm/engine/multiprocessing/engine.py", line 158, in start
ERROR 06-04 09:48:14 [engine.py:160] self.run_engine_loop()
ERROR 06-04 09:48:14 [engine.py:160] File "/home/llm_inference/vllm/vllm/engine/multiprocessing/engine.py", line 221, in run_engine_loop
ERROR 06-04 09:48:14 [engine.py:160] request_outputs = self.engine_step()
ERROR 06-04 09:48:14 [engine.py:160] File "/home/llm_inference/vllm/vllm/engine/multiprocessing/engine.py", line 247, in engine_step
ERROR 06-04 09:48:14 [engine.py:160] raise e
ERROR 06-04 09:48:14 [engine.py:160] File "/home/llm_inference/vllm/vllm/engine/multiprocessing/engine.py", line 230, in engine_step
ERROR 06-04 09:48:14 [engine.py:160] return self.engine.step()
ERROR 06-04 09:48:14 [engine.py:160] File "/home/llm_inference/vllm/vllm/engine/llm_engine.py", line 1412, in step
ERROR 06-04 09:48:14 [engine.py:160] outputs = self.model_executor.execute_model(
ERROR 06-04 09:48:14 [engine.py:160] File "/home/llm_inference/vllm/vllm/executor/executor_base.py", line 299, in execute_model
ERROR 06-04 09:48:14 [engine.py:160] driver_outputs = self._driver_execute_model(execute_model_req)
ERROR 06-04 09:48:14 [engine.py:160] File "/home/llm_inference/vllm/vllm/executor/mp_distributed_executor.py", line 144, in _driver_execute_model
ERROR 06-04 09:48:14 [engine.py:160] return self.driver_worker.execute_model(execute_model_req)
ERROR 06-04 09:48:14 [engine.py:160] File "/home/llm_inference/vllm/vllm/worker/worker_base.py", line 420, in execute_model
ERROR 06-04 09:48:14 [engine.py:160] output = self.model_runner.execute_model(
ERROR 06-04 09:48:14 [engine.py:160] File "/home/anaconda3/envs/PyTorch-2.5.1/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 06-04 09:48:14 [engine.py:160] return func(*args, **kwargs)
ERROR 06-04 09:48:14 [engine.py:160] File "/home/llm_inference/vllm_ascend/worker/model_runner.py", line 1414, in execute_model
ERROR 06-04 09:48:14 [engine.py:160] logits = self.model.compute_logits(hidden_or_intermediate_states,
ERROR 06-04 09:48:14 [engine.py:160] File "/home/llm_inference/vllm/vllm/model_executor/models/llama.py", line 565, in compute_logits
ERROR 06-04 09:48:14 [engine.py:160] logits = self.logits_processor(self.lm_head, hidden_states,
ERROR 06-04 09:48:14 [engine.py:160] File "/home/anaconda3/envs/PyTorch-2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 06-04 09:48:14 [engine.py:160] return self._call_impl(*args, **kwargs)
ERROR 06-04 09:48:14 [engine.py:160] File "/home/anaconda3/envs/PyTorch-2.5.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 06-04 09:48:14 [engine.py:160] return forward_call(*args, **kwargs)
ERROR 06-04 09:48:14 [engine.py:160] File "/home/llm_inference/vllm/vllm/model_executor/layers/logits_processor.py", line 83, in forward
ERROR 06-04 09:48:14 [engine.py:160] logits = _apply_logits_processors(logits, sampling_metadata)
ERROR 06-04 09:48:14 [engine.py:160] File "/home/llm_inference/vllm/vllm/model_executor/layers/logits_processor.py", line 170, in _apply_logits_processors
ERROR 06-04 09:48:14 [engine.py:160] _apply_logits_processors_single_seq(
ERROR 06-04 09:48:14 [engine.py:160] File "/home/llm_inference/vllm/vllm/model_executor/layers/logits_processor.py", line 195, in _apply_logits_processors_single_seq
ERROR 06-04 09:48:14 [engine.py:160] logits_row = logits_processor(past_tokens_ids, logits_row)
ERROR 06-04 09:48:14 [engine.py:160] File "/home/llm_inference/vllm/vllm/model_executor/guided_decoding/outlines_logits_processors.py", line 97, in __call__
ERROR 06-04 09:48:14 [engine.py:160] instruction = self._guide.get_next_instruction(
ERROR 06-04 09:48:14 [engine.py:160] File "/home/anaconda3/envs/PyTorch-2.5.1/lib/python3.10/site-packages/outlines/fsm/guide.py", line 153, in get_next_instruction
ERROR 06-04 09:48:14 [engine.py:160] self.iter_valid_token_ids(state, self.tokenizer.vocabulary.values())
ERROR 06-04 09:48:14 [engine.py:160] File "/home/anaconda3/envs/PyTorch-2.5.1/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1111, in __getattr__
ERROR 06-04 09:48:14 [engine.py:160] raise AttributeError(f"{self.__class__.__name__} has no attribute {key}")
ERROR 06-04 09:48:14 [engine.py:160] AttributeError: CachedPreTrainedTokenizerFast has no attribute vocabulary
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [7597]
Process SpawnProcess-1:
Traceback (most recent call last):
File "/home/anaconda3/envs/PyTorch-2.5.1/lib/python3.10/multiprocessing/process.py", line 317, in _bootstrap
util._exit_function()
File "/home/anaconda3/envs/PyTorch-2.5.1/lib/python3.10/multiprocessing/util.py", line 334, in _exit_function
_run_finalizers(0)
File "/home/anaconda3/envs/PyTorch-2.5.1/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers
finalizer()
File "/home/anaconda3/envs/PyTorch-2.5.1/lib/python3.10/multiprocessing/util.py", line 224, in __call__
res = self._callback(*self._args, **self._kwargs)
File "/usr/local/latest/python/site-packages/tbe/common/repository_manager/route.py", line 54, in wrapper
return func(cls, *args, **kwargs)
File "/usr/local/latest/python/site-packages/tbe/common/repository_manager/route.py", line 219, in finalize
cls.global_mgr.finalize()
File "/usr/local/latest/python/site-packages/tbe/common/repository_manager/utils/multiprocess_util.py", line 84, in finalize
self.mgr.shutdown()
File "/home/anaconda3/envs/PyTorch-2.5.1/lib/python3.10/multiprocessing/util.py", line 224, in __call__
res = self._callback(*self._args, **self._kwargs)
File "/home/anaconda3/envs/PyTorch-2.5.1/lib/python3.10/multiprocessing/managers.py", line 674, in _finalize_manager
process.join(timeout=1.0)
File "/home/anaconda3/envs/PyTorch-2.5.1/lib/python3.10/multiprocessing/process.py", line 149, in join
res = self._popen.wait(timeout)
File "/home/anaconda3/envs/PyTorch-2.5.1/lib/python3.10/multiprocessing/popen_fork.py", line 40, in wait
if not wait([self.sentinel], timeout):
File "/home/anaconda3/envs/PyTorch-2.5.1/lib/python3.10/multiprocessing/connection.py", line 936, in wait
ready = selector.select(timeout)
File "/home/anaconda3/envs/PyTorch-2.5.1/lib/python3.10/selectors.py", line 416, in select
fd_event_list = self._selector.poll(timeout)
File "/home/llm_inference/vllm/vllm/engine/multiprocessing/engine.py", line 426, in signal_handler
raise KeyboardInterrupt("MQLLMEngine terminated")
KeyboardInterrupt: MQLLMEngine terminated
INFO 06-04 09:48:17 [multiproc_worker_utils.py:137] Terminating local vLLM worker processes
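Although the server was started with guided-decoding-backend=xgrammar, the traceback runs through vllm/model_executor/guided_decoding/outlines_logits_processors.py, so this request appears to have been handled by the outlines fallback. The immediate failure is that outlines' guide iterates tokenizer.vocabulary.values(), while the CachedPreTrainedTokenizerFast it receives only exposes get_vocab(). Purely as an illustration of that gap (this is not vLLM's or outlines' actual adapter code), a minimal wrapper would look like:

```python
from transformers import AutoTokenizer


class OutlinesCompatibleTokenizer:
    """Illustrative wrapper only: exposes the attribute outlines reads.

    The attribute name (`vocabulary`) is taken from the traceback above;
    this is a sketch of the missing adaptation, not the real adapter.
    """

    def __init__(self, hf_tokenizer):
        self._tokenizer = hf_tokenizer
        # outlines' guide iterates `tokenizer.vocabulary.values()`, but a plain
        # (Cached)PreTrainedTokenizerFast only offers `get_vocab()`.
        self.vocabulary = hf_tokenizer.get_vocab()
        self.eos_token_id = hf_tokenizer.eos_token_id

    def __getattr__(self, name):
        # Delegate everything else to the underlying HF tokenizer.
        return getattr(self._tokenizer, name)


# Example (model path assumed from the report):
# tokenizer = OutlinesCompatibleTokenizer(AutoTokenizer.from_pretrained("Qwen/Qwen3-8B"))
# print(len(tokenizer.vocabulary))
```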
Outlines/Python version information:
Version information
```
(command output here)
```
Context for the issue:
No response
This seems to be an issue with vLLM / XGrammar and not Outlines? @RobinPicard?
Yes, I don't think it's an issue on Outlines' side. To be sure, could you please provide the version of vLLM and the command used to launch the server?
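A hypothetical snippet for collecting the requested versions (the package list is an assumption based on the stack trace, not an official checklist; adjust it as needed):

```python
# Hypothetical version dump for the bug report.
import importlib.metadata as metadata

for pkg in ("vllm", "vllm-ascend", "outlines", "xgrammar", "transformers", "torch"):
    try:
        print(f"{pkg}=={metadata.version(pkg)}")
    except metadata.PackageNotFoundError:
        print(f"{pkg}: not installed")
```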