[Bug]: `guided_regex` not working on M2 Ultra VLLM
Your current environment
Python 3.12, installed via conda following the official installation guide.
Happens regardless of the model used.
🐛 Describe the bug
The output of `python -m vllm.entrypoints.openai.api_server --trust-remote-code --model google/gemma-3-4b-it --port 27090 --max-model-len 10000 --api-key token-abc123` in the console:
```
WARNING 06-16 09:21:02 [__init__.py:34] xgrammar module cannot be imported successfully. Falling back to use outlines instead.
WARNING 06-16 09:21:02 [__init__.py:34] xgrammar module cannot be imported successfully. Falling back to use outlines instead.
```
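For reference, here is a minimal request that exercises `guided_regex` against the server started above. This is a sketch using the `openai` Python client; the prompt and regex are illustrative and not taken from the original report.

```python
# Minimal sketch of a guided_regex request against the vLLM server above.
# Assumptions: the `openai` Python package is installed; the prompt and
# regex below are illustrative only.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:27090/v1", api_key="token-abc123")

completion = client.chat.completions.create(
    model="google/gemma-3-4b-it",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    # vLLM-specific extension: constrain the output to match this regex.
    extra_body={"guided_regex": "Paris|London|Berlin"},
)
print(completion.choices[0].message.content)
```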
Before submitting a new issue...
- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
FYI https://github.com/vllm-project/vllm/pull/18359, but structured outputs on CPU have limited support.
But I'm using the Mac's GPU.
vLLM doesn't support MLX, so I'm not sure the GPU is even being used there.
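To confirm what the fallback warning implies, one can check whether `xgrammar` is importable at all in the same environment. A minimal sketch, assuming it is run inside the same conda env as vLLM; on Apple Silicon a compatible xgrammar wheel may simply not be available, which would explain the fallback to outlines.

```python
# Sketch: check whether the xgrammar package can be imported in this
# environment. If it cannot, the "Falling back to use outlines" warning
# above is expected behavior rather than a separate bug.
from importlib.metadata import PackageNotFoundError, version

try:
    import xgrammar  # noqa: F401
    print("xgrammar importable, version:", version("xgrammar"))
except (ImportError, PackageNotFoundError) as exc:
    print("xgrammar unavailable:", exc)
```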