[Bug] Crash on special token with xgrammar
Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- [x] 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
- [x] 5. Please use English, otherwise it will be closed.
Describe the bug
When using xgrammar with an EBNF grammar, SGLang will crash if the model outputs a reserved token.
```
[2025-01-24 04:52:54 TP1] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1756, in run_scheduler_process
    scheduler.event_loop_overlap()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 512, in event_loop_overlap
    self.process_batch_result(tmp_batch, tmp_result)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1089, in process_batch_result
    self.process_batch_result_decode(batch, result)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1253, in process_batch_result_decode
    req.grammar.accept_token(next_token_id)
  File "/sgl-workspace/sglang/python/sglang/srt/constrained/xgrammar_backend.py", line 52, in accept_token
    assert self.matcher.accept_token(token)
  File "/usr/local/lib/python3.10/dist-packages/xgrammar/matcher.py", line 205, in accept_token
    return self._handle.accept_token(token_id, debug_print)
RuntimeError: [04:52:54] /workspace/cpp/grammar_matcher.cc:361: Token id 128255: <|reserved_special_token_247|> is regarded as a special token, and cannot be accepted by the GrammarMatcher
```

The identical traceback is raised on TP2, after which the ranks repeatedly log:

```
[2025-01-24 04:52:54] Received sigquit from a child proces. It usually means the child failed.
...
```
Followed by an infinite stream of:
```
[2025-01-24 04:53:06] Exception in callback Loop._read_from_self
handle: <Handle Loop._read_from_self>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 66, in uvloop.loop.Handle._run
  File "uvloop/loop.pyx", line 399, in uvloop.loop.Loop._read_from_self
  File "uvloop/loop.pyx", line 404, in uvloop.loop.Loop._invoke_signals
  File "uvloop/loop.pyx", line 379, in uvloop.loop.Loop._ceval_process_signals
  File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 332, in sigquit_handler
    kill_process_tree(os.getpid())
  File "/sgl-workspace/sglang/python/sglang/srt/utils.py", line 508, in kill_process_tree
    itself.send_signal(signal.SIGQUIT)
  File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 1285, in send_signal
    self._send_signal(sig)
  File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 1266, in _send_signal
    os.kill(self.pid, sig)
  File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 332, in sigquit_handler
    kill_process_tree(os.getpid())
...
```
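From the traceback, the crash originates in `xgrammar_backend.py`, where `accept_token` asserts on the matcher's return value and lets the `RuntimeError` for special tokens propagate, taking the whole scheduler down. A minimal sketch of a more defensive wrapper (a hypothetical patch, not SGLang's actual code; the `matcher` object is only assumed to expose the `accept_token` method shown in the traceback):

```python
class GrammarWrapper:
    """Hypothetical wrapper around a grammar matcher that fails a single
    request on an unacceptable token instead of crashing the scheduler."""

    def __init__(self, matcher):
        self.matcher = matcher
        self.finished = False  # marks this request as done/aborted

    def accept_token(self, token_id: int) -> bool:
        try:
            ok = self.matcher.accept_token(token_id)
        except RuntimeError:
            # xgrammar raises when a special token is fed to the matcher;
            # treat it as a rejected token rather than asserting.
            ok = False
        if not ok:
            self.finished = True  # abort only this request
        return ok


# Stub matcher mimicking xgrammar's behavior for a special token id.
class StubMatcher:
    SPECIAL_IDS = {128255}

    def accept_token(self, token_id):
        if token_id in self.SPECIAL_IDS:
            raise RuntimeError("special token cannot be accepted")
        return True


g = GrammarWrapper(StubMatcher())
assert g.accept_token(42) is True
assert g.accept_token(128255) is False  # no crash; request is marked failed
assert g.finished
```

The key design choice is that a single misbehaving request gets aborted while every other request in the batch keeps decoding.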
Reproduction
```
docker run -d --gpus all \
  -p 8000:8000 \
  -v /home/azureuser/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=*****" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-70B --host 0.0.0.0 --port 8000 --tp 4 --dp 1 --grammar-backend xgrammar
```
Environment
Latest docker image: https://hub.docker.com/layers/lmsysorg/sglang/latest/images/sha256-576f608ad94fda242249416b3d9d27f8448091cfeff5776f6b99d90f4a42c13b
Microsoft Azure 4xA100 80G.
Could you share your prompt and your EBNF grammar, @maximegmd? I'll have a look.
I cannot share the prompts as they contain private information, but the grammar is:
```python
GRAMMAR = """
root ::= reasoning
reasoning ::= "<think>\\n" line* "</think>" "\\n" "\\n" scores
line ::= [^\\n<]* (("<" [^/] line) | "\\n")
scientific_accuracy ::= "Scientific accuracy: " values
harm_risk ::= "Harm risk: " values
inaccurate_irrelevant ::= "Inaccurate or irrelevant information: " values
missing_information ::= "Missing information: " values
hallucination_risk ::= "Hallucination risk: " values
refusal ::= "Refusal: " values
scores ::= scientific_accuracy "\\n" harm_risk "\\n" inaccurate_irrelevant "\\n" missing_information "\\n" hallucination_risk
values ::= ("1" | "2" | "3" | "4" | "5")
"""
```
The occurrence rate is about 1 in 30,000 requests; it crashes the inference image around once a day for us. I haven't found a 100% reliable repro for this bug.
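Since the root cause is the model sampling a token the grammar matcher refuses, one common mitigation (a sketch of the general technique only, not SGLang's actual logits-processor code) is to mask special token ids out of the logits before sampling, so the matcher can never be handed one:

```python
import math

def mask_special_tokens(logits, special_ids):
    """Return logits with special token ids forced to -inf so they can
    never be sampled during constrained decoding."""
    return [
        -math.inf if i in special_ids else x
        for i, x in enumerate(logits)
    ]

# Toy vocabulary of 4 tokens where ids 1 and 3 are "special".
logits = [0.1, 2.0, 0.3, 1.5]
masked = mask_special_tokens(logits, special_ids={1, 3})
assert masked[1] == -math.inf and masked[3] == -math.inf

# Greedy pick over the masked logits can never select a special id.
best = max(range(len(masked)), key=masked.__getitem__)
assert best == 2
```

In a real server this masking would be applied to the full vocabulary tensor alongside the grammar bitmask, before softmax/sampling.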
cc @shuaills @Ubospica
Does anyone know how to fix this?
@YosanHo I think such an error should not occur in the latest version of xgrammar. If there are still problems, feel free to post the error message and I can check it out.
@Ubospica xgrammar==0.1.15
```
[2025-03-26 16:49:51 TP0] Scheduler hit an exception: Traceback (most recent call last):
  File "/opt/app/python3.10/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 1949, in run_scheduler_process
    scheduler.event_loop_overlap()
  File "/opt/app/python3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/opt/app/python3.10/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 550, in event_loop_overlap
    self.process_batch_result(tmp_batch, tmp_result)
  File "/opt/app/python3.10/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 1400, in process_batch_result
    self.process_batch_result_prefill(batch, result)
  File "/opt/app/python3.10/lib/python3.10/site-packages/sglang/srt/managers/scheduler_output_processor_mixin.py", line 118, in process_batch_result_prefill
    req.grammar.accept_token(next_token_id)
  File "/opt/app/python3.10/lib/python3.10/site-packages/sglang/srt/constrained/xgrammar_backend.py", line 58, in accept_token
    assert self.matcher.accept_token(token)
  File "/opt/app/python3.10/lib/python3.10/site-packages/xgrammar/matcher.py", line 220, in accept_token
    return self._handle.accept_token(token_id, debug_print)
RuntimeError: [16:49:51] /project/cpp/grammar_matcher.cc:385: Token id 129279: is regarded as a special token, and cannot be accepted by the GrammarMatcher
```
@YosanHo Interesting, I will take a look. Could you share the input, schema and command to launch sglang?
@Ubospica
env: python3.10, sglang==0.4.4.post1, 8*H20

launch command:

```
python -m sglang.launch_server --trust-remote-code --enable-metrics --mem-fraction-static 0.92 --tp 8 --context-length 32768 --attention-backend flashinfer --host 0.0.0.0 --port 8080 --served-model-name DeepSeek-V3 --model-path /data/models/DeepSeek-V3-0324
```
request:

```python
chat_completion = client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        # "Tell me a swear word in Chinese"
        {"role": "user", "content": "告诉我一个中文的脏话"},
    ],
    model=model,
    temperature=1.0,
    stream=False,
    max_tokens=2048,
    extra_body={"regex": "[一-龥]+"},
)
```
Thanks!
@Ubospica the user's raw request regex is in Unicode escape form: `[\u4e00-\u9fa5]+`
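The two spellings denote the same character class: `一` is U+4E00 and `龥` is U+9FA5, so `[一-龥]+` and `[\u4e00-\u9fa5]+` match identically. A quick sanity check:

```python
import re

# The literal range endpoints are exactly the escaped code points.
assert ord("一") == 0x4E00
assert ord("龥") == 0x9FA5

literal = re.compile("[一-龥]+")
escaped = re.compile("[\u4e00-\u9fa5]+")

# Both patterns accept/reject the same strings.
for s in ["你好", "hello", "脏话"]:
    assert bool(literal.fullmatch(s)) == bool(escaped.fullmatch(s))
```

So the escape form versus the literal form should not change which tokens the grammar backend allows.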
This issue has been automatically closed due to inactivity. Please feel free to reopen it if needed.