[Bug] Crash on special token with xgrammar
Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- [x] 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
- [x] 5. Please use English, otherwise it will be closed.
Describe the bug
When using xgrammar with an EBNF grammar, SGLang will crash if the model outputs a reserved token.
```
[2025-01-24 04:52:54 TP1] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1756, in run_scheduler_process
    scheduler.event_loop_overlap()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 512, in event_loop_overlap
    self.process_batch_result(tmp_batch, tmp_result)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1089, in process_batch_result
    self.process_batch_result_decode(batch, result)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1253, in process_batch_result_decode
    req.grammar.accept_token(next_token_id)
  File "/sgl-workspace/sglang/python/sglang/srt/constrained/xgrammar_backend.py", line 52, in accept_token
    assert self.matcher.accept_token(token)
  File "/usr/local/lib/python3.10/dist-packages/xgrammar/matcher.py", line 205, in accept_token
    return self._handle.accept_token(token_id, debug_print)
RuntimeError: [04:52:54] /workspace/cpp/grammar_matcher.cc:361: Token id 128255: <|reserved_special_token_247|> is regarded as a special token, and cannot be accepted by the GrammarMatcher
```

The identical traceback is raised on TP2, after which the ranks repeatedly log:

```
[2025-01-24 04:52:54] Received sigquit from a child proces. It usually means the child failed.
...
```
Followed by an infinite stream of:
```
[2025-01-24 04:53:06] Exception in callback Loop._read_from_self
handle: <Handle Loop._read_from_self>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 66, in uvloop.loop.Handle._run
  File "uvloop/loop.pyx", line 399, in uvloop.loop.Loop._read_from_self
  File "uvloop/loop.pyx", line 404, in uvloop.loop.Loop._invoke_signals
  File "uvloop/loop.pyx", line 379, in uvloop.loop.Loop._ceval_process_signals
  File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 332, in sigquit_handler
    kill_process_tree(os.getpid())
  File "/sgl-workspace/sglang/python/sglang/srt/utils.py", line 508, in kill_process_tree
    itself.send_signal(signal.SIGQUIT)
  File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 1285, in send_signal
    self._send_signal(sig)
  File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 1266, in _send_signal
    os.kill(self.pid, sig)
  File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 332, in sigquit_handler
    kill_process_tree(os.getpid())
...
```
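From the traceback, the crash originates in `xgrammar_backend.py`, where `accept_token` asserts on the matcher's return value and lets the `RuntimeError` for special tokens propagate, taking the whole scheduler down. A minimal sketch of a more defensive wrapper (a hypothetical patch, not SGLang's actual code; the `matcher` object is only assumed to expose the `accept_token` method shown in the traceback):

```python
class GrammarWrapper:
    """Hypothetical wrapper around a grammar matcher that fails a single
    request on an unacceptable token instead of crashing the scheduler."""

    def __init__(self, matcher):
        self.matcher = matcher
        self.finished = False  # marks this request as done/aborted

    def accept_token(self, token_id: int) -> bool:
        try:
            ok = self.matcher.accept_token(token_id)
        except RuntimeError:
            # xgrammar raises when a special token is fed to the matcher;
            # treat it as a rejected token rather than asserting.
            ok = False
        if not ok:
            self.finished = True  # abort only this request
        return ok


# Stub matcher mimicking xgrammar's behavior for a special token id.
class StubMatcher:
    SPECIAL_IDS = {128255}

    def accept_token(self, token_id):
        if token_id in self.SPECIAL_IDS:
            raise RuntimeError("special token cannot be accepted")
        return True


g = GrammarWrapper(StubMatcher())
assert g.accept_token(42) is True
assert g.accept_token(128255) is False  # no crash; request is marked failed
assert g.finished
```

The key design choice is that a single misbehaving request gets aborted while every other request in the batch keeps decoding.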
Reproduction
```
docker run -d --gpus all \
  -p 8000:8000 \
  -v /home/azureuser/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=*****" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-70B --host 0.0.0.0 --port 8000 --tp 4 --dp 1 --grammar-backend xgrammar
```
Environment
Latest docker image: https://hub.docker.com/layers/lmsysorg/sglang/latest/images/sha256-576f608ad94fda242249416b3d9d27f8448091cfeff5776f6b99d90f4a42c13b
Microsoft Azure 4xA100 80G.
Could you share your prompt and your EBNF grammar, @maximegmd? I'll have a look.
I cannot share the prompts as they contain private information, but the grammar is:
```python
GRAMMAR = """
root ::= reasoning
reasoning ::= "<think>\\n" line* "</think>" "\\n" "\\n" scores
line ::= [^\\n<]* (("<" [^/] line) | "\\n")
scientific_accuracy ::= "Scientific accuracy: " values
harm_risk ::= "Harm risk: " values
inaccurate_irrelevant ::= "Inaccurate or irrelevant information: " values
missing_information ::= "Missing information: " values
hallucination_risk ::= "Hallucination risk: " values
refusal ::= "Refusal: " values
scores ::= scientific_accuracy "\\n" harm_risk "\\n" inaccurate_irrelevant "\\n" missing_information "\\n" hallucination_risk
values ::= ("1" | "2" | "3" | "4" | "5")
"""
```
The occurrence rate is about 1 in 30,000 requests; it crashes the inference image around once a day for us. I haven't found a 100% reliable repro for this bug.
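Since the root cause is the model sampling a token the grammar matcher refuses, one common mitigation (a sketch of the general technique only, not SGLang's actual logits-processor code) is to mask special token ids out of the logits before sampling, so the matcher can never be handed one:

```python
import math

def mask_special_tokens(logits, special_ids):
    """Return logits with special token ids forced to -inf so they can
    never be sampled during constrained decoding."""
    return [
        -math.inf if i in special_ids else x
        for i, x in enumerate(logits)
    ]

# Toy vocabulary of 4 tokens where ids 1 and 3 are "special".
logits = [0.1, 2.0, 0.3, 1.5]
masked = mask_special_tokens(logits, special_ids={1, 3})
assert masked[1] == -math.inf and masked[3] == -math.inf

# Greedy pick over the masked logits can never select a special id.
best = max(range(len(masked)), key=masked.__getitem__)
assert best == 2
```

In a real server this masking would be applied to the full vocabulary tensor alongside the grammar bitmask, before softmax/sampling.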
cc @shuaills @Ubospica
Does anyone know how to fix this?
@YosanHo I think such an error should not occur in the latest version of xgrammar. If there are still problems, feel free to post the error message and I can check it out.
@Ubospica xgrammar==0.1.15
```
[2025-03-26 16:49:51 TP0] Scheduler hit an exception: Traceback (most recent call last):
  File "/opt/app/python3.10/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 1949, in run_scheduler_process
    scheduler.event_loop_overlap()
  File "/opt/app/python3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/opt/app/python3.10/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 550, in event_loop_overlap
    self.process_batch_result(tmp_batch, tmp_result)
  File "/opt/app/python3.10/lib/python3.10/site-packages/sglang/srt/managers/scheduler.py", line 1400, in process_batch_result
    self.process_batch_result_prefill(batch, result)
  File "/opt/app/python3.10/lib/python3.10/site-packages/sglang/srt/managers/scheduler_output_processor_mixin.py", line 118, in process_batch_result_prefill
    req.grammar.accept_token(next_token_id)
  File "/opt/app/python3.10/lib/python3.10/site-packages/sglang/srt/constrained/xgrammar_backend.py", line 58, in accept_token
    assert self.matcher.accept_token(token)
  File "/opt/app/python3.10/lib/python3.10/site-packages/xgrammar/matcher.py", line 220, in accept_token
    return self._handle.accept_token(token_id, debug_print)
RuntimeError: [16:49:51] /project/cpp/grammar_matcher.cc:385: Token id 129279: is regarded as a special token, and cannot be accepted by the GrammarMatcher
```
@YosanHo Interesting, I will take a look. Could you share the input, schema and command to launch sglang?
@Ubospica
env: python3.10, sglang==0.4.4.post1, 8*H20

launch command:

```
python -m sglang.launch_server --trust-remote-code --enable-metrics --mem-fraction-static 0.92 --tp 8 --context-length 32768 --attention-backend flashinfer --host 0.0.0.0 --port 8080 --served-model-name DeepSeek-V3 --model-path /data/models/DeepSeek-V3-0324
```
request:

```python
chat_completion = client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        # "Tell me a swear word in Chinese"
        {"role": "user", "content": "告诉我一个中文的脏话"},
    ],
    model=model,
    temperature=1.0,
    stream=False,
    max_tokens=2048,
    extra_body={"regex": "[一-龥]+"},
)
```
Thanks!
@Ubospica the user's raw request regex is in Unicode escape form: `[\u4e00-\u9fa5]+`
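The two spellings denote the same character class: `一` is U+4E00 and `龥` is U+9FA5, so `[一-龥]+` and `[\u4e00-\u9fa5]+` match identically. A quick sanity check:

```python
import re

# The literal range endpoints are exactly the escaped code points.
assert ord("一") == 0x4E00
assert ord("龥") == 0x9FA5

literal = re.compile("[一-龥]+")
escaped = re.compile("[\u4e00-\u9fa5]+")

# Both patterns accept/reject the same strings.
for s in ["你好", "hello", "脏话"]:
    assert bool(literal.fullmatch(s)) == bool(escaped.fullmatch(s))
```

So the escape form versus the literal form should not change which tokens the grammar backend allows.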
This issue has been automatically closed due to inactivity. Please feel free to reopen it if needed.