DeepSeek-R1: AssertionError in DetokenizerManager during client batch requests

Open Roysky opened this issue 2 weeks ago • 4 comments

While running DeepSeek-R1 inference on 2 nodes with 8 H800 GPUs each, the server hit an AssertionError while a client was making batch requests.

The specific error is as follows:

[2025-02-11 01:42:04] INFO: 10.81.10.40:51432 - "GET /v1/batches/batch_2c036fce-9c71-4d76-9fdb-4701d9f59861 HTTP/1.1" 200 OK
[2025-02-11 01:42:04] DetokenizerManager hit an exception: Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/detokenizer_manager.py", line 240, in run_detokenizer_process
    manager.event_loop()
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/detokenizer_manager.py", line 143, in event_loop
    self.trim_matched_stop(
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/detokenizer_manager.py", line 105, in trim_matched_stop
    assert len(output) > 0
AssertionError

[2025-02-11 01:42:04] Received sigquit from a child proces. It usually means the child failed.
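
The traceback ends at an `assert len(output) > 0` check inside `trim_matched_stop`. To illustrate the failure mode, here is a minimal sketch. It is not sglang's actual implementation: both functions are hypothetical stand-ins, and they assume the trim step drops the final token id when a stop token was matched. With an empty output list, a strict check of this kind raises, while a guarded variant returns the empty list unchanged.

```python
from typing import List


def trim_stop_token_strict(output_ids: List[int]) -> List[int]:
    """Hypothetical stand-in: drop the trailing stop token, requiring non-empty input."""
    # Same kind of length check as the one that fails in the traceback.
    assert len(output_ids) > 0
    return output_ids[:-1]


def trim_stop_token_guarded(output_ids: List[int]) -> List[int]:
    """Hypothetical defensive variant: an empty output is returned as-is."""
    if not output_ids:
        return output_ids
    return output_ids[:-1]


if __name__ == "__main__":
    # Normal case: the last id (assumed to be the stop token) is removed.
    print(trim_stop_token_guarded([11, 22, 2]))
    # Empty case: the guarded variant does not raise.
    print(trim_stop_token_guarded([]))
    # Empty case: the strict variant raises AssertionError.
    try:
        trim_stop_token_strict([])
    except AssertionError:
        print("AssertionError on empty output")
```

The traceback only establishes that `output` was empty when the check ran. Why a batch request produced an empty output is not shown in this report.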

The environment configuration is as follows:

  • sglang version: 0.4.2.post3
  • env: 2 nodes, 8 H800 GPUs each

Startup command:

node1

python -m sglang.launch_server --model-path DeepSeek-R1 --tp 16 --nccl-init-addr 10.1.10.42:5000 --nnodes 2 --node-rank 0 --trust-remote-code --host 0.0.0.0

node2

python -m sglang.launch_server --model-path DeepSeek-R1 --tp 16 --nccl-init-addr 10.1.10.42:5000 --nnodes 2 --node-rank 1 --trust-remote-code --host 0.0.0.0

Roysky, Feb 11 '25 06:02