sglang
DeepSeek-R1: AssertionError in DetokenizerManager during client batch requests
While running DeepSeek-R1 inference on 2 nodes with 8 H800 GPUs each, the server hit an AssertionError while handling a client batch request.
The error log is as follows:
[2025-02-11 01:42:04] INFO: 10.81.10.40:51432 - "GET /v1/batches/batch_2c036fce-9c71-4d76-9fdb-4701d9f59861 HTTP/1.1" 200 OK
[2025-02-11 01:42:04] DetokenizerManager hit an exception: Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/detokenizer_manager.py", line 240, in run_detokenizer_process
    manager.event_loop()
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/detokenizer_manager.py", line 143, in event_loop
    self.trim_matched_stop(
  File "/usr/local/lib/python3.10/dist-packages/sglang/srt/managers/detokenizer_manager.py", line 105, in trim_matched_stop
    assert len(output) > 0
AssertionError

[2025-02-11 01:42:04] Received sigquit from a child proces. It usually means the child failed.
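To illustrate the failure mode the traceback points at, here is a minimal standalone sketch. It is not sglang's actual source: `trim_matched_stop_sketch` and `safe_trim` are hypothetical names, and the sketch assumes the trimming step drops a trailing matched stop token from a list of output token ids. Under that assumption, a request that finishes with an empty output would trip the same `assert len(output) > 0`.

```python
from typing import List, Optional


def trim_matched_stop_sketch(output_ids: List[int], matched_stop: Optional[int]) -> List[int]:
    """Hypothetical stand-in for the trimming step named in the traceback."""
    if matched_stop is None:
        # Nothing matched a stop condition, so there is nothing to trim.
        return output_ids
    # Same check as the assertion in the traceback: output must be non-empty.
    assert len(output_ids) > 0
    # Drop the trailing stop token.
    return output_ids[:-1]


def safe_trim(output_ids: List[int], matched_stop: Optional[int]) -> List[int]:
    """Defensive variant (an assumption, not a confirmed fix): pass empty output through."""
    if matched_stop is None or not output_ids:
        return output_ids
    return output_ids[:-1]
```

Calling `trim_matched_stop_sketch([], 3)` raises `AssertionError`, while `safe_trim([], 3)` returns the empty list unchanged. Whether an empty output is actually what reaches the detokenizer in this batch workload is not confirmed by the log.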
The environment configuration is as follows:
- sglang version: 0.4.2.post3
- Hardware: 2 nodes, each with 8 H800 GPUs
Startup commands:
Node 1 (rank 0):
python -m sglang.launch_server --model-path DeepSeek-R1 --tp 16 --nccl-init-addr 10.1.10.42:5000 --nnodes 2 --node-rank 0 --trust-remote-code --host 0.0.0.0
Node 2 (rank 1):
python -m sglang.launch_server --model-path DeepSeek-R1 --tp 16 --nccl-init-addr 10.1.10.42:5000 --nnodes 2 --node-rank 1 --trust-remote-code --host 0.0.0.0