
[Bug] DeepSeek R1 cannot run with 16K input

Open Wesley-Jzy opened this issue 11 months ago • 5 comments

Checklist

  • [ ] 1. I have searched related issues but cannot get the expected help.
  • [ ] 2. The bug has not been fixed in the latest version.
  • [ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • [ ] 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • [ ] 5. Please use English, otherwise it will be closed.

Describe the bug

python3 -m sglang.launch_server --model-path xxxxx/DeepSeek-R1 --tp 16 --dist-init-addr $MASTER_ADDR:12345 --nnodes 2 --node-rank $RANK --trust-remote-code --host 0.0.0.0 --context-length 16384

Then send a ~10k-token input to it via the curl API (a sketch of such a request follows the log below). The engine then died with this log:

[2025-02-03 19:51:06 TP0] Prefill batch. #new-seq: 1, #new-token: 6748, #cached-token: 1, cache hit rate: 0.01%, token usage: 0.00, #running-req: 0, #queue-req: 0
[2025-02-03 19:51:06 TP0] Prefill batch. #new-seq: 2, #new-token: 8192, #cached-token: 2, cache hit rate: 0.02%, token usage: 0.02, #running-req: 1, #queue-req: 0
[2025-02-03 19:51:15 TP0] Prefill batch. #new-seq: 2, #new-token: 8192, #cached-token: 1563, cache hit rate: 6.34%, token usage: 0.04, #running-req: 2, #queue-req: 47
[2025-02-03 19:51:15 TP0] Prefill batch. #new-seq: 3, #new-token: 8192, #cached-token: 3014, cache hit rate: 12.75%, token usage: 0.07, #running-req: 3, #queue-req: 45
[2025-02-03 19:51:18 TP0] Prefill batch. #new-seq: 3, #new-token: 8192, #cached-token: 3168, cache hit rate: 16.39%, token usage: 0.09, #running-req: 5, #queue-req: 43
[2025-02-03 19:51:18 TP0] Prefill batch. #new-seq: 2, #new-token: 8192, #cached-token: 2860, cache hit rate: 18.19%, token usage: 0.12, #running-req: 7, #queue-req: 42
[2025-02-03 19:51:20 TP0] Prefill batch. #new-seq: 3, #new-token: 8192, #cached-token: 3059, cache hit rate: 19.64%, token usage: 0.14, #running-req: 8, #queue-req: 40
[2025-02-03 19:51:21 TP0] Prefill batch. #new-seq: 3, #new-token: 8192, #cached-token: 3216, cache hit rate: 20.85%, token usage: 0.17, #running-req: 10, #queue-req: 38
[2025-02-03 19:51:23 TP0] Prefill batch. #new-seq: 2, #new-token: 8192, #cached-token: 1608, cache hit rate: 20.37%, token usage: 0.20, #running-req: 12, #queue-req: 37
[2025-02-03 19:51:24 TP0] Prefill batch. #new-seq: 3, #new-token: 8192, #cached-token: 3014, cache hit rate: 21.09%, token usage: 0.22, #running-req: 13, #queue-req: 35
[2025-02-03 19:51:24 TP0] Prefill batch. #new-seq: 2, #new-token: 8192, #cached-token: 1503, cache hit rate: 20.60%, token usage: 0.25, #running-req: 15, #queue-req: 34
[2025-02-03 19:51:26 TP0] Prefill batch. #new-seq: 3, #new-token: 8192, #cached-token: 3004, cache hit rate: 21.17%, token usage: 0.27, #running-req: 16, #queue-req: 32
[2025-02-03 19:51:27 TP0] Prefill batch. #new-seq: 4, #new-token: 8192, #cached-token: 10410, cache hit rate: 25.74%, token usage: 0.30, #running-req: 18, #queue-req: 29
[2025-02-03 19:51:29 TP0] Prefill batch. #new-seq: 4, #new-token: 8192, #cached-token: 10410, cache hit rate: 29.25%, token usage: 0.33, #running-req: 21, #queue-req: 26
[2025-02-03 19:51:30 TP0] Prefill batch. #new-seq: 4, #new-token: 8192, #cached-token: 7342, cache hit rate: 30.85%, token usage: 0.35, #running-req: 24, #queue-req: 23
[2025-02-03 19:51:31 TP0] Prefill batch. #new-seq: 2, #new-token: 8192, #cached-token: 1039, cache hit rate: 29.87%, token usage: 0.38, #running-req: 27, #queue-req: 22
[2025-02-03 19:51:32 TP0] Prefill batch. #new-seq: 2, #new-token: 8192, #cached-token: 1038, cache hit rate: 28.98%, token usage: 0.41, #running-req: 28, #queue-req: 21
[2025-02-03 19:51:33 TP0] Prefill batch. #new-seq: 3, #new-token: 8192, #cached-token: 1490, cache hit rate: 28.34%, token usage: 0.43, #running-req: 29, #queue-req: 19
[2025-02-03 19:58:05 TP2] Watchdog timeout (self.watchdog_timeout=300)
[2025-02-03 19:58:05 TP6] Watchdog timeout (self.watchdog_timeout=300)
[2025-02-03 19:58:05 TP1] Watchdog timeout (self.watchdog_timeout=300)
[2025-02-03 19:58:05 TP0] Watchdog timeout (self.watchdog_timeout=300)
[2025-02-03 19:58:05 TP3] Watchdog timeout (self.watchdog_timeout=300)
[2025-02-03 19:58:06 TP4] Watchdog timeout (self.watchdog_timeout=300)
[2025-02-03 19:58:06 TP7] Watchdog timeout (self.watchdog_timeout=300)
[2025-02-03 19:58:06 TP5] Watchdog timeout (self.watchdog_timeout=300)
[2025-02-03 19:58:10] Received sigquit from a child proces. It usually means the child failed.
/bin/bash: line 2:    72 Killed                  python3 -m sglang.launch_server --model-path xxxxxx/DeepSeek-R1 --tp 16 --dist-init-addr $MASTER_ADDR:12345 --nnodes 2 --node-rank $RANK --trust-remote-code --host 0.0.0.0 --context-length 16384

Reproduction

Server

python3 -m sglang.launch_server --model-path xxxxx/DeepSeek-R1 --tp 16 --dist-init-addr $MASTER_ADDR:12345 --nnodes 2 --node-rank $RANK --trust-remote-code --host 0.0.0.0 --context-length 16384

Client

# run 20 client threads concurrently
curl the server with long inputs (see the sketch below)

Environment

No

Wesley-Jzy avatar Feb 04 '25 04:02 Wesley-Jzy

cc @zhyncs @ispobock

zhaochenyang20 avatar Feb 04 '25 07:02 zhaochenyang20

@zhaochenyang20 It happens continuously when I deploy it online. I'm trying to reproduce it with a smaller example. If you have any suggestions, please tell me :)

Wesley-Jzy avatar Feb 06 '25 23:02 Wesley-Jzy

@Wesley-Jzy Today we will release a new version of the flashinfer backend with much better long-context support. Stay tuned!

zhaochenyang20 avatar Feb 07 '25 05:02 zhaochenyang20

@Wesley-Jzy Today we will release a new version of the flashinfer backend with much better long-context support. Stay tuned!

I think you are right. I checked the error log again; it's a floating-point exception before the timeout. May I know how flashinfer causes this crash? And which commit is connected to the update?

Wesley-Jzy avatar Feb 13 '25 17:02 Wesley-Jzy

@Wesley-Jzy The new flashinfer version will come out in the next two days. It is especially useful for long context.

zhaochenyang20 avatar Feb 14 '25 00:02 zhaochenyang20

According to #3424 and #3836, this bug seems to have been fixed.

Fridge003 avatar Feb 28 '25 08:02 Fridge003