
Context node crash when using PD Disaggregation

nsealati opened this issue 8 months ago

System Info

GPU: NVIDIA H20
TensorRT-LLM version: 0.19.0.dev2025041500

Who can help?

No response

Information

  • [x] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [x] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

  1. Model: Llama 70B FP8
  2. Launch the disaggregated service following the disaggregated example: 4 context servers (max_num_tokens 3501, max_batch_size 1, tp_size 1) and 1 decode server (max_num_tokens 201, max_batch_size 200, tp_size 4). Note the one-token headroom between the context limit and the benchmark ISL, illustrated below.
  3. Use sglang to send the requests (ISL set to 3500):
import subprocess

model_path = "/path/to/llama-70b-fp8"  # placeholder; substitute the actual checkpoint
output_file = "bench_results.jsonl"    # placeholder

test_cmd = [
    'python3', '-m', 'sglang.bench_serving', '--backend', 'sglang-oai',
    '--dataset-name', 'random',
    '--model', f'{model_path}',
    '--num-prompt', '3000',
    '--random-input', '3500',
    '--random-output', '1500',
    '--request-rate', '2.163153705',
    '--random-range-ratio', '1',
    '--host', 'localhost',
    '--port', '9000',
    '--output-file', f'{output_file}',
]
subprocess.run(test_cmd, check=True)
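For reference, the margin between the context servers' token budget and the nominal input length in this setup is a single token, so any growth in a prompt's tokenized length overflows it. A trivial sanity check (a sketch; the numbers are taken from the repro settings above):

# Token-budget check for the repro settings above (sketch).
MAX_NUM_TOKENS = 3501  # context server limit from step 2
NOMINAL_ISL = 3500     # --random-input from the benchmark command

headroom = MAX_NUM_TOKENS - NOMINAL_ISL
print(f"headroom: {headroom} token(s)")  # 1 token; any tokenization drift overflows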

Expected behavior

Disaggregated serving works; the context and generation servers do not crash.

Actual behavior

The context server crashes when the number of context tokens exceeds the configured max_num_tokens.

Additional notes

[screenshot attached in the original issue]

nsealati avatar Apr 29 '25 07:04 nsealati

@Tabrizian any inputs on this?

brb-nv avatar May 16 '25 23:05 brb-nv

@nsealati It looks like the number of tokens is larger than expected. Could you please double-check that the client is sending exactly 3500 tokens to the server? It looks like it is currently sending 3533 tokens.
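For context, sglang's random dataset builds prompts by sampling token ids and decoding them to text, and the server then re-tokenizes that text; decode/encode is not an exact round trip, so a nominal 3500-token prompt can arrive as slightly more. A minimal sketch of the effect (the tokenizer name is illustrative; use the serving model's actual tokenizer):

import random
from transformers import AutoTokenizer

# Illustrative tokenizer; substitute the one the server actually uses.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-70B-Instruct")

random.seed(0)
ids = random.choices(range(tok.vocab_size), k=3500)  # nominal "3500-token" prompt
text = tok.decode(ids)                               # what the client actually sends
reencoded = tok(text, add_special_tokens=True).input_ids
print(len(reencoded))  # typically != 3500: decode/encode does not round-trip exactly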

Tabrizian avatar May 19 '25 15:05 Tabrizian

@Tabrizian Thank you for the reply. To clarify: 1. I've confirmed the client is indeed sending more tokens than max_num_tokens. 2. However, the server then stays disconnected for subsequent requests instead of rejecting only the over-limit request. The correct behavior would be to reject only the invalid request while the server keeps running.
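Until the server rejects oversized requests gracefully instead of crashing, a client-side guard is one possible workaround: clamp each prompt's tokenized length before sending. A sketch (MAX_NUM_TOKENS mirrors the repro config; the tokenizer name is illustrative):

from transformers import AutoTokenizer

MAX_NUM_TOKENS = 3501  # context server limit from the repro config
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-70B-Instruct")  # illustrative

def clamp_prompt(prompt: str) -> str:
    """Truncate a prompt so its tokenized length fits the context budget."""
    ids = tok(prompt, add_special_tokens=True).input_ids
    if len(ids) <= MAX_NUM_TOKENS:
        return prompt
    # Keep a small safety margin: decoding and re-encoding the truncated
    # text can still drift by a token or two.
    return tok.decode(ids[:MAX_NUM_TOKENS - 8], skip_special_tokens=True)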

nsealati avatar May 20 '25 02:05 nsealati

@nsealati Just checking in: if this issue is no longer relevant, please let us know so we can close it. If it is still affecting you, could you try the latest version to see whether the problem persists?

karljang avatar Nov 12 '25 02:11 karljang

Issue has not received an update in over 14 days. Adding stale label.

github-actions[bot] avatar Nov 26 '25 03:11 github-actions[bot]

Closing the issue as stale. Please feel free to open a new issue if the problem persists with the latest release. Thank you!

karljang avatar Dec 03 '25 06:12 karljang