Context node crash when using PD Disaggregation
System Info
- GPU: NVIDIA H20
- TensorRT-LLM version: 0.19.0.dev2025041500
Who can help?
No response
Information
- [x] The official example scripts
- [ ] My own modified scripts
Tasks
- [x] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
- Model: Llama 70B FP8
- Launch the disaggregated service following the disaggregated example:
  - 4 context servers, each with `max_num_tokens` 3501, `max_batch_size` 1, `tp_size` 1
  - 1 decode (generation) server with `max_num_tokens` 201, `max_batch_size` 200, `tp_size` 4
  (A rough launch sketch is included after the benchmark command below.)
- Use sglang to send the requests, with ISL set to 3500:
test_cmd = [
    'python3', '-m', 'sglang.bench_serving',
    '--backend', 'sglang-oai',
    '--dataset-name', 'random',
    '--model', f'{model_path}',
    '--num-prompt', '3000',
    '--random-input', '3500',
    '--random-output', '1500',
    '--request-rate', '2.163153705',
    '--random-range-ratio', '1',
    '--host', 'localhost',
    '--port', '9000',
    '--output-file', f'{output_file}',
]
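For reference, the worker side of the setup above could be launched roughly as follows. This is a minimal sketch only: the model path and ports are placeholders, and the disaggregation-specific pieces (router config, any `extra_llm_api_options`) from the disaggregated example are omitted; the `examples/disaggregated` README has the exact procedure.

```python
import subprocess

MODEL = '/models/Llama-70B-FP8'  # placeholder path

procs = []

# 4 context (prefill) servers: tp_size 1, max_batch_size 1, max_num_tokens 3501
for i in range(4):
    procs.append(subprocess.Popen([
        'trtllm-serve', MODEL,
        '--host', 'localhost', '--port', str(8001 + i),
        '--tp_size', '1', '--max_batch_size', '1', '--max_num_tokens', '3501',
    ]))

# 1 decode (generation) server: tp_size 4, max_batch_size 200, max_num_tokens 201
procs.append(subprocess.Popen([
    'trtllm-serve', MODEL,
    '--host', 'localhost', '--port', '8005',
    '--tp_size', '4', '--max_batch_size', '200', '--max_num_tokens', '201',
]))

# A disaggregated router then fronts these workers on port 9000,
# which is where the benchmark command points.
```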
Expected behavior
Disaggregated serving works; the context and generation servers don't crash.
Actual behavior
The context server crashes when the number of context tokens exceeds the configured limit.
Additional notes
@Tabrizian any inputs on this?
@nsealati It looks like the number of tokens is larger than expected. Could you please double check that the client is sending exactly 3500 tokens to the server? It looks like it is currently sending 3533 tokens.
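One plausible explanation for the 3500 vs. 3533 mismatch (an assumption, not confirmed in this thread): random-dataset benchmarks typically sample a fixed number of token IDs but send the decoded text, and decoding then re-encoding is not length-preserving, so the server can count more tokens than the client sampled. A quick way to check this with the model's tokenizer:

```python
import random
from transformers import AutoTokenizer

# Placeholder model name; use the same tokenizer as the served model.
tok = AutoTokenizer.from_pretrained('meta-llama/Meta-Llama-3-70B-Instruct')

# Sample 3500 random token IDs the way a random dataset generator might,
# decode them to text, then re-encode the text as the server will see it.
ids = random.choices(range(tok.vocab_size), k=3500)
text = tok.decode(ids)
reencoded = tok(text, add_special_tokens=False)['input_ids']

# The round trip is not guaranteed to preserve length, so these can differ.
print(len(ids), len(reencoded))
```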
@Tabrizian Thank you for the reply. To clarify:
1. I've confirmed the client is indeed sending more tokens than `max_num_tokens`.
2. However, the server stays disconnected for subsequent requests instead of only rejecting the over-limit request.
Perhaps the correct behavior is for only the invalid request to be rejected while the server keeps running normally.
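Until the server handles this gracefully, a client-side guard is one possible workaround: re-encode each prompt with the model's tokenizer and clamp it to the context server's limit before sending. A sketch, with the limit and model name taken as assumptions from this setup:

```python
from transformers import AutoTokenizer

MAX_NUM_TOKENS = 3501  # the context servers' limit in this setup
# Placeholder model name; use the served model's tokenizer.
tok = AutoTokenizer.from_pretrained('meta-llama/Meta-Llama-3-70B-Instruct')

def clamp_prompt(prompt: str, limit: int = MAX_NUM_TOKENS) -> str:
    """Truncate the prompt so its re-encoded length stays within the limit."""
    ids = tok(prompt, add_special_tokens=False)['input_ids']
    if len(ids) <= limit:
        return prompt
    return tok.decode(ids[:limit])
```

Note that truncating and re-encoding can itself shift the count slightly, so leaving a small safety margin below the limit is safer in practice.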
@nsealati Just checking in: if this issue is no longer relevant, please let us know so we can close it. If it is still affecting you, could you try the latest version to see if the problem persists?
Issue has not received an update in over 14 days. Adding stale label.
Closing the issue as stale. Please feel free to open a new issue if the problem persists with the latest release. Thank you!