Load the qwen2_5_instruct_7b model, because the deployment request response using sglang is slower than that of vllm deployment. When troubleshooting, print("mutates_args",mutates_args,"schema_str",schema_str) is this result reasonable?

Open qingzhong1 opened this issue 1 year ago • 1 comments

Dec 24 '24 06:12 qingzhong1

Sorry I don't quite understand the question.

> because the deployment request response using sglang is slower than that of vllm deployment.

Do you mean warmup time, or evaluation metrics such as ITL (inter-token latency) or TTFT (time to first token)?
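To make the distinction concrete, here is a minimal sketch of how those two metrics could be computed from per-token arrival timestamps on a streaming response. The function name `latency_metrics` and its arguments are hypothetical and not part of sglang or vllm; it assumes you record a timestamp (for example with `time.perf_counter()`) when the request is sent and again each time a streamed token arrives.

```python
from statistics import mean


def latency_metrics(request_start, token_times):
    """Return (ttft, mean_itl) in seconds.

    request_start: timestamp taken just before the request is sent.
    token_times:   timestamps taken as each streamed token arrives, in order.
    Both are hypothetical inputs you would collect in your own client code.
    """
    if not token_times:
        raise ValueError("no tokens received")

    # TTFT: delay between sending the request and receiving the first token.
    ttft = token_times[0] - request_start

    # ITL: gaps between consecutive tokens; report their mean.
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    mean_itl = mean(gaps) if gaps else 0.0

    return ttft, mean_itl


if __name__ == "__main__":
    # Made-up timestamps for illustration only, not measured results.
    print(latency_metrics(10.0, [10.5, 10.75, 11.0]))
```

Collecting the same two numbers against both deployments, with identical prompts and sampling parameters and after a warmup request, would show whether the slowdown is in the first-token path or in steady-state decoding.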

> When troubleshooting, print("mutates_args",mutates_args,"schema_str",schema_str) is this result reasonable?

What's the purpose of printing these arguments?

Dec 24 '24 06:12 yzh119

Old issue that needs more input from @qingzhong1, closing for now. Feel free to re-open if this is still an issue.

Sep 30 '25 02:09 sricketts