flashinfer

I loaded the qwen2_5_instruct_7b model, and requests to the SGLang deployment respond more slowly than requests to the vLLM deployment. While troubleshooting, I added `print("mutates_args",mutates_args,"schema_str",schema_str)`. Is this output reasonable?

qingzhong1 opened this issue

[image: screenshot attached to the issue, not preserved]

qingzhong1, Dec 24 '24 06:12

Sorry, I don't quite understand the question.

> because the deployment request response using sglang is slower than that of vllm deployment.

Do you mean warmup time, or evaluation metrics such as ITL (inter-token latency) or TTFT (time to first token)?

> When troubleshooting, print("mutates_args",mutates_args,"schema_str",schema_str) is this result reasonable?

What's the purpose of printing these arguments?

yzh119, Dec 24 '24 06:12
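The two metrics asked about above measure different things: TTFT is the delay from sending a request to receiving its first token, and ITL is the gap between consecutive tokens after that. Below is a minimal stdlib-only sketch. It assumes token arrival times were recorded client-side from a streaming response. `latency_metrics` is a hypothetical helper, and benchmark tools may define ITL slightly differently (for example, per streamed chunk rather than per token).

```python
from statistics import mean


def latency_metrics(request_start: float, token_times: list[float]) -> dict[str, float]:
    """Return TTFT and mean ITL in seconds (hypothetical helper).

    request_start: client-side time when the request was sent.
    token_times: client-side arrival time of each streamed token, in order.
    """
    if not token_times:
        raise ValueError("no tokens received")
    # Time to first token: first arrival minus request start.
    ttft = token_times[0] - request_start
    # Gaps between consecutive arrivals (empty when only one token arrived).
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    mean_itl = mean(gaps) if gaps else 0.0
    return {"ttft": ttft, "mean_itl": mean_itl}


# Request sent at t=0.0 s; four tokens arrive 0.25 s apart after a 0.5 s wait.
print(latency_metrics(0.0, [0.5, 0.75, 1.0, 1.25]))  # → {'ttft': 0.5, 'mean_itl': 0.25}
```

Measuring both metrics for SGLang and vLLM with identical prompts, and discarding the first (warmup) requests, would show whether the slowdown is in prefill (TTFT) or in decoding (ITL).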

This is an old issue that needs more input from @qingzhong1, so I'm closing it for now. Feel free to reopen it if this is still an issue.

sricketts, Sep 30 '25 02:09