Varun Gupta
@zhangjyr reassigned to you, since you have already started working on it.
Which cluster is this?
I noticed it for that run as well. Typically it takes 1ish minute, and sometimes I have seen it take less than a minute too. For these runs it takes...
This was a one-off case; mostly it takes 1ish min, and in some runs even 10ish seconds. Will close the issue for now, if we notice this issue happening...
> ### 🐛 Describe the bug > ``` > python3 benchmark_serving.py --backend vllm --model deepseek-ai/deepseek-r1 --trust-remote-code --served-model-name deepseek-r1-671b --base-url http://localhost:8888 --endpoint /v1/completions --num-prompts 100 --request-rate 2 --metric_percentiles '50,90,95,99' --goodput ttft:1000...
> I think we should completely remove these validation from gateway side

This is not validation; it reads the message for the prefix cache. One alternative is to only call...
This is fixed now.
Per offline discussion, I will close this task and PR. The follow-up task is to refactor the gateway code so that we can itemize each check separately and return a precise error message rather...
cc @zhangjyr
This is completed.