Varun Gupta comments

Results: 87 comments of Varun Gupta

Consider to support delay scheduling in Gateway

@zhangjyr reassigned to you, since you have already started working on it.

v0.2.0-rc.2 failed to pin container images

Which cluster is this?

free up disk task takes around 5mins..

I noticed that for that run as well. Typically it takes about a minute, and sometimes I have seen it take less than a minute. For these runs it takes...

free up disk task takes around 5mins..

This was a one-off case; it mostly takes about a minute, and in some runs only around 10 seconds. Will close the issue for now; if we notice this issue happening...

Failed to run benchmark scripts against the endpoint

> ### 🐛 Describe the bug
> ```
> python3 benchmark_serving.py --backend vllm --model deepseek-ai/deepseek-r1 --trust-remote-code --served-model-name deepseek-r1-671b --base-url http://localhost:8888 --endpoint /v1/completions --num-prompts 100 --request-rate 2 --metric_percentiles '50,90,95,99' --goodput ttft:1000...

Failed to run benchmark scripts against the endpoint

> I think we should completely remove these validation from gateway side

This is not validation; it reads the message for the prefix cache. One alternative is to only call...

Failed to run benchmark scripts against the endpoint

This is fixed now.

Gateway returns not meaningful response when pod is running but container not ready

Per offline discussion, I will close this task and PR. The follow-up task is to refactor the gateway code so that it can itemize each check separately and return a precise error message rather...

Add feature flag to enable heterogenous features

cc @zhangjyr

Add feature flag to enable heterogenous features

This is completed.