Varun Gupta

87 comments by Varun Gupta

@ying2025 @dittops is this still an open issue?

The imbalance with least-request is caused by the delay in metric refresh (default 50ms), so routing is skewed when a batch of requests arrives within a short window. This PR: https://github.com/vllm-project/aibrix/pull/918 introduced tracking running requests...
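To illustrate the effect described above, here is a minimal sketch, not the AIBrix implementation. It assumes a burst of 8 requests that all arrive between two metric refreshes, four pods, and ties broken by pod order. The function names (`pick_least_request`, `dispatch_with_stale_metrics`, `dispatch_with_local_tracking`) and the pod names are hypothetical.

```python
from collections import Counter


def pick_least_request(counts):
    """Return the pod with the fewest running requests (ties: first pod in dict order)."""
    return min(counts, key=counts.get)


def dispatch_with_stale_metrics(pods, burst):
    """Route a burst using a metric snapshot that is not refreshed during the burst."""
    # Assumption: the whole burst lands inside one refresh interval,
    # so every routing decision sees the same (stale) counts.
    snapshot = {pod: 0 for pod in pods}
    assigned = Counter()
    for _ in range(burst):
        assigned[pick_least_request(snapshot)] += 1
    return dict(assigned)


def dispatch_with_local_tracking(pods, burst):
    """Route a burst while incrementing a local running-request counter per dispatch."""
    running = {pod: 0 for pod in pods}
    assigned = Counter()
    for _ in range(burst):
        pod = pick_least_request(running)
        running[pod] += 1  # counted immediately, without waiting for a refresh
        assigned[pod] += 1
    return dict(assigned)


if __name__ == "__main__":
    pods = ["pod-a", "pod-b", "pod-c", "pod-d"]  # hypothetical pod names
    print(dispatch_with_stale_metrics(pods, 8))
    print(dispatch_with_local_tracking(pods, 8))
```

In this sketch the stale snapshot sends the entire burst to one pod, while local tracking spreads it evenly. A real router would also decrement the counter when a request completes, which is omitted here.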

@Jeffwan Yeah, I found this example during benchmark testing. By default, the benchmark runs with 8 threads. T0: initiate 8 requests concurrently, and requests are dispatched in this order...

@firebook I haven't had the opportunity to work on it yet. It is on my high-priority list. If you are interested, please take the lead in proposing an implementation design,...

@vivekrsintc Can you share the output of `kubectl describe httproute -A` and `kubectl describe envoyextensionpolicy -A`?

They both look good. I could not find you on the AIBrix Slack channel; can you ping me on Slack?

I will close this task; please create a new one for the current release, v0.3.0.

To unblock you, I am adding the details here and will add the documentation. - We have a separate metadata service, so it needs separate port forwarding. WIP to add under...

> In the above create user, how is the authentication key managed? How to assign each user an authentication key? Authentication key present in the request is for the model,...