Varun Gupta
@ying2025 @dittops is this still an open issue?
I am closing this issue as stale.
The imbalance with least-request is caused by the delay in metric refresh (default 50ms), which makes it vulnerable to batches of requests arriving within a short interval. This PR: https://github.com/vllm-project/aibrix/pull/918 introduced tracking running requests...
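To make the effect concrete, here is a minimal sketch of how a stale metric snapshot can skew least-request routing during a burst, and how a locally maintained running-request counter avoids it. This is a toy simulation, not the aibrix implementation: the pod names, the burst size, and the function names are all hypothetical, and it assumes the whole burst arrives before the next metric refresh.

```python
from typing import Dict, List


def pick_least_loaded(counts: Dict[str, int]) -> str:
    """Return the pod with the fewest requests; ties go to the first pod listed."""
    return min(counts, key=counts.get)


def route_with_stale_metrics(pods: List[str], burst: int) -> Dict[str, int]:
    """Route a burst against a snapshot that is not refreshed during the burst."""
    snapshot = {pod: 0 for pod in pods}  # metrics as of the last refresh (assumed idle)
    assigned = {pod: 0 for pod in pods}  # where the requests actually went
    for _ in range(burst):
        target = pick_least_loaded(snapshot)  # snapshot is unchanged mid-burst
        assigned[target] += 1
    return assigned


def route_with_running_counter(pods: List[str], burst: int) -> Dict[str, int]:
    """Route a burst while incrementing a local running-request count per dispatch."""
    running = {pod: 0 for pod in pods}
    for _ in range(burst):
        target = pick_least_loaded(running)
        running[target] += 1  # visible to the very next routing decision
    return running


if __name__ == "__main__":
    pods = ["pod-a", "pod-b", "pod-c", "pod-d"]  # hypothetical pod names
    print("stale metrics:  ", route_with_stale_metrics(pods, burst=8))
    print("running counter:", route_with_running_counter(pods, burst=8))
```

With the stale snapshot every request in the burst sees the same "least loaded" pod, so all of them land on it; with the running counter each dispatch changes what the next decision sees, so the burst spreads evenly.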
@Jeffwan Yeah, I found this example during benchmark testing. By default, the benchmark runs with 8 threads. T0: initiate 8 requests concurrently, and requests are dispatched in this order...
@firebook I haven't had the opportunity to work on it yet. It is on my high-priority list. If you are interested, please take the lead in proposing an implementation design,...
@vivekrsintc Can you share the output of `kubectl describe httproute -A` and `kubectl describe envoyextensionpolicy -A`?
They both look good. I could not find you on the aibrix Slack channel; can you ping me on Slack?
I will close this task; please create a new one for the current release, v0.3.0.
To unblock you, I am adding the details here and will add them to the documentation. - We have a separate metadata service, so it needs separate port forwarding. WIP to add under...
> In the above create user, how is the authentication key managed? How to assign each user an authentication key? The authentication key present in the request is for the model,...