Jiaxin Shan
@happyandslow you are supposed to write down the prerequisites for running the scripts. Let's remove the dataset from this PR; only the scripts that generate it are needed.
@happyandslow is this PR still needed?
@happyandslow is this PR still necessary?
One approach is to design something more sophisticated, like a least-effective-load (LEL) solution. Compute the effective load $L_i = R_i + \alpha P_i$ for each pod (R: running, P: pending). We can start...
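A minimal sketch of the LEL selection described above, not the gateway's actual implementation. The names `PodLoad`, `effective_load`, and `pick_least_effective_load` are hypothetical, and the default `alpha=2.0` is an arbitrary illustrative weight, not a tuned value.

```python
from dataclasses import dataclass


@dataclass
class PodLoad:
    """Hypothetical per-pod counters; field names are assumptions."""
    name: str
    running: int  # R_i: requests currently running on the pod
    pending: int  # P_i: requests waiting on the pod


def effective_load(pod, alpha):
    """Effective load L_i = R_i + alpha * P_i."""
    return pod.running + alpha * pod.pending


def pick_least_effective_load(pods, alpha=2.0):
    """Return the pod with the smallest effective load.

    Ties go to the first pod in the list, because min() keeps the
    first minimal element it sees.
    """
    if not pods:
        raise ValueError("no pods to choose from")
    return min(pods, key=lambda p: effective_load(p, alpha))


if __name__ == "__main__":
    pods = [PodLoad("a", 4, 0), PodLoad("b", 1, 2), PodLoad("c", 3, 0)]
    print(pick_least_effective_load(pods).name)
```

With `alpha > 1` a pending request counts for more than a running one, so a pod with a queue is avoided even when its running count is low. With `alpha < 1` the opposite holds.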
I think we should investigate the client problem first. That could be the primary reason. I had a brief chat with @happyandslow today and she confirmed that part.
@gangmuk We can expose the target-pod in the response header on the gateway side; that could be helpful.
For logs of successful requests, I think we already have that. Check this: https://github.com/aibrix/aibrix/blob/c4060bb3c5d41949954626f16c0ae15aa82b73ec/test/e2e/routing_strategy_test.go#L69
@varungup90 is there any evidence that the imbalance issue is due to the 50ms?
@varungup90 we have an issue tracking the client request interval optimization: https://github.com/vllm-project/aibrix/issues/667. The client is not supposed to send requests in batches. If QPS=8, in even distribution, only single...
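To illustrate the even-distribution point, here is a rough sketch of how a client could space its sends instead of batching them. `send_offsets` is a hypothetical helper, not code from the benchmark client; it only computes the send times and does not send anything.

```python
def send_offsets(qps, duration_s=1.0):
    """Return send times (seconds from start) for evenly spaced requests.

    At a given QPS the gap between consecutive requests is 1 / qps,
    so QPS=8 means one request every 0.125 s rather than 8 at once.
    """
    if qps <= 0:
        raise ValueError("qps must be positive")
    interval = 1.0 / qps
    count = int(qps * duration_s)
    return [i * interval for i in range(count)]


if __name__ == "__main__":
    print(send_offsets(8))
```

A real client would sleep until each offset before issuing the request, so that the router sees requests one at a time.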
I changed to an absolute path and it's still not working. This is the LoRA adapter issue.