Jiaxin Shan comments

Results: 742 comments by Jiaxin Shan

Autoscaling Benchmark Initial

@happyandslow please write down the prerequisites for running the scripts. Let's remove the dataset from the PR; only the scripts that generate it are needed.

Autoscaling Benchmark Initial

@happyandslow is this PR still necessary?

imbalance issues found in least-of-request or any other policies.

One approach is to design something more sophisticated, like a least-effective-load (LEL) solution. Compute the effective load `L_i = R_i + α·P_i` for each pod, where R = running and P = pending. We can start...
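A minimal sketch of the effective-load idea above, not the project's implementation. The `PodLoad` type, the function names, the default `alpha`, and the tie-break on pod name are all hypothetical choices made for illustration; only the formula `L_i = R_i + α·P_i` comes from the comment.

```python
from dataclasses import dataclass


@dataclass
class PodLoad:
    """Hypothetical per-pod snapshot: running (R_i) and pending (P_i) counts."""
    name: str
    running: int
    pending: int


def effective_load(pod: PodLoad, alpha: float) -> float:
    # L_i = R_i + alpha * P_i, as in the comment above.
    return pod.running + alpha * pod.pending


def pick_least_effective_load(pods: list[PodLoad], alpha: float = 1.0) -> PodLoad:
    # Choose the pod with the smallest effective load.
    # Ties are broken by pod name (an assumption) so the result is deterministic.
    if not pods:
        raise ValueError("no pods to choose from")
    return min(pods, key=lambda p: (effective_load(p, alpha), p.name))
```

With `alpha` close to 0 this behaves like a policy that looks only at running work; a larger `alpha` penalizes pods with a long pending queue more heavily.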

imbalance issues found in least-of-request or any other policies.

I think we should investigate the client problem first. That could be the primary reason. I had a brief chat with @happyandslow today and she confirmed that part.

imbalance issues found in least-of-request or any other policies.

@gangmuk We can expose the target-pod in the response header on the gateway side; that could be helpful.

imbalance issues found in least-of-request or any other policies.

For successful requests, I think we already have the logs. Check this: https://github.com/aibrix/aibrix/blob/c4060bb3c5d41949954626f16c0ae15aa82b73ec/test/e2e/routing_strategy_test.go#L69

imbalance issues found in least-of-request or any other policies.

@varungup90 is there any evidence that the imbalance issue is due to the 50ms?

imbalance issues found in least-of-request or any other policies.

@varungup90 we have an issue tracking the client request interval optimization: https://github.com/vllm-project/aibrix/issues/667. The client is not supposed to send requests in batches. If QPS=8, in even distribution, only single...
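To illustrate the difference between batched and evenly spaced sending, here is a small sketch that computes send-time offsets only. It is not the benchmark client's code; both function names are hypothetical, and the batched variant assumes all requests for a given second are fired at the start of that second.

```python
def even_send_offsets(qps: int, duration_s: int) -> list[float]:
    """Send offsets (seconds from start) with requests spaced 1/qps apart."""
    if qps <= 0 or duration_s <= 0:
        raise ValueError("qps and duration_s must be positive")
    interval = 1.0 / qps
    return [i * interval for i in range(qps * duration_s)]


def batched_send_offsets(qps: int, duration_s: int) -> list[float]:
    """Send offsets when each second's requests are all fired at once (assumed pattern)."""
    if qps <= 0 or duration_s <= 0:
        raise ValueError("qps and duration_s must be positive")
    return [float(second) for second in range(duration_s) for _ in range(qps)]
```

For QPS=8 the even schedule spaces requests 1/8 s = 125 ms apart, while the batched schedule places eight requests at the same instant, which can look like imbalance at the router even when the policy itself is fine.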

jinja2.exceptions.UndefinedError: 'bos_token' is undefined

![Image](https://github.com/user-attachments/assets/51194720-34b7-41b4-9323-80413ccd8d01) I changed it to an absolute path and it's still not working. The LoRA adapter issue