aibrix
Designing an effective metric to identify imbalances and measure distribution fairness
🚀 Feature Description and Motivation
Metrics: requests, tokens (prefill, decode), latencies (e2e, TTFT, TPOT), resources (SM_ACTIVE)
Measurement:
- requests per pod
- standard deviation of requests across pods
- Gini coefficient
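The three measurements can be computed from a per-pod load vector. The sketch below assumes each routed request is logged with the name of the pod it was sent to. The function names (`requests_per_pod`, `std_dev`, `gini`) are hypothetical helpers and are not part of AIBrix. The same `std_dev` and `gini` functions work on any per-pod metric from the list, such as prefill/decode token counts or SM_ACTIVE.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def requests_per_pod(assignments: Iterable[str], pods: Iterable[str]) -> Dict[str, int]:
    """Count routed requests per pod (hypothetical log format: one pod name per request).

    Pods that received no requests are reported as 0 so idle pods count toward imbalance.
    """
    counts = Counter(assignments)
    return {pod: counts.get(pod, 0) for pod in pods}


def std_dev(values: List[float]) -> float:
    """Population standard deviation of per-pod load; 0.0 means perfectly even."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)


def gini(values: List[float]) -> float:
    """Gini coefficient of per-pod load.

    0.0 means every pod carries the same load; (n - 1) / n means one pod carries everything.
    Uses the closed form over ascending-sorted values:
        G = sum_i (2i - n - 1) * x_i / (n * sum(x)),  i = 1..n
    """
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    xs = sorted(values)
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


if __name__ == "__main__":
    # Illustrative data only: 10 requests routed across 4 pods, pod-d left idle.
    pods = ["pod-a", "pod-b", "pod-c", "pod-d"]
    routed = ["pod-a"] * 6 + ["pod-b"] * 2 + ["pod-c"] * 2
    per_pod = requests_per_pod(routed, pods)
    load = list(per_pod.values())
    print("requests per pod:", per_pod)
    print("std dev:", round(std_dev(load), 3))
    print("gini:", round(gini(load), 3))
```

Reporting both numbers is useful because they answer different questions. The standard deviation is in request units and grows with total traffic. The Gini coefficient is scale-invariant, so it can compare fairness across benchmark runs with different request rates.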
Recently, we have run into many issues when measuring load balance. Besides the bugs we have already fixed, we noticed that some deeper problems are hard to diagnose with the current metrics. To address this, I suggest improving the metric measurement so we can better evaluate load-balancing performance.
Use Case
No response
Proposed Solution
No response