
[RFC] Deliver stable, feasible, and smooth output for GPU Optimizer

Open nwangfw opened this issue 3 weeks ago • 0 comments

🚀 Feature Description and Motivation

Based on the experiments conducted so far, we have identified the following issues that need to be addressed to ensure the GPU optimizer works reliably in a production environment:

  1. The GPU optimizer's output fluctuates even on a relatively stable bi-pattern workload (a mix of ShareGPT and Text2SQL datasets).
  2. The GPU optimizer's output may be infeasible because it does not account for cluster GPU availability. For example, it may return a configuration like {4 A10, 2 L20} even when only 1 L20 is available in the cluster. The gpu-optimizer should instead return a feasible result based on cluster GPU availability, such as {7 A10, 1 L20}.
  3. We found that scaling down one type of GPU completes faster than scaling up another type, which leaves insufficient GPU resources during the transition period and causes SLO violations.
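To make issue 2 concrete, an infeasible allocation can always be projected onto what the cluster actually has: clamp each GPU type to its availability, then backfill the lost serving capacity with types that still have headroom. The sketch below is illustrative only (the `project_to_feasible` helper and the per-GPU capacity numbers are hypothetical, not AIBrix code), but it reproduces the {4 A10, 2 L20} → {7 A10, 1 L20} example above when one L20 is assumed to carry roughly the load of three A10s:

```python
def project_to_feasible(desired, available, capacity):
    """Clamp a desired per-type GPU allocation to cluster availability,
    backfilling the capacity lost by clamping with GPU types that
    still have headroom. `capacity` is throughput per single GPU in
    arbitrary but consistent units (hypothetical numbers)."""
    feasible = {g: min(n, available[g]) for g, n in desired.items()}
    # Serving capacity lost by clamping, in normalized throughput units.
    deficit = sum((desired[g] - feasible[g]) * capacity[g] for g in desired)
    # Backfill greedily, preferring the highest-capacity types first.
    for g in sorted(desired, key=lambda g: -capacity[g]):
        while deficit > 0 and feasible[g] < available[g]:
            feasible[g] += 1
            deficit -= capacity[g]
    return feasible
```

With `desired={"A10": 4, "L20": 2}`, `available={"A10": 16, "L20": 1}`, and assumed capacities `{"A10": 1.0, "L20": 3.0}`, the L20 count is clamped to 1 and the missing capacity is covered by three extra A10s, yielding `{"A10": 7, "L20": 1}`.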

To resolve these issues, we plan to:

  1. Improve the data monitoring and data loading implementation to provide more stable input for the ILP.
  2. Avoid large ILP output changes between two consecutive decision windows (e.g., 4 L20 to 5 A10).
  3. Incorporate GPU availability constraints into the ILP formulation.
  4. Enforce a rolling update policy during the GPU combination transition period. For example, if the gpu-optimizer outputs a new strategy (+L20 and -A10), we scale up L20 first and delay scaling down A10.
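Items 2 and 3 above can be expressed in one formulation: cluster availability becomes an upper bound on each count variable, and a per-window step limit on each type damps oscillation between consecutive decisions. The toy solver below enumerates the (tiny) bounded search space instead of calling a real ILP library, purely to illustrate the constraint structure; the function name, capacity, and cost numbers are all assumptions, not the optimizer's actual formulation:

```python
from itertools import product

def solve_gpu_ilp(demand, capacity, cost, available, prev, max_step=2):
    """Toy exhaustive ILP: choose per-type GPU counts that meet `demand`
    (in normalized throughput units) at minimum cost, subject to
    (a) counts never exceeding cluster `available`ability, and
    (b) each count moving at most `max_step` from the previous
    decision window, to damp output oscillation."""
    types = list(capacity)
    # Per-type search ranges encode both constraints (a) and (b).
    ranges = [range(max(0, prev[g] - max_step),
                    min(available[g], prev[g] + max_step) + 1)
              for g in types]
    best, best_cost = None, float("inf")
    for counts in product(*ranges):
        # Feasibility: total capacity must cover the demand.
        if sum(n * capacity[g] for g, n in zip(types, counts)) < demand:
            continue
        c = sum(n * cost[g] for g, n in zip(types, counts))
        if c < best_cost:
            best, best_cost = dict(zip(types, counts)), c
    return best
```

For example, with only 1 L20 available and a previous decision of {5 A10, 1 L20}, a demand of 6 units is served most cheaply by {3 A10, 1 L20} under the assumed capacities and costs; the availability bound makes a {2 L20} answer impossible by construction rather than by post-hoc repair.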
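The rolling update policy in item 4 amounts to a two-phase transition: apply all scale-ups immediately, and defer every scale-down until the newly added replicas are ready, so the cluster briefly over-provisions instead of under-provisioning. A minimal sketch (the `rolling_update_plan` helper is illustrative, not existing AIBrix code):

```python
def rolling_update_plan(current, target):
    """Split a GPU-combination transition into two phases.
    Phase 1 keeps every existing replica while adding the new ones
    (transient over-provisioning instead of an SLO-violating gap);
    phase 2 converges to the target once phase 1 replicas are ready."""
    types = set(current) | set(target)
    phase1 = {g: max(current.get(g, 0), target.get(g, 0)) for g in types}
    return phase1, dict(target)
```

For the +L20/−A10 example in item 4, going from {4 A10} to {3 A10, 1 L20} first scales up to {4 A10, 1 L20}, and only drops the extra A10 in the second phase.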

Use Case

No response

Proposed Solution

No response

nwangfw Feb 06 '25 23:02