
[ENHANCEMENT] Integrate KEDA operator to enable advanced autoscaling in OME

Open frankzhouhr opened this issue 5 months ago • 2 comments

What would you like to be added?

Support for integrating the KEDA operator to enable advanced, custom metrics-based autoscaling for OME-managed LLM workloads.

Why is this needed?

OME currently lacks native support for autoscaling based on custom or external metrics. Integrating KEDA would allow OME to:

  • Dynamically scale model-serving workloads in response to real-time demand.
  • Optimize GPU and compute resource costs by scaling pods up and down automatically.
  • Support a wider range of scaling triggers beyond standard CPU/memory metrics (e.g., Prometheus queries, external event sources; see the sketch after this list).
  • Improve latency and reliability for LLM inference during traffic spikes.

This enhancement would provide greater flexibility and operational efficiency for enterprise users deploying LLMs at scale.
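
For illustration only, here is a minimal KEDA `ScaledObject` sketch that scales a hypothetical OME-managed inference Deployment on a Prometheus query. The deployment name, namespace, metric, and thresholds are placeholders and not part of any agreed design:

```yaml
# Hypothetical sketch: scale an OME-managed inference Deployment with KEDA
# using a Prometheus trigger. All names, the query, and the threshold are
# placeholders, not actual OME resources.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: llm-inference-scaler
  namespace: ome-serving
spec:
  scaleTargetRef:
    name: llm-inference          # hypothetical Deployment created by OME
  minReplicaCount: 1
  maxReplicaCount: 8
  cooldownPeriod: 120            # seconds to wait before scaling down
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        query: sum(rate(inference_requests_total[1m]))   # placeholder metric
        threshold: "10"
```

Whether OME's controller would generate such an object from fields in its own serving spec, or users would supply it directly, is an open question for the design doc.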

Completion requirements

  • [x] Design doc (if significant feature)
  • [x] API change
  • [x] Docs update
  • [x] Tests

Can you help us implement this enhancement?

  • [x] Yes, I can contribute
  • [ ] No, but I'm available for testing
  • [ ] No

frankzhouhr avatar Jul 09 '25 06:07 frankzhouhr

@frankzhouhr I can offer some help. This is a feature we've wanted for a long time.

YouNeedCryDear avatar Jul 11 '25 19:07 YouNeedCryDear

@frankzhouhr just curious where we are with this and whether you could share the design doc? I am interested in contributing. Happy to collaborate as well 👍🏼.

datlife avatar Aug 25 '25 03:08 datlife