[ENHANCEMENT] Integrate KEDA operator to enable advanced autoscaling in OME
What would you like to be added?
Support for integrating the KEDA operator to enable advanced, custom-metrics-based autoscaling for OME-managed LLM workloads.
Why is this needed?
OME currently lacks native support for autoscaling based on custom or external metrics. Integrating KEDA would allow OME to:
- Dynamically scale model-serving workloads in response to real-time demand.
- Optimize GPU and compute resource costs by scaling pods up and down automatically.
- Support a wider range of scaling triggers beyond standard CPU/memory metrics (e.g., Prometheus queries, external event sources); see the sketch after this list.
- Improve latency and reliability for LLM inference during traffic spikes.

This enhancement would provide greater flexibility and operational efficiency for enterprise users deploying LLMs at scale.
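For illustration, here is a minimal sketch of the kind of KEDA `ScaledObject` an OME controller could generate for a model-serving Deployment, using KEDA's Go API types (`github.com/kedacore/keda/v2/apis/keda/v1alpha1`). The names, Prometheus address, metric, and thresholds below are placeholders for discussion, not an agreed design:

```go
package main

import (
	"fmt"

	kedav1alpha1 "github.com/kedacore/keda/v2/apis/keda/v1alpha1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/utils/ptr"
)

// buildScaledObject sketches the ScaledObject an OME controller could create
// for a model-serving Deployment. All concrete values are illustrative.
func buildScaledObject(name, namespace string) *kedav1alpha1.ScaledObject {
	return &kedav1alpha1.ScaledObject{
		ObjectMeta: metav1.ObjectMeta{
			Name:      name + "-scaler",
			Namespace: namespace,
		},
		Spec: kedav1alpha1.ScaledObjectSpec{
			// Target the Deployment that serves the model.
			ScaleTargetRef:  &kedav1alpha1.ScaleTarget{Name: name},
			MinReplicaCount: ptr.To[int32](1),
			MaxReplicaCount: ptr.To[int32](10),
			Triggers: []kedav1alpha1.ScaleTriggers{{
				// Scale on a custom metric rather than CPU/memory, e.g. the
				// depth of the inference request queue reported to Prometheus
				// (metric name here is hypothetical).
				Type: "prometheus",
				Metadata: map[string]string{
					"serverAddress": "http://prometheus.monitoring:9090",
					"query":         fmt.Sprintf(`sum(inference_queue_depth{deployment=%q})`, name),
					"threshold":     "5",
				},
			}},
		},
	}
}

func main() {
	so := buildScaledObject("llama-3-8b-instruct", "ome-serving")
	fmt.Printf("would apply ScaledObject %s/%s\n", so.Namespace, so.Name)
}
```

The idea would be for the controller to own the ScaledObject's lifecycle (create/update/delete alongside the serving Deployment), so users never have to manage KEDA objects directly.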
Completion requirements
- [x] Design doc (if significant feature)
- [x] API change (a strawman sketch follows this list)
- [x] Docs update
- [x] Tests
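On the API-change item: purely as a strawman (the field and type names below are placeholders, not existing OME API), the serving spec could gain an optional autoscaling block that the controller translates into a ScaledObject like the one sketched above:

```go
package v1beta1 // hypothetical OME API package

// AutoScalingSpec is a strawman: none of these fields exist in OME today.
type AutoScalingSpec struct {
	MinReplicas *int32 `json:"minReplicas,omitempty"`
	MaxReplicas *int32 `json:"maxReplicas,omitempty"`
	// Triggers pass through to KEDA (type + metadata), so users can scale
	// on Prometheus queries or other external event sources.
	Triggers []TriggerSpec `json:"triggers,omitempty"`
}

// TriggerSpec mirrors the shape of a KEDA trigger.
type TriggerSpec struct {
	Type     string            `json:"type"`
	Metadata map[string]string `json:"metadata,omitempty"`
}
```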
Can you help us implement this enhancement?
- [x] Yes, I can contribute
- [ ] No, but I'm available for testing
- [ ] No
@frankzhouhr I can offer some help. This is a feature we have wanted for a long time.
@frankzhouhr just curious where we are with this, and could you share the design doc? I am interested in contributing. Happy to collaborate as well 👍🏼