[ENHANCEMENT] Integrate KEDA operator to enable advanced autoscaling in OME
What would you like to be added?
Support for integrating the KEDA operator to enable advanced, custom-metrics-based autoscaling for OME-managed LLM workloads.
Why is this needed?
OME currently lacks native support for autoscaling based on custom or external metrics. Integrating KEDA would allow OME to:
- Dynamically scale model-serving workloads in response to real-time demand.
- Optimize GPU and compute resource costs by scaling pods up and down automatically.
- Support a wider range of scaling triggers beyond standard CPU/memory metrics (e.g., Prometheus queries, external event sources); see the sketch after this list.
- Improve latency and reliability for LLM inference during traffic spikes.

This enhancement would provide greater flexibility and operational efficiency for enterprise users deploying LLMs at scale.
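For illustration, here is a minimal sketch of the kind of KEDA `ScaledObject` an OME controller could generate for a model-serving Deployment, using KEDA's Go API types (`github.com/kedacore/keda/v2/apis/keda/v1alpha1`). The names, Prometheus address, metric, and thresholds below are placeholders for discussion, not an agreed design:

```go
package main

import (
	"fmt"

	kedav1alpha1 "github.com/kedacore/keda/v2/apis/keda/v1alpha1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/utils/ptr"
)

// buildScaledObject sketches the ScaledObject an OME controller could create
// for a model-serving Deployment. All concrete values are illustrative.
func buildScaledObject(name, namespace string) *kedav1alpha1.ScaledObject {
	return &kedav1alpha1.ScaledObject{
		ObjectMeta: metav1.ObjectMeta{
			Name:      name + "-scaler",
			Namespace: namespace,
		},
		Spec: kedav1alpha1.ScaledObjectSpec{
			// Target the Deployment that serves the model.
			ScaleTargetRef:  &kedav1alpha1.ScaleTarget{Name: name},
			MinReplicaCount: ptr.To[int32](1),
			MaxReplicaCount: ptr.To[int32](10),
			Triggers: []kedav1alpha1.ScaleTriggers{{
				// Scale on a custom metric rather than CPU/memory, e.g. the
				// depth of the inference request queue reported to Prometheus
				// (metric name here is hypothetical).
				Type: "prometheus",
				Metadata: map[string]string{
					"serverAddress": "http://prometheus.monitoring:9090",
					"query":         fmt.Sprintf(`sum(inference_queue_depth{deployment=%q})`, name),
					"threshold":     "5",
				},
			}},
		},
	}
}

func main() {
	so := buildScaledObject("llama-3-8b-instruct", "ome-serving")
	fmt.Printf("would apply ScaledObject %s/%s\n", so.Namespace, so.Name)
}
```

The idea would be for the controller to own the ScaledObject's lifecycle (create/update/delete alongside the serving Deployment), so users never have to manage KEDA objects directly.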
Completion requirements
- [x] Design doc (if significant feature)
- [x] API change (a strawman sketch follows this list)
- [x] Docs update
- [x] Tests
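On the API-change item: purely as a strawman (the field and type names below are placeholders, not existing OME API), the serving spec could gain an optional autoscaling block that the controller translates into a ScaledObject like the one sketched above:

```go
package v1beta1 // hypothetical OME API package

// AutoScalingSpec is a strawman: none of these fields exist in OME today.
type AutoScalingSpec struct {
	MinReplicas *int32 `json:"minReplicas,omitempty"`
	MaxReplicas *int32 `json:"maxReplicas,omitempty"`
	// Triggers pass through to KEDA (type + metadata), so users can scale
	// on Prometheus queries or other external event sources.
	Triggers []TriggerSpec `json:"triggers,omitempty"`
}

// TriggerSpec mirrors the shape of a KEDA trigger.
type TriggerSpec struct {
	Type     string            `json:"type"`
	Metadata map[string]string `json:"metadata,omitempty"`
}
```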
Can you help us implement this enhancement?
- [x] Yes, I can contribute
- [ ] No, but I'm available for testing
- [ ] No
@frankzhouhr I can offer some help. This is a feature we have wanted for a long time.
@frankzhouhr just curious where we are with this, and could you share the design doc? I am interested in contributing. Happy to collaborate as well 👍🏼