kaito
Optimize inference perf with autoscaling
/assign @rambohe-ch
- A new API, InferenceSet, has been added for scaling inference workloads in pull request https://github.com/kaito-project/kaito/pull/1503 ; the Workspace API is treated as the atomic scaling unit.
- https://github.com/kaito-project/keda-kaito-scaler will be updated to work with this new API once InferenceSet is supported in kaito.
This is implemented in v0.8.0, which is scheduled for release this week: https://github.com/kaito-project/kaito/blob/main/website/docs/keda-autoscaler-inference.md