
Optimize inference perf with autoscaling

Open sdesai345 opened this issue 7 months ago • 2 comments

sdesai345 avatar May 07 '25 16:05 sdesai345

/assign @rambohe-ch

rambohe-ch avatar Jul 22 '25 23:07 rambohe-ch

  1. A new API, InferenceSet, has been added for scaling inference workloads in this pull request: https://github.com/kaito-project/kaito/pull/1503. The Workspace API is treated as the atomic scaling unit.
  2. https://github.com/kaito-project/keda-kaito-scaler will be updated to work with this new API once InferenceSet is supported in kaito.

rambohe-ch avatar Oct 07 '25 22:10 rambohe-ch

  1. A new API, InferenceSet, has been added for scaling inference workloads in this pull request: "docs: Introduce a new InferenceSet CRD and Controller for scaling inference workloads automatically" (#1503). The Workspace API is treated as the atomic scaling unit.
  2. https://github.com/kaito-project/keda-kaito-scaler will be updated to work with this new API once InferenceSet is supported in kaito.

It's implemented in v0.8.0, which is going to be released this week: https://github.com/kaito-project/kaito/blob/main/website/docs/keda-autoscaler-inference.md

andyzhangx avatar Dec 15 '25 07:12 andyzhangx