Implement intelligent autoscaling
Describe the feature
KPA gathers statistics via a moving average across pod replicas over a given time window. I am wondering if we could provide something smarter that also deals with cold-start issues, e.g. don't scale down to zero if a traffic burst is about to happen. `scale-down-delay` keeps the maximum desired pod count within a window, but we probably need to look ahead in time to make sure we have enough capacity, since pods may take time to scale out (depending on the app), which affects latency.
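To make the "look ahead" idea a bit more concrete, here is a rough Go sketch assuming a hypothetical forecaster fed with the concurrency samples the autoscaler already collects. The linear-trend model and all names are illustrative only, not the KPA's actual algorithm; something like [1] would replace the trivial forecast with a learned policy.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// forecastConcurrency fits a simple linear trend to recent concurrency
// samples and extrapolates it `horizon` ahead. A real implementation could
// swap in Holt-Winters, a seasonal model, or an RL policy as in [1].
func forecastConcurrency(samples []float64, sampleInterval, horizon time.Duration) float64 {
	n := float64(len(samples))
	if n == 0 {
		return 0
	}
	// Least-squares slope and intercept over the window.
	var sumX, sumY, sumXY, sumXX float64
	for i, y := range samples {
		x := float64(i)
		sumX += x
		sumY += y
		sumXY += x * y
		sumXX += x * x
	}
	denom := n*sumXX - sumX*sumX
	if denom == 0 {
		return samples[len(samples)-1]
	}
	slope := (n*sumXY - sumX*sumY) / denom
	intercept := (sumY - slope*sumX) / n
	// Extrapolate to the end of the look-ahead horizon.
	steps := float64(horizon / sampleInterval)
	return math.Max(intercept+slope*(n-1+steps), 0)
}

// desiredPods converts predicted concurrency into a replica count,
// never scaling below what the current load already needs.
func desiredPods(current, predicted, targetPerPod float64) int {
	return int(math.Ceil(math.Max(current, predicted) / targetPerPod))
}

func main() {
	// Concurrency observed over the last 60s, one sample per 10s.
	samples := []float64{10, 14, 19, 25, 32, 40}
	// Look ahead by roughly one pod cold-start time (assumed 30s here).
	predicted := forecastConcurrency(samples, 10*time.Second, 30*time.Second)
	fmt.Printf("predicted concurrency: %.1f, pods: %d\n",
		predicted, desiredPods(40, predicted, 10))
}
```

The point of the look-ahead horizon is to cover the time a new pod needs to become ready, so capacity is already there when the burst arrives instead of reacting after latency has degraded.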
This could be implemented as a Knative extension, since Knative Services can be updated externally (no need to change the KPA itself); see the sketch below.
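For example, one way an external extension could act on a forecast without touching the KPA is to patch the annotations the autoscaler already honours, such as `autoscaling.knative.dev/min-scale`. A rough sketch using the Kubernetes dynamic client; the namespace, service name, and replica value are placeholders, and a real extension would also revert the floor after the burst:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

// raiseScaleFloor patches a Knative Service's revision template so that the
// autoscaler keeps at least `minScale` pods, e.g. ahead of a predicted burst.
// Note: changing the template annotation creates a new Revision.
func raiseScaleFloor(ctx context.Context, dc dynamic.Interface, namespace, name string, minScale int) error {
	gvr := schema.GroupVersionResource{
		Group:    "serving.knative.dev",
		Version:  "v1",
		Resource: "services",
	}
	patch := []byte(fmt.Sprintf(
		`{"spec":{"template":{"metadata":{"annotations":{"autoscaling.knative.dev/min-scale":"%d"}}}}}`,
		minScale))
	_, err := dc.Resource(gvr).Namespace(namespace).Patch(
		ctx, name, types.MergePatchType, patch, metav1.PatchOptions{})
	return err
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	dc, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	// Placeholder namespace/name; minScale would come from the forecaster.
	if err := raiseScaleFloor(context.Background(), dc, "default", "my-service", 5); err != nil {
		panic(err)
	}
}
```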
There is a lot of history on the topic, see [1] for more. This feature is already offered by cloud providers, for example at the node level, see [2]. See also the related KEDA issue [3]. I am also creating this issue as a reference for future discussions in case there is interest from the community.
Refs
[1] Lucia Schuler, Somaya Jamil, Niklas Kühl, AI-based Resource Allocation: Reinforcement Learning for Adaptive Auto-scaling in Serverless Environments.
[2] Predictive scaling for Amazon EC2 Auto Scaling
[3] https://github.com/kedacore/keda/issues/2401
cc @dprotaso @ReToCode
We also experience the issues mentioned here. I was initially hoping to integrate some redundancy option, so that I could always add x pods to the deployment on top of what the KPA predicts. But I would much rather have predictive scaling, or options for handling cyclical workloads or similar.
As a first step, could I integrate this redundancy as a Knative extension and deploy it myself? Are there guides for doing that?
Help is much appreciated!
@Hojland You can implement your own autoscaling algorithm in Knative, then recompile it and deploy the resulting autoscaler container image in place of the default one.
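For the redundancy idea specifically, the core of such a change could be as small as padding whatever the existing algorithm computes. A schematic sketch of that idea only; it does not use the actual KPA interfaces:

```go
package main

import "fmt"

// paddedDesiredPods wraps an existing scaling decision and adds a fixed
// redundancy margin on top, clamped to the configured maximum.
// basePods stands in for whatever the stock algorithm (e.g. the KPA's
// moving average) recommends.
func paddedDesiredPods(basePods, redundancy, maxScale int) int {
	want := basePods + redundancy
	if maxScale > 0 && want > maxScale {
		want = maxScale
	}
	return want
}

func main() {
	// e.g. the KPA wants 4 pods; always keep 2 spare, capped at 10.
	fmt.Println(paddedDesiredPods(4, 2, 10)) // prints 6
}
```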
This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.