Implement intelligent autoscaling
Describe the feature
KPA gathers statistics via a moving average across pod replicas over a given time window. I am wondering if we could provide something smarter that also deals with cold-start issues, e.g. don't scale down to zero if a traffic burst is about to happen. `scale-down-delay` keeps the maximum desired pod count within a window, but we probably need to look ahead in time to make sure we have enough capacity, since pods may take time to scale out (depending on the app), which affects latency.
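To make the "look ahead" idea a bit more concrete, here is a rough Go sketch assuming a hypothetical forecaster fed with the concurrency samples the autoscaler already collects. The linear-trend model and all names are illustrative only, not the KPA's actual algorithm; something like [1] would replace the trivial forecast with a learned policy.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// forecastConcurrency fits a simple linear trend to recent concurrency
// samples and extrapolates it `horizon` ahead. A real implementation could
// swap in Holt-Winters, a seasonal model, or an RL policy as in [1].
func forecastConcurrency(samples []float64, sampleInterval, horizon time.Duration) float64 {
	n := float64(len(samples))
	if n == 0 {
		return 0
	}
	// Least-squares slope and intercept over the window.
	var sumX, sumY, sumXY, sumXX float64
	for i, y := range samples {
		x := float64(i)
		sumX += x
		sumY += y
		sumXY += x * y
		sumXX += x * x
	}
	denom := n*sumXX - sumX*sumX
	if denom == 0 {
		return samples[len(samples)-1]
	}
	slope := (n*sumXY - sumX*sumY) / denom
	intercept := (sumY - slope*sumX) / n
	// Extrapolate to the end of the look-ahead horizon.
	steps := float64(horizon / sampleInterval)
	return math.Max(intercept+slope*(n-1+steps), 0)
}

// desiredPods converts predicted concurrency into a replica count,
// never scaling below what the current load already needs.
func desiredPods(current, predicted, targetPerPod float64) int {
	return int(math.Ceil(math.Max(current, predicted) / targetPerPod))
}

func main() {
	// Concurrency observed over the last 60s, one sample per 10s.
	samples := []float64{10, 14, 19, 25, 32, 40}
	// Look ahead by roughly one pod cold-start time (assumed 30s here).
	predicted := forecastConcurrency(samples, 10*time.Second, 30*time.Second)
	fmt.Printf("predicted concurrency: %.1f, pods: %d\n",
		predicted, desiredPods(40, predicted, 10))
}
```

The point of the look-ahead horizon is to cover the time a new pod needs to become ready, so capacity is already there when the burst arrives instead of reacting after latency has degraded.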
This could be implemented as a Knative extension, since Knative Services can be updated externally (no need to change the KPA itself); see the sketch below.
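For example, one way an external extension could act on a forecast without touching the KPA is to patch the annotations the autoscaler already honours, such as `autoscaling.knative.dev/min-scale`. A rough sketch using the Kubernetes dynamic client; the namespace, service name, and replica value are placeholders, and a real extension would also revert the floor after the burst:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

// raiseScaleFloor patches a Knative Service's revision template so that the
// autoscaler keeps at least `minScale` pods, e.g. ahead of a predicted burst.
// Note: changing the template annotation creates a new Revision.
func raiseScaleFloor(ctx context.Context, dc dynamic.Interface, namespace, name string, minScale int) error {
	gvr := schema.GroupVersionResource{
		Group:    "serving.knative.dev",
		Version:  "v1",
		Resource: "services",
	}
	patch := []byte(fmt.Sprintf(
		`{"spec":{"template":{"metadata":{"annotations":{"autoscaling.knative.dev/min-scale":"%d"}}}}}`,
		minScale))
	_, err := dc.Resource(gvr).Namespace(namespace).Patch(
		ctx, name, types.MergePatchType, patch, metav1.PatchOptions{})
	return err
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	dc, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	// Placeholder namespace/name; minScale would come from the forecaster.
	if err := raiseScaleFloor(context.Background(), dc, "default", "my-service", 5); err != nil {
		panic(err)
	}
}
```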
There is a lot of history on the topic, see [1] for more. This feature is already offered by cloud providers, for example at the node level, see [2]. See also the related KEDA issue [3]. I am also creating this issue as a reference for future discussions in case there is interest from the community.
Refs
[1] Lucia Schuler, Somaya Jamil, Niklas Kühl, AI-based Resource Allocation: Reinforcement Learning for Adaptive Auto-scaling in Serverless Environments.
[2] Predictive scaling for Amazon EC2 Auto Scaling
[3] https://github.com/kedacore/keda/issues/2401
cc @dprotaso @ReToCode
We also experience the issues mentioned here. I was initially hoping to integrate some redundancy option, so that I could always add x pods to the deployment on top of what the KPA predicts. But I would much rather have predictive scaling, or options for handling cyclical workloads or similar.
As a first step, could I integrate this redundancy as a Knative extension and deploy it myself? Are there guides for doing that?
Help is much appreciated!
@Hojland You can implement your own autoscaling algorithm in Knative, then recompile it and deploy the resulting autoscaler container image in place of the default one.
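For the redundancy idea specifically, the core of such a change could be as small as padding whatever the existing algorithm computes. A schematic sketch of that idea only; it does not use the actual KPA interfaces:

```go
package main

import "fmt"

// paddedDesiredPods wraps an existing scaling decision and adds a fixed
// redundancy margin on top, clamped to the configured maximum.
// basePods stands in for whatever the stock algorithm (e.g. the KPA's
// moving average) recommends.
func paddedDesiredPods(basePods, redundancy, maxScale int) int {
	want := basePods + redundancy
	if maxScale > 0 && want > maxScale {
		want = maxScale
	}
	return want
}

func main() {
	// e.g. the KPA wants 4 pods; always keep 2 spare, capped at 10.
	fmt.Println(paddedDesiredPods(4, 2, 10)) // prints 6
}
```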
This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.