Scale to Zero
Is this a BUG REPORT or FEATURE REQUEST?:
Feature Request
What happened:
The current autoscaling option uses an HPA, but we have many low-traffic Functions. I'd trade off cold start latency for zero resource usage under zero load. I'd propose this behavior be off by default.
Knative Serving addresses this but has a hard dependency on Istio which is not an option for my cluster.
What you expected to happen:
When Functions receive 0 traffic for some threshold (e.g. 180 seconds), the HPA pod count scales to 0. When a function has traffic, it defaults to the HPA or static RS behavior.
I'd propose a mechanism similar to knative's activator, where all 0-pod Function traffic is routed to an operator which receives the request, scales the Function to a non-0 number of pods, then forwards the original request accordingly. Obviously there will be increased latency on cold starts, and we can mitigate that by respecting client timeouts and responding with QoS response codes if requests become a thundering herd.
I probably won't have time to implement this for a couple weeks, but I was thinking of the following approach:
- minimal prometheus deployment
- introduce a new CRD with the same interface as `function.kubeless.io`; let's call it `scalingfunction.kubeless.io` (illustrative example below)
- new k8s operator which watches for idleness and marks functions for scale-to-zero
- new k8s operator which can receive HTTP requests and marks functions for scale-up
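For concreteness, a purely hypothetical `scalingfunction.kubeless.io` object could mirror the `function.kubeless.io` interface with one extra idleness knob. The `idleTimeout` field and the group/version are illustrative, not an existing kubeless API:

```yaml
# Illustrative only: mirrors the function.kubeless.io interface;
# idleTimeout is a hypothetical scale-to-zero knob, not a real field.
apiVersion: kubeless.io/v1beta1
kind: ScalingFunction
metadata:
  name: hello
spec:
  runtime: python2.7
  handler: handler.hello
  function: |
    def hello(event, context):
        return "hello world"
  idleTimeout: 180s  # hypothetical: scale to zero after 180s without traffic
```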
This proposal introduces a couple of elements that compose with the existing kubeless operators rather than changing them directly. Notably, there is no Istio dependency.
Scale down:
- `idler` operator watches all functions and, using the prometheus metrics already exposed on function runtimes, determines whether a function has been idle long enough for a scale down. If so, it labels the `scalingfunction` deployment with the timestamp of the last request
- `activator` operator watches for idle functions, then atomically changes a wrapping svc to point at the `activator` operator instead of the `function.kubeless.io` deployment's svc
- either during the svc swap or later, some operator changes the `function.kubeless.io` spec to have an rs of 0 replicas (a rough Go sketch of this path follows below)
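A minimal Go sketch of the idler's scale-down action, assuming a client-go clientset and that the last-request time has already been derived from the runtime's metrics. The label key, the `<function>-wrapper` svc naming, and the activator selector are all illustrative, not existing kubeless conventions:

```go
package idler

import (
	"context"
	"strconv"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// scaleDownIfIdle labels the function with its last-request timestamp, repoints
// the wrapping svc at the activator, and scales the function deployment to 0.
func scaleDownIfIdle(cs *kubernetes.Clientset, fn, ns string, lastRequest time.Time, idleAfter time.Duration) error {
	if time.Since(lastRequest) < idleAfter {
		return nil // still receiving traffic, nothing to do
	}
	ctx := context.TODO()

	// Record the last-request timestamp on the deployment (illustrative label key).
	patch := []byte(`{"metadata":{"labels":{"scalingfunction.kubeless.io/last-request":"` +
		strconv.FormatInt(lastRequest.Unix(), 10) + `"}}}`)
	if _, err := cs.AppsV1().Deployments(ns).Patch(ctx, fn, types.StrategicMergePatchType, patch, metav1.PatchOptions{}); err != nil {
		return err
	}

	// Atomically repoint the wrapping svc at the activator instead of the function pods.
	svc, err := cs.CoreV1().Services(ns).Get(ctx, fn+"-wrapper", metav1.GetOptions{})
	if err != nil {
		return err
	}
	svc.Spec.Selector = map[string]string{"app": "kubeless-activator"}
	if _, err := cs.CoreV1().Services(ns).Update(ctx, svc, metav1.UpdateOptions{}); err != nil {
		return err
	}

	// Finally scale the function's deployment down to 0 replicas.
	scale, err := cs.AppsV1().Deployments(ns).GetScale(ctx, fn, metav1.GetOptions{})
	if err != nil {
		return err
	}
	scale.Spec.Replicas = 0
	_, err = cs.AppsV1().Deployments(ns).UpdateScale(ctx, fn, scale, metav1.UpdateOptions{})
	return err
}
```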
Scale up:
- when `activator` receives an http request on behalf of a function, it marks the `scalingfunction`
- the `activator` uses `function.kubeless.io` as a primitive and replaces the `function.kubeless.io` deployment corresponding to the `scalingfunction` deployment with a non-0 rs or hpa
- `activator`, having received the initial http request, forwards the request to a `function.kubeless.io` svc once the function is ready (sketched below)
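A rough Go sketch of that request path, assuming a client-go clientset and that the wrapping svc routes idle-function traffic to the activator. The handler name, the 30s deadline, and the function svc address/port are assumptions for illustration:

```go
package activator

import (
	"context"
	"net/http"
	"net/http/httputil"
	"net/url"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// activate scales the function back up, waits for a ready replica, and then
// forwards the original request to the function's own svc.
func activate(cs *kubernetes.Clientset, fn, ns string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx := context.TODO()
		deployments := cs.AppsV1().Deployments(ns)

		// Scale the function deployment to a non-0 replica count.
		scale, err := deployments.GetScale(ctx, fn, metav1.GetOptions{})
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		if scale.Spec.Replicas == 0 {
			scale.Spec.Replicas = 1
			if _, err := deployments.UpdateScale(ctx, fn, scale, metav1.UpdateOptions{}); err != nil {
				http.Error(w, err.Error(), http.StatusBadGateway)
				return
			}
		}

		// Poll until a replica is ready, bounded so we respect client timeouts.
		deadline := time.Now().Add(30 * time.Second)
		for time.Now().Before(deadline) {
			d, err := deployments.Get(ctx, fn, metav1.GetOptions{})
			if err == nil && d.Status.ReadyReplicas > 0 {
				target, _ := url.Parse("http://" + fn + "." + ns + ".svc.cluster.local:8080")
				httputil.NewSingleHostReverseProxy(target).ServeHTTP(w, r)
				return
			}
			time.Sleep(500 * time.Millisecond)
		}
		// Shed load on slow cold starts (thundering-herd mitigation).
		http.Error(w, "function cold start timed out", http.StatusServiceUnavailable)
	}
}
```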
This proposal treats the kubeless function CRD as a primitive, composing the new scaling operators and CRDs with the existing kubeless ones.
The main disadvantage is no interoperability with existing kubeless triggers, since it requires having another svc wrap the one the kubeless function controller creates. One option is to make those trigger controllers "wrapper-aware".
+1 for this.
Cold start using knative is pretty slow (around 6-8 seconds even with prepulled images etc) so it pretty much rules out using it for any user facing API or web endpoints which are called in a synchronous/blocking way.
If cold start on kubeless could be fast enough to satisfy that kind of use case it would be awesome..!
Thanks for the write-up @jamding, a couple of comments on your proposal:
- I think we can avoid the `prometheus` dependency. Functions already expose a `/metrics` endpoint that the controller can call to retrieve the statistics needed (see the sketch below).
- We can also avoid having a new CRD: the new controller can watch `HorizontalPodAutoscaler` items and act if those HPAs are associated with a function.
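For illustration, a minimal sketch of scraping that endpoint directly, assuming the runtime exposes a prometheus-format invocation counter. The metric name `function_calls_total` and the pod URL are assumptions, not a confirmed kubeless API:

```go
package metricscrape

import (
	"net/http"

	"github.com/prometheus/common/expfmt"
)

// fetchRequestCount scrapes a function pod's /metrics endpoint and sums an
// assumed invocation counter, so the controller can detect idleness itself.
func fetchRequestCount(podURL string) (float64, error) {
	resp, err := http.Get(podURL + "/metrics")
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	var parser expfmt.TextParser
	families, err := parser.TextToMetricFamilies(resp.Body)
	if err != nil {
		return 0, err
	}

	total := 0.0
	if mf, ok := families["function_calls_total"]; ok { // assumed metric name
		for _, m := range mf.GetMetric() {
			total += m.GetCounter().GetValue()
		}
	}
	return total, nil
}
```

Polling this per pod and diffing the counts over time would give the controller an idleness signal without running a prometheus server.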
What I don't see so clearly is how we can implement the HTTP interceptor. We would need to implement some kind of HTTP gateway to route requests to the function services (and scale up if needed). That can lead to higher response times and slow cold starts.
We can do a POC though to clarify how that may work.
Yes, +1 for this. Scaling to zero would be perfect for some functions; I would love to run a tight cluster until it receives traffic. But I'd keep the HPA behavior for other functions that I know are dependencies.
I think you should have a look at how other people implement scale-to-zero: https://github.com/deislabs/osiris. I stumbled on it in https://github.com/kedacore/keda where they suggest it as an alternative to knative serving.
But if you are worried about cold start speed, you'd have to move towards a worker pool.
Hi! Any updates on this? In my opinion this is a must-have for every serverless FaaS tool.
Has anyone considered developing a pool manager like Fission did, or even an idler like OpenFaaS?
I've noticed that this was merged in July 2019 and is part of the Kubernetes 1.16 release: https://github.com/kubernetes/kubernetes/pull/74526
It seems not to be GA yet and requires the `HPAScaleToZero` feature gate. It also seems to require custom/external metrics (so I guess using CPU would not work), see https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/
I assume this could be used instead of requiring a custom implementation?
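If that gate is enabled on the cluster, the idea (untested; the external metric name and target value are placeholders for whatever metrics adapter is available) would look something like:

```yaml
# Requires the alpha HPAScaleToZero feature gate on the control plane.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hello
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hello
  minReplicas: 0        # only allowed with HPAScaleToZero
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: function_calls_per_second  # placeholder metric
      target:
        type: AverageValue
        averageValue: "10"
```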
Indeed, once Kubernetes supports it, this would be trivial and we wouldn't need to add custom support for it. It would be great if someone could validate this and make it work with Kubeless.
Do we have any update on this? I heard great things about Kubeless (https://www.appvia.io/blog/serverless-on-kubernetes), but this one issue was listed as a con, with no activity.
Any updates on this?
Any news on that?