Scale to Zero
Is this a BUG REPORT or FEATURE REQUEST?:
Feature Request
What happened:
The current autoscaling option uses an HPA, but we have many low-traffic Functions. I'd trade off cold start latency for zero resource usage under zero load. I'd propose this behavior be off by default.
Knative Serving addresses this but has a hard dependency on Istio which is not an option for my cluster.
What you expected to happen:
When Functions receive 0 traffic for some threshold (e.g. 180 seconds), the HPA pod count scales to 0. When a function has traffic, it defaults to the HPA or static RS behavior.
I'd propose a mechanism similar to knative's activator, where all 0-pod Function traffic is routed to an operator which receives the request, scales the Function to a non-0 number of pods, then forwards the original request accordingly. Obviously there will be increased latency on cold starts, and we can mitigate that by respecting client timeouts and responding with QoS response codes if requests become a thundering herd.
I probably won't have time to implement this for a couple weeks, but I was thinking of the following approach:
- minimal prometheus deployment
- introduce a new CRD with the same interface as `function.kubeless.io`; let's call it `scalingfunction.kubeless.io` (illustrative example below)
- new k8s operator which watches for idleness and marks functions for scale-to-zero
- new k8s operator which can receive HTTP requests and marks functions for scale-up
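For concreteness, a purely hypothetical `scalingfunction.kubeless.io` object could mirror the `function.kubeless.io` interface with one extra idleness knob. The `idleTimeout` field and the group/version are illustrative, not an existing kubeless API:

```yaml
# Illustrative only: mirrors the function.kubeless.io interface;
# idleTimeout is a hypothetical scale-to-zero knob, not a real field.
apiVersion: kubeless.io/v1beta1
kind: ScalingFunction
metadata:
  name: hello
spec:
  runtime: python2.7
  handler: handler.hello
  function: |
    def hello(event, context):
        return "hello world"
  idleTimeout: 180s  # hypothetical: scale to zero after 180s without traffic
```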
This proposal introduces a couple of elements that compose with the existing kubeless operators rather than changing them directly. Notably, there is no Istio dependency.
Scale down:
- `idler` operator watches all functions and, using the prometheus metrics already exposed on function runtimes, determines whether a function has been idle long enough for a scale down. If so, it labels the `scalingfunction` deployment with the timestamp of the last request
- `activator` operator watches for idle functions, then atomically changes a wrapping svc to point at the `activator` operator instead of the `function.kubeless.io` deployment's svc
- either during the svc swap or later, some operator changes the `function.kubeless.io` spec to have an rs of 0 replicas (a rough Go sketch of this path follows below)
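A minimal Go sketch of the idler's scale-down action, assuming a client-go clientset and that the last-request time has already been derived from the runtime's metrics. The label key, the `<function>-wrapper` svc naming, and the activator selector are all illustrative, not existing kubeless conventions:

```go
package idler

import (
	"context"
	"strconv"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// scaleDownIfIdle labels the function with its last-request timestamp, repoints
// the wrapping svc at the activator, and scales the function deployment to 0.
func scaleDownIfIdle(cs *kubernetes.Clientset, fn, ns string, lastRequest time.Time, idleAfter time.Duration) error {
	if time.Since(lastRequest) < idleAfter {
		return nil // still receiving traffic, nothing to do
	}
	ctx := context.TODO()

	// Record the last-request timestamp on the deployment (illustrative label key).
	patch := []byte(`{"metadata":{"labels":{"scalingfunction.kubeless.io/last-request":"` +
		strconv.FormatInt(lastRequest.Unix(), 10) + `"}}}`)
	if _, err := cs.AppsV1().Deployments(ns).Patch(ctx, fn, types.StrategicMergePatchType, patch, metav1.PatchOptions{}); err != nil {
		return err
	}

	// Atomically repoint the wrapping svc at the activator instead of the function pods.
	svc, err := cs.CoreV1().Services(ns).Get(ctx, fn+"-wrapper", metav1.GetOptions{})
	if err != nil {
		return err
	}
	svc.Spec.Selector = map[string]string{"app": "kubeless-activator"}
	if _, err := cs.CoreV1().Services(ns).Update(ctx, svc, metav1.UpdateOptions{}); err != nil {
		return err
	}

	// Finally scale the function's deployment down to 0 replicas.
	scale, err := cs.AppsV1().Deployments(ns).GetScale(ctx, fn, metav1.GetOptions{})
	if err != nil {
		return err
	}
	scale.Spec.Replicas = 0
	_, err = cs.AppsV1().Deployments(ns).UpdateScale(ctx, fn, scale, metav1.UpdateOptions{})
	return err
}
```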
Scale up:
- when `activator` receives an http request on behalf of a function, it marks the `scalingfunction`
- the `activator` uses `function.kubeless.io` as a primitive and replaces the `function.kubeless.io` deployment corresponding to the `scalingfunction` deployment with a non-0 rs or hpa
- `activator`, having received the initial http request, forwards the request to a `function.kubeless.io` svc once the function is ready (sketched below)
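A rough Go sketch of that request path, assuming a client-go clientset and that the wrapping svc routes idle-function traffic to the activator. The handler name, the 30s deadline, and the function svc address/port are assumptions for illustration:

```go
package activator

import (
	"context"
	"net/http"
	"net/http/httputil"
	"net/url"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// activate scales the function back up, waits for a ready replica, and then
// forwards the original request to the function's own svc.
func activate(cs *kubernetes.Clientset, fn, ns string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx := context.TODO()
		deployments := cs.AppsV1().Deployments(ns)

		// Scale the function deployment to a non-0 replica count.
		scale, err := deployments.GetScale(ctx, fn, metav1.GetOptions{})
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		if scale.Spec.Replicas == 0 {
			scale.Spec.Replicas = 1
			if _, err := deployments.UpdateScale(ctx, fn, scale, metav1.UpdateOptions{}); err != nil {
				http.Error(w, err.Error(), http.StatusBadGateway)
				return
			}
		}

		// Poll until a replica is ready, bounded so we respect client timeouts.
		deadline := time.Now().Add(30 * time.Second)
		for time.Now().Before(deadline) {
			d, err := deployments.Get(ctx, fn, metav1.GetOptions{})
			if err == nil && d.Status.ReadyReplicas > 0 {
				target, _ := url.Parse("http://" + fn + "." + ns + ".svc.cluster.local:8080")
				httputil.NewSingleHostReverseProxy(target).ServeHTTP(w, r)
				return
			}
			time.Sleep(500 * time.Millisecond)
		}
		// Shed load on slow cold starts (thundering-herd mitigation).
		http.Error(w, "function cold start timed out", http.StatusServiceUnavailable)
	}
}
```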
This proposal treats the kubeless function CRD as a primitive, composing the new scaling operators and CRDs with the existing kubeless ones.
The main disadvantage is no interoperability with existing kubeless triggers, since it requires having another svc wrap the one the kubeless function controller creates. One option is to make those trigger controllers "wrapper-aware".
+1 for this.
Cold start using knative is pretty slow (around 6-8 seconds even with prepulled images etc) so it pretty much rules out using it for any user facing API or web endpoints which are called in a synchronous/blocking way.
If cold start on kubeless could be fast enough to satisfy that kind of use case it would be awesome..!
Thanks for the write-up @jamding, a couple of comments on your proposal:
- I think we can avoid the `prometheus` dependency. Functions already expose a `/metrics` endpoint that the controller can call to retrieve the statistics needed (see the sketch below).
- We can also avoid having a new CRD: the new controller can watch `HorizontalPodAutoscaler` items and act if those HPAs are associated with a function.
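For illustration, a minimal sketch of scraping that endpoint directly, assuming the runtime exposes a prometheus-format invocation counter. The metric name `function_calls_total` and the pod URL are assumptions, not a confirmed kubeless API:

```go
package metricscrape

import (
	"net/http"

	"github.com/prometheus/common/expfmt"
)

// fetchRequestCount scrapes a function pod's /metrics endpoint and sums an
// assumed invocation counter, so the controller can detect idleness itself.
func fetchRequestCount(podURL string) (float64, error) {
	resp, err := http.Get(podURL + "/metrics")
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	var parser expfmt.TextParser
	families, err := parser.TextToMetricFamilies(resp.Body)
	if err != nil {
		return 0, err
	}

	total := 0.0
	if mf, ok := families["function_calls_total"]; ok { // assumed metric name
		for _, m := range mf.GetMetric() {
			total += m.GetCounter().GetValue()
		}
	}
	return total, nil
}
```

Polling this per pod and diffing the counts over time would give the controller an idleness signal without running a prometheus server.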
What I don't see so clearly is how we can implement the HTTP interceptor. We would need to implement some kind of HTTP gateway to route requests to the function services (and scale up if needed). That can lead to higher response times and slow cold starts.
We can do a POC though to clarify how that may work.
Yes, +1 for this. Scaling to zero would be perfect for some functions; I would love to run a tight cluster until it receives traffic. But I'd keep the HPA behavior for other functions that I know are dependencies.
I think you should have a look at how other people implement scale-to-zero: https://github.com/deislabs/osiris. I stumbled on it in https://github.com/kedacore/keda where they suggest it as an alternative to knative serving.
But if you are worried about cold start speed, you'd have to move towards a worker pool.
Hi! Any updates on this? In my opinion this is a must-have for every serverless FaaS tool.
Has anyone considered developing a pool manager like Fission did, or even an idler like OpenFaaS?
I've noticed that this was merged in July 2019 and is part of the Kubernetes 1.16 release: https://github.com/kubernetes/kubernetes/pull/74526
It seems not to be GA yet and requires the `HPAScaleToZero` feature gate. It also seems to require custom/external metrics (so I guess using CPU would not work), see https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/
I assume this could be used instead of requiring a custom implementation?
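If that gate is enabled on the cluster, the idea (untested; the external metric name and target value are placeholders for whatever metrics adapter is available) would look something like:

```yaml
# Requires the alpha HPAScaleToZero feature gate on the control plane.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hello
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hello
  minReplicas: 0        # only allowed with HPAScaleToZero
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: function_calls_per_second  # placeholder metric
      target:
        type: AverageValue
        averageValue: "10"
```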
Indeed, once Kubernetes supports it, this would be trivial and we wouldn't need to add custom support for it. It would be great if someone could validate this and make it work with Kubeless.
Do we have any update on this? I heard great things about Kubeless (https://www.appvia.io/blog/serverless-on-kubernetes), but this one issue was listed as a con, with no activity.
Any updates on this?
Any news on that?