Is there a way to specify a minimum, non-zero scaling value while keeping scale-to-zero behavior?

Open danieltahara opened this issue 4 years ago • 23 comments

In what area(s)?

/area autoscale

Other classifications:

/kind good-first-issue /kind process /kind spec

Ask your question here:

We have a few very "bursty" services that get zero traffic for a while and then get 100s of concurrent requests.

What I would like is to be able to do the following:

  1. Enable scale-to-zero for that service
  2. When the service has a non-zero scale, it is at least some minimum value (say, 10).

As far as I understand, setting minScale to 10 prevents scale to zero, and initialScale only applies on first deployment.
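
For concreteness, a rough sketch of the two annotations I mean (the service name and image are placeholders):

    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: bursty-service                              # placeholder
    spec:
      template:
        metadata:
          annotations:
            autoscaling.knative.dev/minScale: "10"      # keeps 10 pods, but disables scale-to-zero
            autoscaling.knative.dev/initialScale: "10"  # only applies on the first deployment
        spec:
          containers:
            - image: example.com/bursty-service:latest  # placeholder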

Is there some other way (directly via config) or pattern (i.e. workaround) for doing this?

Thanks!

danieltahara avatar May 06 '21 20:05 danieltahara

Interesting! We don't support this currently, as you've found out, but it is an interesting thought.

Do you see too long of a delay in scaling up to the required scale in your workloads? If all requests come in at once, I'd expect us to scale to, say, 10 almost immediately. If you have a ramping workload, that'd be different of course.

markusthoemmes avatar May 10 '21 07:05 markusthoemmes

I'd need to set aside time to confirm the behavior in the scale-from-zero case (though you might know offhand based on your understanding of the architecture), but in the "warm" case after a burst (where we scale back down to 1 pod for 15 minutes), we effectively get a cold start that's potentially even worse than the true cold case.

The situation in that case is that we have 1 pod sitting around available to immediately serve requests.

However, we have containerConcurrency: 1 and requests take ~3 seconds to complete. Therefore, since the panic scaling window only kicks in after 10 seconds, most requests (~40 at a time) block for the full 10 seconds plus pod spin-up time.

Since the panic window is globally configured, I don't want to change its value. Therefore it seems like a reasonable solution to this problem would be to keep the floor number of pods at some value > 1.
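
For reference, a rough sketch of the relevant bit of our spec as described above (the image is a placeholder):

    spec:
      template:
        spec:
          containerConcurrency: 1                       # one in-flight request per pod
          containers:
            - image: example.com/bursty-service:latest  # placeholder; each request takes ~3s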

danieltahara avatar May 10 '21 16:05 danieltahara

Hmm, that sounds kind of odd though. The panic window is 6s by default, but that doesn't mean scaling only happens after 6s; it means the amount of historical data the decision is based on is at most 6s old. As such, especially with an aggressive containerConcurrency setting, I would expect the workload to scale almost instantaneously.
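
For context, these knobs live in the config-autoscaler ConfigMap; a sketch showing the defaults (the 6s panic window is 10% of the 60s stable window):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: config-autoscaler
      namespace: knative-serving
    data:
      stable-window: "60s"                 # window used for normal-mode scaling decisions
      panic-window-percentage: "10.0"      # panic window = 10% of the stable window = 6s
      panic-threshold-percentage: "200.0"  # enter panic mode when load is 2x what current pods can handle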

It'd be awesome to get a reproducer for this behavior if possible, in case we do indeed have a bug here. Could you share the Knative Service YAML that you're using? Any other settings tweaked?

markusthoemmes avatar May 10 '21 16:05 markusthoemmes

The panic window is 6s by default, but that doesn't mean scaling only happens after 6s; it means the amount of historical data the decision is based on is at most 6s old.

Ah yep. 10% != 10s. And also duh, re: how the windowing works.

I doubt there's a "bug" here so much as a workload-specific thing (pattern, large docker image that doesn't seem to cache well). I definitely owe a repro if this is going to turn into a full-fledged feature request, but I did want to see if folks had encountered this or you could reason through the behavior offhand (or if there was a configuration variable I was unaware of).

(Also we're on 0.18, but afaict nothing between 0.18 and tip changes this behavior).

danieltahara avatar May 10 '21 16:05 danieltahara

Yeah, if there's any way to repro with a script that simulates your traffic bursts, that'd be helpful.

dprotaso avatar Jun 03 '21 22:06 dprotaso

+1 for the feature. We are using Knative for our production workloads at @gojek. However, we don't use the scale-to-zero feature, since we want to guarantee that more than one replica of an application is available whenever it is serving traffic.

pradithya avatar Jun 09 '21 05:06 pradithya

I think @danieltahara's case was about avoiding situations between 0 and (for example) 10. If you want to ensure that there are at least 10 replicas at all times, the autoscaling.knative.dev/minScale annotation should work for you.

With respect to the original bug report, it would be good to get a repro case and see how much scaling delay we're getting in that scenario. Maybe a simple repro case would be:

  • A container of any size that takes 10s to process a request (e.g. the autoscale test image with the ?sleep=10s parameter)
  • containerConcurrency: 1
  • Issue 100 simultaneous requests; measure the time from when the first and last requests are issued (to rule out client-side delay) until all requests are served(?)

Our expectation would be that the duration of the test would be <20s with 100 Pods spawned, correct?
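
A sketch of such a repro Service (the sample image and its sleep parameter are assumptions; any image that can sleep per request works):

    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: autoscale-repro                               # placeholder
    spec:
      template:
        spec:
          containerConcurrency: 1                         # one request per pod
          containers:
            - image: gcr.io/knative-samples/autoscale-go:0.1  # assumed sample; sleeps per request via a query param

Firing 100 simultaneous requests at it (each sleeping ~10s) and timing until the last response comes back should show whether the scale-up really lags.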

evankanderson avatar Jun 23 '21 22:06 evankanderson

Yes I'd expect us to hit 100 pods requested (deployed is another story :sweat_smile: ) almost immediately.

markusthoemmes avatar Jun 24 '21 07:06 markusthoemmes

avoiding situations between 0 and (for example) 10

@evankanderson That's exactly what I think the ideal state is, too. If the service is not serving traffic, the number of replicas should be 0; but if there is traffic, there should be at least x replicas, where x is configurable, similar to autoscaling.knative.dev/minScale. This avoids the scenario where my service has only 1 replica, which could cause disruption if that replica becomes nonoperational, for example because it gets rescheduled by Kubernetes.

I do use autoscaling.knative.dev/minScale currently, but then scale-to-zero is never triggered, even when there is no traffic.

pradithya avatar Jun 24 '21 08:06 pradithya

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions[bot] avatar Sep 23 '21 01:09 github-actions[bot]

Adding an extra use case for this that I've seen a couple of times, and recently in this slack thread: there are cases - especially bursty eventing workloads - where at a low number of instances we struggle to handle the load at all, causing the autoscaler to react by dramatically over-shooting the needed number of pods. Once we've scaled up way too far, we now have far more capacity than needed (per-pod concurrency drops well below the target), and we under-shoot back down to a number of pods that can't handle the load at all. This causes concurrency to shoot up again (because requests are backing up), and we react by dramatically over-scaling again. Rinse, repeat. (Another example)

We currently don't have a great way of dealing with these use cases: scale-down-delay helps a bit but eventually wears off, and similarly tweaking the max-scale-down rate helps a bit but still ends the same way. Having an explicit "if there's any load, give me at least N instances to handle it" is maybe not ideal (it'd be nicer if the autoscaler were more magic, obviously, but short of something more stateful it's hard to see how it could be), but it would address these use cases, which we currently struggle with, in a relatively low-hanging-fruit way.
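
For reference, a sketch of the two partial mitigations mentioned above (values are illustrative):

    # Per-revision annotation (on spec.template.metadata): keep capacity around for a while after load drops
    metadata:
      annotations:
        autoscaling.knative.dev/scale-down-delay: "15m"

    # Cluster-wide, in the config-autoscaler ConfigMap: limit how fast we scale down
    data:
      max-scale-down-rate: "2.0"   # pod count can at most halve between scaling evaluations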

I think this is worth thinking about, and I might even take a crack at it if no-one hates the idea.

/reopen /remove-lifecycle stale

julz avatar Nov 09 '21 10:11 julz

@julz: Reopened this issue.

In response to this:

(comment quoted above)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

knative-prow-robot avatar Nov 09 '21 10:11 knative-prow-robot

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions[bot] avatar Feb 08 '22 01:02 github-actions[bot]

/reopen

psschwei avatar Mar 24 '22 11:03 psschwei

@psschwei: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

knative-prow-robot avatar Mar 24 '22 11:03 knative-prow-robot

/remove-lifecycle stale

psschwei avatar Mar 24 '22 11:03 psschwei

I'm going to poke around with this some /assign

psschwei avatar Mar 24 '22 11:03 psschwei

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions[bot] avatar Jun 23 '22 01:06 github-actions[bot]

/lifecycle frozen

psschwei avatar Jun 23 '22 02:06 psschwei

@psschwei any news on this one?

skonto avatar Jul 13 '22 12:07 skonto

I got pulled into some other things and haven't had a chance to look into this yet... it's still on my list, but if you (or anyone else) want to take it, I don't mind letting it go...

psschwei avatar Jul 13 '22 13:07 psschwei

It's fine, it just came up in internal discussions. If I come up with something I'll ping you :)

skonto avatar Jul 13 '22 13:07 skonto

Leaving this open since we want to add an e2e test

/reopen

dprotaso avatar Jul 27 '22 21:07 dprotaso

Going to close this out - e2e PR is open
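
For anyone finding this later: the behaviour discussed here surfaced as a per-revision annotation; a sketch, assuming the activation-scale annotation described in the current autoscaler docs:

    spec:
      template:
        metadata:
          annotations:
            autoscaling.knative.dev/activation-scale: "10"  # assumed: minimum scale to jump to when scaling up from zero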

dprotaso avatar Nov 09 '22 16:11 dprotaso