client icon indicating copy to clipboard operation
client copied to clipboard

Refactor autoscaling parameters

Open rhuss opened this issue 5 years ago • 16 comments

Since we have collected all concurrency related options with the option prefix --concurrency- and also list them together in kn service describe, the autoscale window parameter should be also in this group. So either move everything to --concurrency- or, move everything (limit, target, utilization, window) to --autoscale.

I'm not sure in which direction to go, moving to --concurrency is easier as most of the options are already --concurrency prefixed, on the other "concurrency" is quite tedious to type, so maybe 'autoscale-' is a better prefix? But we might even consider a total different prefix as well?

rhuss avatar Apr 21 '20 13:04 rhuss

Just to add another wrinkle into it... not having the same names as what people would put into the yaml has caused me some mental pain :-) it makes it harder to switch between the two worlds. Thinking/remembering is hard.

duglin avatar Apr 21 '20 13:04 duglin

I totally agree that we should align where it is possible, but the concurrency options are a bit 'special', as they are sprinkled all over the YAML (as annotations and different fields). I see the value of the CLI to allow a use to not remember the full annotation.

I think we should use the given name where possible, but deviate where needed. E.g. I really would love to have all concurrency/scaling related parameters together in the 'option namespace' (i.e. appearing together on the help page, so using the same prefix).

rhuss avatar Apr 21 '20 15:04 rhuss

According to Configuring Autoscaling

we should probably better move to --autoscale- prefix and deprecated --concurrency- (so, support it for the next releases, too but move to a uniform prefix). I think that makes sense if we want to support even more autoscale related parameters. The per-revision parameters that possibly make sense to support are:

  • --autoscale-class
  • --autoscale-metric
  • --autoscale-target (deprecate --concurrency-target)
  • --autoscale-limit (deprecate --concurrency-limit)
  • --autoscale-utilization (deprecate --concurrency-utilization)
  • --autoscale-rps (requests-per-second) <--- I think we can skip that as it's not clear how this differs to --autoscale-target
  • --autoscale-min (deprecate --min-scale. Does this make sense ?)
  • --autoscale-max (deprecate --max-scale. Does this make sense ?)
  • --autoscale-window
  • --autoscale-panic-window
  • --autoscale-panic-threshold

For backwards compatibility reason we could stick to the current main concurrency related config:

  • --concurrency-limit
  • --concurrency-target
  • --min-scale
  • --max-scale

then deprecate --autoscale-window and --concurrency-utilization and introduce a --autoscale parameter for configuring less used autoscaling parameters like in --autoscale window=10s,utilization=70,panic-window=10 with the alternative syntax --autoscale window=10s --autoscale utilization=70 --autoscale panic-window=10 (the same what we already support for array-typed options like --env or --annotation). For sake of consistency we then can add also min, max, limit and target to the key --autoscale subconfig keys.

The benefit of this approach:

  • We don't have to deprecate too many options, resulting in a better backwards-compatible approach. Only --concurrency-utilization and --autoscale-window would be affected.
  • We can easily add new autoscaling parameters without polluting the option namespace
  • Shortcuts for the most common options, but allowing all possible autoscaling parameters.

I think, I like this suggestion. wdyt ?

rhuss avatar Apr 28 '20 07:04 rhuss

I'd hold off until they're done with that doc and resulting PRs. I'm still hoping they'll rename stuff as a result of that work.

In particular, --autoscale-limit is not a great alternative to container-concurrency. As a user I would guess that --autoscale-limit is about max scaling of instances, not the amount of concurrency allowed per instance

duglin avatar Apr 28 '20 11:04 duglin

According to Configuring Autoscaling

@rhuss can I please get access to this doc ?

itsmurugappan avatar Apr 28 '20 13:04 itsmurugappan

We sorted out the issue. FYI you need to subscribe to https://groups.google.com/forum/#!forum/knative-dev in order to access Knative documentations ....

rhuss avatar Apr 28 '20 14:04 rhuss

FWIW: I would vote against --autoscale-limit definitely, it's not even really an autoscaling parameter. It's more of a loadbalancing/enforcement paramter for the revision which is just inferred into target by the autoscaler, because that makes sense from its PoV. I'd not prepend that with autoscale.

markusthoemmes avatar Apr 30 '20 08:04 markusthoemmes

FWIW: I would vote against --autoscale-limit definitely, it's not even really an autoscaling parameter. It's more of a loadbalancing/enforcement paramter for the revision which is just inferred into target by the autoscaler, because that makes sense from its PoV. I'd not prepend that with autoscale.

@markusthoemmes so you are saying that the "limit" and the "target" are not belonging to the same group (like kind of imposed by the syntax "autoscaling.knative.dev/target"), where on is connected to autoscaling and the other not ? If so, then I think the analogy to 'soft' vs 'hard' limit is a bit misleading then, too (limit of what ?)

So its should be --concurrency-limit and --autoscale-target ? (or would be --concurrency-target also ok ?)

rhuss avatar Apr 30 '20 14:04 rhuss

Yeah, it's a little involved.

The autoscaler only knows about the target. If no explicit target is set, it is derived from the concurrency limit (aka containerConcurrency). I agree, the analogy might be a bit weird, it explains the behavior from the user's pov though.

markusthoemmes avatar Apr 30 '20 14:04 markusthoemmes

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions[bot] avatar Oct 15 '20 01:10 github-actions[bot]

/remove-lifecycle-stale /reopen

We definitely still need on this refactoring. E.g we have --autoscale-window but --scale-min. Technically it should be probably all --autoscale but --scale is shorter.

Also we are still missing some parameters, like the configuration options for the panic window.

rhuss avatar Dec 01 '20 10:12 rhuss

@rhuss: Reopened this issue.

In response to this:

/remove-lifecycle-stale /reopen

We definitely still need on this refactoring. E.g we have --autoscale-window but --scale-min. Technically it should be probably all --autoscale but --scale is shorter.

Also we are still missing some parameters, like the configuration options for the panic window.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

knative-prow-robot avatar Dec 01 '20 10:12 knative-prow-robot

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

github-actions[bot] avatar Aug 16 '21 01:08 github-actions[bot]

/remove-lifecycle stale

rhuss avatar Sep 01 '21 09:09 rhuss

@rhuss any timeline for plans to work on this one? I think refactoring these would give a much nicer UX.

abrennan89 avatar Nov 15 '21 16:11 abrennan89

:thinking: it seems to be done already.

                                          generation, and {{.Random [n]}} for n random consonants (e.g. {{.Service}}-{{.Random 5}}-{{.Generation}})
      --scale string                      Set the Minimum and Maximum number of replicas. You can use this flag to set both to a single value, or set a range with min/max values, or set either min or max values without specifying the other. Example: --scale 5 (scale-min = 5, scale-max = 5) or --scale
                                          1..5 (scale-min = 1, scale-max = 5) or --scale 1.. (scale-min = 1, scale-max = unchanged) or --scale ..5 (scale-min = unchanged, scale-max = 5)
      --scale-init int                    Initial number of replicas with which a service starts. Can be 0 or a positive integer.
      --scale-max int                     Maximum number of replicas.
      --scale-min int                     Minimum number of replicas.
      --scale-target int                  Recommendation for when to scale up based on the concurrent number of incoming request. Defaults to --concurrency-limit when given.
      --scale-utilization int             Percentage of concurrent requests utilization before scaling up. (default 70)
      --scale-window string               Duration to look back for making auto-scaling decisions. The service is scaled to zero if no request was received in during that time. (eg: 10s)

There's a PR to add --scale-metric, but that's current state of scaling flags. I can look up since which release it's changed, if it helps.

dsimansk avatar Apr 14 '22 14:04 dsimansk