client Refactor autoscaling parameters

Since we have collected all concurrency related options with the option prefix --concurrency- and also list them together in kn service describe, the autoscale window parameter should be also in this group. So either move everything to --concurrency- or, move everything (limit, target, utilization, window) to --autoscale.

I'm not sure in which direction to go, moving to --concurrency is easier as most of the options are already --concurrency prefixed, on the other "concurrency" is quite tedious to type, so maybe 'autoscale-' is a better prefix? But we might even consider a total different prefix as well?

Apr 21 '20 13:04 rhuss

Just to add another wrinkle into it... not having the same names as what people would put into the yaml has caused me some mental pain :-) it makes it harder to switch between the two worlds. Thinking/remembering is hard.

Apr 21 '20 13:04 duglin

I totally agree that we should align where it is possible, but the concurrency options are a bit 'special', as they are sprinkled all over the YAML (as annotations and different fields). I see the value of the CLI to allow a use to not remember the full annotation.

I think we should use the given name where possible, but deviate where needed. E.g. I really would love to have all concurrency/scaling related parameters together in the 'option namespace' (i.e. appearing together on the help page, so using the same prefix).

Apr 21 '20 15:04 rhuss

According to Configuring Autoscaling

we should probably better move to --autoscale- prefix and deprecated --concurrency- (so, support it for the next releases, too but move to a uniform prefix). I think that makes sense if we want to support even more autoscale related parameters. The per-revision parameters that possibly make sense to support are:

--autoscale-class
--autoscale-metric
--autoscale-target (deprecate --concurrency-target)
--autoscale-limit (deprecate --concurrency-limit)
--autoscale-utilization (deprecate --concurrency-utilization)
--autoscale-rps (requests-per-second) <--- I think we can skip that as it's not clear how this differs to --autoscale-target
--autoscale-min (deprecate --min-scale. Does this make sense ?)
--autoscale-max (deprecate --max-scale. Does this make sense ?)
--autoscale-window
--autoscale-panic-window
--autoscale-panic-threshold

For backwards compatibility reason we could stick to the current main concurrency related config:

--concurrency-limit
--concurrency-target
--min-scale
--max-scale

then deprecate --autoscale-window and --concurrency-utilization and introduce a --autoscale parameter for configuring less used autoscaling parameters like in --autoscale window=10s,utilization=70,panic-window=10 with the alternative syntax --autoscale window=10s --autoscale utilization=70 --autoscale panic-window=10 (the same what we already support for array-typed options like --env or --annotation). For sake of consistency we then can add also min, max, limit and target to the key --autoscale subconfig keys.

The benefit of this approach:

We don't have to deprecate too many options, resulting in a better backwards-compatible approach. Only --concurrency-utilization and --autoscale-window would be affected.
We can easily add new autoscaling parameters without polluting the option namespace
Shortcuts for the most common options, but allowing all possible autoscaling parameters.

I think, I like this suggestion. wdyt ?

Apr 28 '20 07:04 rhuss

I'd hold off until they're done with that doc and resulting PRs. I'm still hoping they'll rename stuff as a result of that work.

In particular, --autoscale-limit is not a great alternative to container-concurrency. As a user I would guess that --autoscale-limit is about max scaling of instances, not the amount of concurrency allowed per instance

Apr 28 '20 11:04 duglin

According to Configuring Autoscaling

@rhuss can I please get access to this doc ?

Apr 28 '20 13:04 itsmurugappan

We sorted out the issue. FYI you need to subscribe to https://groups.google.com/forum/#!forum/knative-dev in order to access Knative documentations ....

Apr 28 '20 14:04 rhuss

FWIW: I would vote against --autoscale-limit definitely, it's not even really an autoscaling parameter. It's more of a loadbalancing/enforcement paramter for the revision which is just inferred into target by the autoscaler, because that makes sense from its PoV. I'd not prepend that with autoscale.

Apr 30 '20 08:04 markusthoemmes

FWIW: I would vote against --autoscale-limit definitely, it's not even really an autoscaling parameter. It's more of a loadbalancing/enforcement paramter for the revision which is just inferred into target by the autoscaler, because that makes sense from its PoV. I'd not prepend that with autoscale.

@markusthoemmes so you are saying that the "limit" and the "target" are not belonging to the same group (like kind of imposed by the syntax "autoscaling.knative.dev/target"), where on is connected to autoscaling and the other not ? If so, then I think the analogy to 'soft' vs 'hard' limit is a bit misleading then, too (limit of what ?)

So its should be --concurrency-limit and --autoscale-target ? (or would be --concurrency-target also ok ?)

Apr 30 '20 14:04 rhuss

Yeah, it's a little involved.

The autoscaler only knows about the target. If no explicit target is set, it is derived from the concurrency limit (aka containerConcurrency). I agree, the analogy might be a bit weird, it explains the behavior from the user's pov though.

Apr 30 '20 14:04 markusthoemmes

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

Oct 15 '20 01:10 github-actions[bot]

/remove-lifecycle-stale /reopen

We definitely still need on this refactoring. E.g we have --autoscale-window but --scale-min. Technically it should be probably all --autoscale but --scale is shorter.

Also we are still missing some parameters, like the configuration options for the panic window.

Dec 01 '20 10:12 rhuss

@rhuss: Reopened this issue.

In response to this:

/remove-lifecycle-stale /reopen

We definitely still need on this refactoring. E.g we have --autoscale-window but --scale-min. Technically it should be probably all --autoscale but --scale is shorter.

Also we are still missing some parameters, like the configuration options for the panic window.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Dec 01 '20 10:12 knative-prow-robot

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

Aug 16 '21 01:08 github-actions[bot]

/remove-lifecycle stale

Sep 01 '21 09:09 rhuss

@rhuss any timeline for plans to work on this one? I think refactoring these would give a much nicer UX.

Nov 15 '21 16:11 abrennan89

:thinking: it seems to be done already.

                                          generation, and {{.Random [n]}} for n random consonants (e.g. {{.Service}}-{{.Random 5}}-{{.Generation}})
      --scale string                      Set the Minimum and Maximum number of replicas. You can use this flag to set both to a single value, or set a range with min/max values, or set either min or max values without specifying the other. Example: --scale 5 (scale-min = 5, scale-max = 5) or --scale
                                          1..5 (scale-min = 1, scale-max = 5) or --scale 1.. (scale-min = 1, scale-max = unchanged) or --scale ..5 (scale-min = unchanged, scale-max = 5)
      --scale-init int                    Initial number of replicas with which a service starts. Can be 0 or a positive integer.
      --scale-max int                     Maximum number of replicas.
      --scale-min int                     Minimum number of replicas.
      --scale-target int                  Recommendation for when to scale up based on the concurrent number of incoming request. Defaults to --concurrency-limit when given.
      --scale-utilization int             Percentage of concurrent requests utilization before scaling up. (default 70)
      --scale-window string               Duration to look back for making auto-scaling decisions. The service is scaled to zero if no request was received in during that time. (eg: 10s)

There's a PR to add --scale-metric, but that's current state of scaling flags. I can look up since which release it's changed, if it helps.

Apr 14 '22 14:04 dsimansk