serving Allow user to change global metrics of autoscaling in ConfigMap: config-autoscaler

Describe the feature

Allow users to set the global autoscaling metric in the config-autoscaler ConfigMap.

The global autoscaling metric appears to be concurrency, because the concurrency-related global settings in config-autoscaler, e.g., container-concurrency-target-default and container-concurrency-target-percentage, take effect as configured.

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
  labels:
    serving.knative.dev/release: "v0.22.1"
data:
  allow-zero-initial-scale: "false"
  container-concurrency-target-default: "100"
  container-concurrency-target-percentage: "0.7"

In contrast, the rps setting, requests-per-second-target-default, has no effect unless the annotation autoscaling.knative.dev/metric: "rps" is set on the InferenceService.

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
  labels:
    serving.knative.dev/release: "v0.22.1"
data:
  allow-zero-initial-scale: "false"
  requests-per-second-target-default: "100"

apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  annotations:
    "sidecar.istio.io/inject": "false"
    # RPS
    autoscaling.knative.dev/metric: "rps"
    # autoscaling.knative.dev/target: "2"
  name: autoscaler-test
  namespace: test
spec:
  predictor:
    canaryTrafficPercent: 100
    serviceAccountName: sa
    tensorflow:
      image: tensorflow/serving:2.4.0
      name: kfserving-container
      runtimeVersion: 2.4.0
      storageUri: s3://tfx/models

If the autoscaling metric (e.g., rps) could be set in the ConfigMap, we would not need to configure it every time we create an InferenceService. This could be a useful feature.

Sep 02 '22 03:09 wyljpn

@psschwei

Sep 02 '22 03:09 wyljpn

Off the top of my head, I don't see any issues with allowing autoscaling.knative.dev/metric to be set globally rather than on a per-revision basis. I assume a global rps default with some revisions using concurrency would be handled the same way as the current setup, where the global default is concurrency and individual revisions opt in to rps. If that proves not to be the case, we'd need to revisit.
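To make the assumed precedence concrete, here is a minimal Python sketch. It is not Knative's actual code, only an illustration of the behavior described above: a per-revision annotation wins, then a global default from the ConfigMap, then the built-in default of concurrency. The ConfigMap key name `metric-default` and the function `effective_metric` are hypothetical and do not exist in config-autoscaler today.

```python
from typing import Mapping

# Real annotation key, as used in the InferenceService example in this issue.
METRIC_ANNOTATION = "autoscaling.knative.dev/metric"

# HYPOTHETICAL ConfigMap key, invented for this sketch only.
GLOBAL_METRIC_KEY = "metric-default"

# Fallback when neither the annotation nor the global key is set.
BUILT_IN_DEFAULT = "concurrency"


def effective_metric(annotations: Mapping[str, str],
                     config_data: Mapping[str, str]) -> str:
    """Return the metric a revision would scale on under the assumed precedence."""
    per_revision = annotations.get(METRIC_ANNOTATION)
    if per_revision:
        # A per-revision annotation always overrides the global setting.
        return per_revision
    # Otherwise use the global default, or the built-in default if it is unset.
    return config_data.get(GLOBAL_METRIC_KEY, BUILT_IN_DEFAULT)
```

Under this sketch, a revision annotated with concurrency keeps scaling on concurrency even when the global default is rps, which mirrors how rps annotations override the concurrency default today.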

/triage accepted

Sep 08 '22 13:09 psschwei