Allow user to change global metrics of autoscaling in ConfigMap: config-autoscaler
Describe the feature
Allow users to set the global autoscaling metric in the config-autoscaler ConfigMap.
The global autoscaling metric appears to be concurrency, because the global concurrency settings in the config-autoscaler ConfigMap, e.g., container-concurrency-target-default and container-concurrency-target-percentage, take effect without any further configuration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
  labels:
    serving.knative.dev/release: "v0.22.1"
data:
  allow-zero-initial-scale: "false"
  container-concurrency-target-default: "100"
  container-concurrency-target-percentage: "0.7"
By contrast, the global rps setting, requests-per-second-target-default, has no effect unless autoscaling.knative.dev/metric: "rps" is also set as an annotation on the InferenceService.
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
  labels:
    serving.knative.dev/release: "v0.22.1"
data:
  allow-zero-initial-scale: "false"
  requests-per-second-target-default: "100"
apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  annotations:
    "sidecar.istio.io/inject": "false"
    # RPS
    autoscaling.knative.dev/metric: "rps"
    # autoscaling.knative.dev/target: "2"
  name: autoscaler-test
  namespace: test
spec:
  predictor:
    canaryTrafficPercent: 100
    serviceAccountName: sa
    tensorflow:
      image: tensorflow/serving:2.4.0
      name: kfserving-container
      runtimeVersion: 2.4.0
      storageUri: s3://tfx/models
If we could set the autoscaling metric (e.g., rps) in the ConfigMap, we wouldn't need to configure it every time we create an InferenceService. It could be a useful feature.
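To make the request concrete, here is a rough sketch of the precedence being asked for. It is a simplified model, not Knative's actual implementation. The ConfigMap key `metric-default` and the function `resolve_metric` are hypothetical names made up for illustration; only the two annotations and the two `*-target-default` keys come from the examples above. Only the concurrency and rps metrics are modelled.

```python
# Simplified model of metric resolution; NOT Knative's actual code.
METRIC_ANNOTATION = "autoscaling.knative.dev/metric"
TARGET_ANNOTATION = "autoscaling.knative.dev/target"

# Hypothetical ConfigMap key for the proposed global metric. It is an
# assumption for this sketch and is not an existing config-autoscaler key.
GLOBAL_METRIC_KEY = "metric-default"

# Existing config-autoscaler keys holding the default target per metric.
DEFAULT_TARGET_KEYS = {
    "concurrency": "container-concurrency-target-default",
    "rps": "requests-per-second-target-default",
}


def resolve_metric(annotations, config):
    """Return (metric, target) for one revision.

    Precedence: per-revision annotation, then the hypothetical global key,
    then today's built-in default of concurrency.
    """
    metric = annotations.get(METRIC_ANNOTATION) or config.get(
        GLOBAL_METRIC_KEY, "concurrency"
    )
    if metric not in DEFAULT_TARGET_KEYS:
        # Only the two metrics discussed in this issue are modelled here.
        raise ValueError(f"metric not modelled in this sketch: {metric!r}")
    # A per-revision target annotation wins over the global default target.
    raw = annotations.get(TARGET_ANNOTATION) or config.get(
        DEFAULT_TARGET_KEYS[metric]
    )
    target = float(raw) if raw is not None else None
    return metric, target


if __name__ == "__main__":
    config = {
        "container-concurrency-target-default": "100",
        "requests-per-second-target-default": "100",
    }
    # Today: without the annotation, the rps default is never consulted.
    print(resolve_metric({}, config))
    # Proposed: a global metric makes the rps default apply everywhere,
    # while a revision can still opt back into concurrency.
    proposed = dict(config, **{GLOBAL_METRIC_KEY: "rps"})
    print(resolve_metric({}, proposed))
    print(resolve_metric({METRIC_ANNOTATION: "concurrency"}, proposed))
```

With the hypothetical global key unset, the sketch behaves like the current setup: a revision gets rps only when it carries the annotation itself.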
@psschwei
Off the top of my head, I don't see any issues with allowing autoscaling.knative.dev/metric to be set globally rather than on a per-revision basis. (I assume a global rps with some revisions using concurrency would be handled the same way as the current setup, with a global default of concurrency and per-revision rps, though if that proves not to be the case we'd need to revisit.)
/triage accepted