
targetRef field misbehaviour

Open rafaeljesus opened this issue 3 years ago • 1 comment

Which component are you using?:

VPA in GKE

What version of the component are you using?:

Component version: GKE v1.21.10-gke.2000

What k8s version are you using (kubectl version)?:

kubectl version Output
Client Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.9-dispatcher", GitCommit:"2a8027f41d28b788b001389f3091c245cd0a9a60", GitTreeState:"clean", BuildDate:"2022-01-21T20:31:13Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.10-gke.2000", GitCommit:"0823380786b063c3f71d5e7c76826a972e30550d", GitTreeState:"clean", BuildDate:"2022-03-17T09:22:22Z", GoVersion:"go1.16.14b7", Compiler:"gc", Platform:"linux/amd64"}

What environment is this in?:

GKE

What did you expect to happen?:

I have two stateful sets and one vpa for each:

vpa0
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: sts0
  labels:
    app: sts
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: StatefulSet
    name: sts0
  updatePolicy:
    updateMode: "Auto"
    minReplicas: 1
  resourcePolicy:
    containerPolicies:
    - containerName: nginx
      mode: "Auto"
sts0
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sts0
  labels:
    app: sts
spec:
  selector:
    matchLabels:
      app: sts
  replicas: 1
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: sts
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: logs
          mountPath: /root/logs
        - name: www
          mountPath: /usr/share/nginx/html
        resources:
          requests:
            cpu: "1"
            memory: 128Mi
          limits:
            cpu: "3"
            memory: 128Mi
      volumes:
        - name: logs
          emptyDir: {}
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: resizable-balanced
      resources:
        requests:
          storage: 512Mi
vpa1
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: sts1
  labels:
    app: sts
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: StatefulSet
    name: sts1
  updatePolicy:
    updateMode: "Auto"
    minReplicas: 1
  resourcePolicy:
    containerPolicies:
    - containerName: nginx
      mode: "Auto"
sts1
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sts1
  labels:
    app: sts
spec:
  selector:
    matchLabels:
      app: sts
  replicas: 1
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: sts
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: logs
          mountPath: /root/logs
        - name: www
          mountPath: /usr/share/nginx/html
        resources:
          requests:
            cpu: "1"
            memory: 128Mi
          limits:
            cpu: "3"
            memory: 128Mi
      volumes:
        - name: logs
          emptyDir: {}
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: resizable-balanced
      resources:
        requests:
          storage: 512Mi

I was expecting vpa0 to apply resource recommendations to sts0, and vpa1 to sts1.

What happened instead?:

Instead, I noticed that both sts0 and sts1 were being updated by vpa1, while vpa0 was not applying any recommendations to sts0.

I could make it work by adding selector labels that differentiate one StatefulSet from the other:

sts0

spec:
  selector:
    matchLabels:
      app: sts
      shard_id: "0"

sts1

spec:
  selector:
    matchLabels:
      app: sts
      shard_id: "1"

I think this is the cause: the target fetcher resolves the pods a VPA manages via the controller's label selector, and both StatefulSets here share the same `matchLabels`, so each VPA's selector matches the pods of both. See https://github.com/kubernetes/autoscaler/blob/e661d61c0861ed61fa89242a45a8dc4b0caca395/vertical-pod-autoscaler/pkg/target/fetcher.go#L162
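The overlap can be sketched with plain label matching. This is a simplified model of selector resolution, not the actual VPA fetcher code, and the `matches` helper is hypothetical:

```go
package main

import "fmt"

// matches reports whether a pod's labels satisfy a selector's matchLabels
// (every selector key must be present on the pod with the same value).
func matches(selector, podLabels map[string]string) bool {
	for k, v := range selector {
		if podLabels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	// Both StatefulSets above use the same matchLabels, so their pods
	// carry identical labels.
	sts0Pod := map[string]string{"app": "sts"}
	sts1Pod := map[string]string{"app": "sts"}

	// Each VPA watches pods via the selector of its targetRef, which
	// here is {app: sts} for both targets.
	vpaSelector := map[string]string{"app": "sts"}
	fmt.Println(matches(vpaSelector, sts0Pod)) // true
	fmt.Println(matches(vpaSelector, sts1Pod)) // true — vpa1 also matches sts0's pods

	// Adding a distinguishing label (shard_id, as in the workaround)
	// makes the selectors disjoint again.
	sts0PodFixed := map[string]string{"app": "sts", "shard_id": "0"}
	vpa1Selector := map[string]string{"app": "sts", "shard_id": "1"}
	fmt.Println(matches(vpa1Selector, sts0PodFixed)) // false
}
```

This is why the `shard_id` workaround above helps: it makes each StatefulSet's selector, and therefore each VPA's effective pod set, unique.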

rafaeljesus avatar May 16 '22 17:05 rafaeljesus

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Aug 15 '22 11:08 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Sep 14 '22 12:09 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Oct 14 '22 13:10 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Oct 14 '22 13:10 k8s-ci-robot