
MountVolume.SetUp failed for volume "policy-volume" : object "descheduler"/"descheduler" not registered

JackSinclairT opened this issue 3 years ago • 5 comments

What version of descheduler are you using? descheduler version: v0.23.1

Does this issue reproduce with the latest release? Yes

Which descheduler CLI options are you using? schedule: 0 * * * * successfulJobsHistoryLimit: 1 failedJobsHistoryLimit: 1
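Those look like the Helm chart's CronJob settings rather than descheduler command-line flags; roughly, they land on the generated batch/v1 CronJob like this (a sketch):

# Sketch: where those values sit on the generated batch/v1 CronJob
spec:
  schedule: "0 * * * *"
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1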

Please provide a copy of your descheduler policy config file

cmdOptions: 
  v: 4
deschedulerPolicy: 
  evictLocalStoragePods: true
  strategies:
    LowNodeUtilization:
      enabled: false 
    PodLifeTime: 
      enabled: true
      params: 
        namespaces: 
          include: 
            - thomas
            - thomas-pr
            - thomas-gd
            - thomas-dt
            - health
            - ingress
        podLifeTime: 
          maxPodLifeTimeSeconds: 745200
        podStatusPhases: 
          - Running
    RemoveDuplicates:
      enabled: false
    RemovePodsViolatingInterPodAntiAffinity:
      enabled: false
    RemovePodsViolatingNodeAffinity:
      enabled: false
    RemovePodsViolatingNodeTaints:
      enabled: false
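
For context, the Helm chart wraps the deschedulerPolicy block above into the descheduler ConfigMap that backs policy-volume, so the policy.yaml the pod reads should look roughly like this sketch (abbreviated to the enabled strategy; the exact rendering depends on the chart version):

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
evictLocalStoragePods: true
strategies:
  PodLifeTime:
    enabled: true
    params:
      namespaces:
        include:
          - thomas
          - thomas-pr
          - thomas-gd
          - thomas-dt
          - health
          - ingress
      podLifeTime:
        maxPodLifeTimeSeconds: 745200
      podStatusPhases:
        - Running
  # disabled strategies (LowNodeUtilization, RemoveDuplicates, RemovePodsViolating*) omitted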

What k8s version are you using (kubectl version)? 1.23.5. We just upgraded this from 1.18.14.

What did you do? We deployed descheduler and let it run on its schedule

What did you expect to see? We expected descheduler to run as normal without errors.

What did you see instead? After we deployed it and it ran, we received the following errors:
MountVolume.SetUp failed for volume "policy-volume" : object "descheduler"/"descheduler" not registered
MountVolume.SetUp failed for volume "kube-api-access-xdg9z" : object "descheduler"/"kube-root-ca.crt" not registered

Pod YAML

apiVersion: v1
kind: Pod
metadata:
  name: descheduler-27536220-lhtbl
  generateName: descheduler-27536220-
  namespace: descheduler
  uid: 15aa32c4-a637-462b-8abe-dcf65ebf04ed
  resourceVersion: '13237055'
  creationTimestamp: '2022-05-10T09:00:00Z'
  labels:
    app.kubernetes.io/instance: descheduler
    app.kubernetes.io/name: descheduler
    controller-uid: 17cabd23-8fbd-40ed-b752-a48f4276b73e
    job-name: descheduler-27536220
  annotations:
    checksum/config: 31a6d6bfa0b07252978dc3c5b8305ec9bf971f3b6101a95aad4501f16fde044f
  ownerReferences:
    - apiVersion: batch/v1
      kind: Job
      name: descheduler-27536220
      uid: 17cabd23-8fbd-40ed-b752-a48f4276b73e
      controller: true
      blockOwnerDeletion: true
  managedFields:
    - manager: kube-controller-manager
      operation: Update
      apiVersion: v1
      time: '2022-05-10T09:00:00Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:checksum/config: {}
          f:generateName: {}
          f:labels:
            .: {}
            f:app.kubernetes.io/instance: {}
            f:app.kubernetes.io/name: {}
            f:controller-uid: {}
            f:job-name: {}
          f:ownerReferences:
            .: {}
            k:{"uid":"17cabd23-8fbd-40ed-b752-a48f4276b73e"}: {}
        f:spec:
          f:containers:
            k:{"name":"descheduler"}:
              .: {}
              f:args: {}
              f:command: {}
              f:image: {}
              f:imagePullPolicy: {}
              f:livenessProbe:
                .: {}
                f:failureThreshold: {}
                f:httpGet:
                  .: {}
                  f:path: {}
                  f:port: {}
                  f:scheme: {}
                f:initialDelaySeconds: {}
                f:periodSeconds: {}
                f:successThreshold: {}
                f:timeoutSeconds: {}
              f:name: {}
              f:resources:
                .: {}
                f:requests:
                  .: {}
                  f:cpu: {}
                  f:memory: {}
              f:securityContext:
                .: {}
                f:allowPrivilegeEscalation: {}
                f:capabilities:
                  .: {}
                  f:drop: {}
                f:privileged: {}
                f:readOnlyRootFilesystem: {}
                f:runAsNonRoot: {}
              f:terminationMessagePath: {}
              f:terminationMessagePolicy: {}
              f:volumeMounts:
                .: {}
                k:{"mountPath":"/policy-dir"}:
                  .: {}
                  f:mountPath: {}
                  f:name: {}
          f:dnsPolicy: {}
          f:enableServiceLinks: {}
          f:priorityClassName: {}
          f:restartPolicy: {}
          f:schedulerName: {}
          f:securityContext: {}
          f:serviceAccount: {}
          f:serviceAccountName: {}
          f:terminationGracePeriodSeconds: {}
          f:volumes:
            .: {}
            k:{"name":"policy-volume"}:
              .: {}
              f:configMap:
                .: {}
                f:defaultMode: {}
                f:name: {}
              f:name: {}
    - manager: Go-http-client
      operation: Update
      apiVersion: v1
      time: '2022-05-10T09:00:02Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          f:conditions:
            k:{"type":"ContainersReady"}:
              .: {}
              f:lastProbeTime: {}
              f:lastTransitionTime: {}
              f:reason: {}
              f:status: {}
              f:type: {}
            k:{"type":"Initialized"}:
              .: {}
              f:lastProbeTime: {}
              f:lastTransitionTime: {}
              f:reason: {}
              f:status: {}
              f:type: {}
            k:{"type":"Ready"}:
              .: {}
              f:lastProbeTime: {}
              f:lastTransitionTime: {}
              f:reason: {}
              f:status: {}
              f:type: {}
          f:containerStatuses: {}
          f:hostIP: {}
          f:phase: {}
          f:podIP: {}
          f:podIPs:
            .: {}
            k:{"ip":"10.100.5.33"}:
              .: {}
              f:ip: {}
          f:startTime: {}
      subresource: status
  selfLink: /api/v1/namespaces/descheduler/pods/descheduler-27536220-lhtbl
status:
  phase: Succeeded
  conditions:
    - type: Initialized
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2022-05-10T09:00:00Z'
      reason: PodCompleted
    - type: Ready
      status: 'False'
      lastProbeTime: null
      lastTransitionTime: '2022-05-10T09:00:02Z'
      reason: PodCompleted
    - type: ContainersReady
      status: 'False'
      lastProbeTime: null
      lastTransitionTime: '2022-05-10T09:00:02Z'
      reason: PodCompleted
    - type: PodScheduled
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2022-05-10T09:00:00Z'
  hostIP: 10.100.4.155
  podIP: 10.100.5.33
  podIPs:
    - ip: 10.100.5.33
  startTime: '2022-05-10T09:00:00Z'
  containerStatuses:
    - name: descheduler
      state:
        terminated:
          exitCode: 0
          reason: Completed
          startedAt: '2022-05-10T09:00:01Z'
          finishedAt: '2022-05-10T09:00:01Z'
          containerID: >-
            containerd://17848885307d919c850c8538bf6c8865f2bad8e97ee532e6b70a997dcce6a12c
      lastState: {}
      ready: false
      restartCount: 0
      image: k8s.gcr.io/descheduler/descheduler:v0.23.1
      imageID: >-
        k8s.gcr.io/descheduler/descheduler@sha256:a572960a8539e9e44f565c740710b0a527e1af9d267bfaf0e927657b8d75fe91
      containerID: >-
        containerd://17848885307d919c850c8538bf6c8865f2bad8e97ee532e6b70a997dcce6a12c
      started: false
  qosClass: Burstable
spec:
  volumes:
    - name: policy-volume
      configMap:
        name: descheduler
        defaultMode: 420
    - name: kube-api-access-w2fmz
      projected:
        sources:
          - serviceAccountToken:
              expirationSeconds: 3607
              path: token
          - configMap:
              name: kube-root-ca.crt
              items:
                - key: ca.crt
                  path: ca.crt
          - downwardAPI:
              items:
                - path: namespace
                  fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.namespace
        defaultMode: 420
  containers:
    - name: descheduler
      image: k8s.gcr.io/descheduler/descheduler:v0.23.1
      command:
        - /bin/descheduler
      args:
        - '--policy-config-file'
        - /policy-dir/policy.yaml
        - '--v'
        - '4'
      resources:
        requests:
          cpu: 500m
          memory: 256Mi
      volumeMounts:
        - name: policy-volume
          mountPath: /policy-dir
        - name: kube-api-access-w2fmz
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      livenessProbe:
        httpGet:
          path: /healthz
          port: 10258
          scheme: HTTPS
        initialDelaySeconds: 3
        timeoutSeconds: 1
        periodSeconds: 10
        successThreshold: 1
        failureThreshold: 3
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: IfNotPresent
      securityContext:
        capabilities:
          drop:
            - ALL
        privileged: false
        runAsNonRoot: true
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
  restartPolicy: Never
  terminationGracePeriodSeconds: 30
  dnsPolicy: ClusterFirst
  serviceAccountName: descheduler
  serviceAccount: descheduler
  nodeName: aks-linux1-69935135-vmss000001
  securityContext: {}
  schedulerName: default-scheduler
  tolerations:
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
    - key: node.kubernetes.io/memory-pressure
      operator: Exists
      effect: NoSchedule
  priorityClassName: system-cluster-critical
  priority: 2000000000
  enableServiceLinks: true
  preemptionPolicy: PreemptLowerPriority

Kubectl events

kubectl get event -n descheduler
LAST SEEN   TYPE      REASON             OBJECT                           MESSAGE
50m         Normal    Scheduled          pod/descheduler-27536160-5zpdw   Successfully assigned descheduler/descheduler-27536160-5zpdw to aks-linux1-69935135-vmss000001
50m         Normal    Pulled             pod/descheduler-27536160-5zpdw   Container image "k8s.gcr.io/descheduler/descheduler:v0.23.1" already present on machine
50m         Normal    Created            pod/descheduler-27536160-5zpdw   Created container descheduler
50m         Normal    Started            pod/descheduler-27536160-5zpdw   Started container descheduler
50m         Warning   FailedMount        pod/descheduler-27536160-5zpdw   MountVolume.SetUp failed for volume "policy-volume" : object "descheduler"/"descheduler" not registered
50m         Warning   FailedMount        pod/descheduler-27536160-5zpdw   MountVolume.SetUp failed for volume "kube-api-access-prhrt" : object "descheduler"/"kube-root-ca.crt" not registered
50m         Normal    SuccessfulCreate   job/descheduler-27536160         Created pod: descheduler-27536160-5zpdw
50m         Normal    Completed          job/descheduler-27536160         Job completed
50m         Normal    InjectionSkipped   cronjob/descheduler              Linkerd sidecar proxy injection skipped: neither the namespace nor the pod have the annotation "linkerd.io/inject:enabled"
50m         Normal    SuccessfulCreate   cronjob/descheduler              Created job descheduler-27536160
50m         Normal    SawCompletedJob    cronjob/descheduler              Saw completed job: descheduler-27536160, status: Complete
50m         Normal    SuccessfulDelete   cronjob/descheduler              Deleted job descheduler-27536100

Is it possible this is related to RootCAConfigMap being set to true from Kubernetes 1.22 onwards?

JackSinclairT commented May 09 '22 16:05

Hi @JackSinclairT, I haven't seen anything like that before, but that is a pretty big jump. Were you already running descheduler on k8s 1.18, too?

It might help if you could share the descheduler's pod YAML and the full logs where you saw that error (also, was that error from the pod logs or from the kubectl output?)

damemi commented May 09 '22 16:05

I haven't seen anything like that before, but that is a pretty big jump. Were you already running descheduler on k8s 1.18, too?

Yes indeed - it worked before on 1.18, and we're getting these errors now. Others have reported this error here.

It might help if you could share the descheduler's pod YAML and the full logs where you saw that error (also, was that error from the pod logs or from the kubectl output?)

That output was from kubectl get event. I've updated the issue above with the pod YAML and the output of 'kubectl get event -n descheduler' that shows the errors.

JackSinclairT commented May 10 '22 09:05

@JackSinclairT thanks, and sorry I didn't remember this coming up before, but you are right.

I did some quick searches and found this upstream bug about the same issue. Skimming the thread, it sounds like this may have been fixed in k8s 1.23.6 (https://github.com/kubernetes/kubernetes/issues/105204#issuecomment-1104744178). I see that you're on 1.23.5, so if you are able to upgrade, would you mind reporting whether that helps?

Otherwise, it sounds like setting automountServiceAccountToken: false on the pod is the workaround. If you haven't tried that yet, please let us know whether it helps. Maybe we should include that as an option in the Helm chart deployment for others.
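
For anyone else hitting this, on a CronJob deployment that field sits on the Job's pod template spec; a minimal sketch (not the chart's exact template, trimmed to the essentials from the pod YAML above):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: descheduler
  namespace: descheduler
spec:
  schedule: "0 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: descheduler
          automountServiceAccountToken: false  # suggested workaround for the "not registered" mount errors
          restartPolicy: Never
          containers:
            - name: descheduler
              image: k8s.gcr.io/descheduler/descheduler:v0.23.1
              command: ["/bin/descheduler"]
              args: ["--policy-config-file", "/policy-dir/policy.yaml", "--v", "4"]
              volumeMounts:
                - name: policy-volume
                  mountPath: /policy-dir
          volumes:
            - name: policy-volume
              configMap:
                name: descheduler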

damemi commented May 12 '22 18:05

@damemi you're a lifesaver - thanks so much for looking all of that up. I'll try this tomorrow and see if it fixes it!

JackSinclairT commented May 12 '22 20:05

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented Aug 10 '22 21:08

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented Sep 09 '22 21:09

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot commented Oct 09 '22 22:10

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot commented Oct 09 '22 22:10