Bug [Co-Scheduling]: Pods stuck pending when pods use volumes with a persistent volume claim.

Open • roofurmston opened this issue 8 months ago • 3 comments

Area

  • [X] Scheduler
  • [X] Controller
  • [X] Helm Chart
  • [X] Documents

Other components

No response

What happened?

We are using scheduler-plugins for co-scheduling, specifically for gang-scheduling distributed ML training jobs. These jobs run in EKS and use EFS (Elastic File System) volumes. The volumes are created dynamically for each workflow and destroyed when the workflow releases them.

Co-scheduling works as expected for pod groups in which (i) the pods make small resource requests, (ii) the pods have no volumes that use the persistentVolumeClaim field (referencing an EFS volume), or (iii) both.

However, whenever we try to schedule a pod group that has both (i) large resource requests and (ii) volumes that use the persistentVolumeClaim field (referencing an EFS volume), the pods get stuck in a perpetual Pending state.

What did you expect to happen?

The pod group should be scheduled as normal in this case.

How can we reproduce it (as minimally and precisely as possible)?

We have a minimal reproducible example, which is as follows:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-coscheduling-claim
  namespace: mlplatform-example
spec:
  storageClassName: efs-sc
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 3G
---
apiVersion: scheduling.x-k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: nginx
  namespace: mlplatform-example
spec:
  scheduleTimeoutSeconds: 10
  minMember: 6
---
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: nginx
  namespace: mlplatform-example
  labels:
    app: nginx
spec:
  replicas: 6
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      name: nginx
      labels:
        app: nginx
        scheduling.x-k8s.io/pod-group: nginx
    spec:
      schedulerName: scheduler-plugins-scheduler
      containers:
      - name: nginx
        image: nginx
        volumeMounts:
        - mountPath: /mnt/out
          name: out
        - mountPath: /mnt/test
          name: distributed-storage-example
        resources:
          limits:
            cpu: '24'
            memory: 20000M
            ephemeral-storage: 500M
          requests:
            cpu: '24'
            memory: 20000M
            ephemeral-storage: 500M
      volumes:
        - name: out
          emptyDir: {}
        - name: distributed-storage-example
          persistentVolumeClaim:
            claimName: 'test-coscheduling-claim'
            readOnly: false

The pods for this ReplicaSet go into a perpetual Pending state.

Note that the pods are scheduled successfully in any of the following cases:

  • The pods are scheduled through the default scheduler and not as a pod group.
  • The resource requests for the individual pods are reduced (e.g., to cpu: '1', memory: 200M and ephemeral-storage: 50M).
  • The distributed-storage-example volume (and the corresponding volume mount) is removed.
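
To make the difference between the failing and the reduced variant concrete, the following sketch totals the requests that the whole gang (minMember: 6) has to fit at the same time. The quantities are taken from the manifest and the list above. The helper names (parse_quantity, gang_total) are illustrative only, and the parser is a simplification that handles just the plain integers and the decimal M suffix used in this issue, not the full Kubernetes quantity grammar.

```python
# Per-pod requests: the manifest above (failing) and the reduced variant (working).
FAILING = {"cpu": "24", "memory": "20000M", "ephemeral-storage": "500M"}
WORKING = {"cpu": "1", "memory": "200M", "ephemeral-storage": "50M"}
MIN_MEMBER = 6  # spec.minMember of the PodGroup


def parse_quantity(value: str) -> int:
    """Parse a plain integer or a decimal-mega quantity ("M" = 10**6).

    Simplification: only the two forms that appear in this issue are supported.
    """
    if value.endswith("M"):
        return int(value[:-1]) * 10**6
    return int(value)


def gang_total(per_pod: dict, members: int) -> dict:
    """Sum the per-pod requests over all members of the gang."""
    return {name: parse_quantity(q) * members for name, q in per_pod.items()}


if __name__ == "__main__":
    for label, requests in (("failing", FAILING), ("working", WORKING)):
        total = gang_total(requests, MIN_MEMBER)
        print(
            f"{label}: {total['cpu']} CPUs, "
            f"{total['memory'] / 10**9:g} GB memory, "
            f"{total['ephemeral-storage'] / 10**9:g} GB ephemeral storage"
        )
```

For the manifest above this comes to 144 CPUs and 120 GB of memory for the gang, against 6 CPUs and 1.2 GB for the reduced variant. As the list above notes, the same large requests are scheduled successfully once the PVC-backed volume is removed, so the totals alone do not explain the failure.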

Anything else we need to know?

Some logs from the scheduler pod:

W0604 11:10:54.497530       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
I0604 11:10:54.498857       1 coscheduling.go:211] "Pod is waiting to be scheduled to node" pod="mlplatform-example/nginx-282b6" nodeName="ip-172-25-102-62.eu-west-1.compute.internal"
W0604 11:10:54.498910       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
W0604 11:10:54.499152       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
E0604 11:10:54.500220       1 schedule_one.go:870] "Error scheduling pod; retrying" err="optimistic rejection in PostFilter" pod="mlplatform-example/nginx-282b6"
W0604 11:10:54.500350       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
I0604 11:10:54.501568       1 coscheduling.go:211] "Pod is waiting to be scheduled to node" pod="mlplatform-example/nginx-c6j6c" nodeName="ip-172-25-102-62.eu-west-1.compute.internal"
W0604 11:10:54.501618       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
W0604 11:10:54.501907       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
E0604 11:10:54.502651       1 schedule_one.go:870] "Error scheduling pod; retrying" err="optimistic rejection in PostFilter" pod="mlplatform-example/nginx-c6j6c"
W0604 11:10:54.502696       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
I0604 11:10:54.503840       1 coscheduling.go:211] "Pod is waiting to be scheduled to node" pod="mlplatform-example/nginx-282b6" nodeName="ip-172-25-102-62.eu-west-1.compute.internal"
W0604 11:10:54.503873       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
W0604 11:10:54.504101       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
E0604 11:10:54.504851       1 schedule_one.go:870] "Error scheduling pod; retrying" err="optimistic rejection in PostFilter" pod="mlplatform-example/nginx-282b6"
W0604 11:10:54.504917       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
I0604 11:10:54.505890       1 coscheduling.go:211] "Pod is waiting to be scheduled to node" pod="mlplatform-example/nginx-c6j6c" nodeName="ip-172-25-102-62.eu-west-1.compute.internal"
W0604 11:10:54.505931       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
W0604 11:10:54.506177       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
E0604 11:10:54.506870       1 schedule_one.go:870] "Error scheduling pod; retrying" err="optimistic rejection in PostFilter" pod="mlplatform-example/nginx-c6j6c"
W0604 11:10:54.506992       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
I0604 11:10:54.507976       1 coscheduling.go:211] "Pod is waiting to be scheduled to node" pod="mlplatform-example/nginx-282b6" nodeName="ip-172-25-102-62.eu-west-1.compute.internal"
W0604 11:10:54.508012       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
W0604 11:10:54.508245       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
E0604 11:10:54.508913       1 schedule_one.go:870] "Error scheduling pod; retrying" err="optimistic rejection in PostFilter" pod="mlplatform-example/nginx-282b6"
W0604 11:10:54.508959       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
I0604 11:10:54.509798       1 coscheduling.go:211] "Pod is waiting to be scheduled to node" pod="mlplatform-example/nginx-c6j6c" nodeName="ip-172-25-102-62.eu-west-1.compute.internal"
W0604 11:10:54.509822       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
W0604 11:10:54.510027       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
E0604 11:10:54.510750       1 schedule_one.go:870] "Error scheduling pod; retrying" err="optimistic rejection in PostFilter" pod="mlplatform-example/nginx-c6j6c"
W0604 11:10:54.510879       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
I0604 11:10:54.511738       1 coscheduling.go:211] "Pod is waiting to be scheduled to node" pod="mlplatform-example/nginx-282b6" nodeName="ip-172-25-102-62.eu-west-1.compute.internal"
W0604 11:10:54.511763       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
W0604 11:10:54.511947       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
E0604 11:10:54.512545       1 schedule_one.go:870] "Error scheduling pod; retrying" err="optimistic rejection in PostFilter" pod="mlplatform-example/nginx-282b6"
W0604 11:10:54.512740       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
I0604 11:10:54.513784       1 coscheduling.go:211] "Pod is waiting to be scheduled to node" pod="mlplatform-example/nginx-c6j6c" nodeName="ip-172-25-102-62.eu-west-1.compute.internal"
W0604 11:10:54.513817       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
W0604 11:10:54.514066       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
E0604 11:10:54.514731       1 schedule_one.go:870] "Error scheduling pod; retrying" err="optimistic rejection in PostFilter" pod="mlplatform-example/nginx-c6j6c"
W0604 11:10:54.540933       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
I0604 11:10:54.542207       1 coscheduling.go:211] "Pod is waiting to be scheduled to node" pod="mlplatform-example/nginx-282b6" nodeName="ip-172-25-102-62.eu-west-1.compute.internal"
W0604 11:10:54.542252       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
W0604 11:10:54.542544       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
E0604 11:10:54.543397       1 schedule_one.go:870] "Error scheduling pod; retrying" err="optimistic rejection in PostFilter" pod="mlplatform-example/nginx-282b6"
E0604 11:10:54.545609       1 event_broadcaster.go:253] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"nginx-c6j6c.17d5c896c861e426", GenerateName:"", Namespace:"mlplatform-example", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, EventTime:time.Date(2024, time.June, 4, 11, 9, 42, 844215184, time.Local), Series:(*v1.EventSeries)(0xc006050900), ReportingController:"scheduler-plugins-scheduler", ReportingInstance:"scheduler-plugins-scheduler-scheduler-plugins-scheduler-749ccc576c-4g9xs", Action:"Scheduling", Reason:"FailedScheduling", Regarding:v1.ObjectReference{Kind:"Pod", Namespace:"mlplatform-example", Name:"nginx-c6j6c", UID:"1bda5d32-f444-4326-9fb6-e25028923260", APIVersion:"v1", ResourceVersion:"464055610", FieldPath:""}, Related:(*v1.ObjectReference)(nil), Note:"optimistic rejection in PostFilter", Type:"Warning", DeprecatedSource:v1.EventSource{Component:"", Host:""}, DeprecatedFirstTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeprecatedLastTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeprecatedCount:0}': 'Event "nginx-c6j6c.17d5c896c861e426" is invalid: series.count: Invalid value: "": should be at least 2' (will not retry!)
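
The excerpt above is repetitive, so a small script can help when comparing longer captures. The sketch below is not part of scheduler-plugins. It assumes the klog text layout seen in the excerpt, and the function name summarize is hypothetical. It counts, per pod, how often the "waiting" message and the PostFilter rejection appear, and records the nodes each pod was waiting on.

```python
import re
from collections import defaultdict

# A few lines copied from the excerpt above; pass the full scheduler log in practice.
SAMPLE = '''\
W0604 11:10:54.497530       1 listers.go:63] can not retrieve list of objects using index : Index with name namespace does not exist
I0604 11:10:54.498857       1 coscheduling.go:211] "Pod is waiting to be scheduled to node" pod="mlplatform-example/nginx-282b6" nodeName="ip-172-25-102-62.eu-west-1.compute.internal"
E0604 11:10:54.500220       1 schedule_one.go:870] "Error scheduling pod; retrying" err="optimistic rejection in PostFilter" pod="mlplatform-example/nginx-282b6"
I0604 11:10:54.501568       1 coscheduling.go:211] "Pod is waiting to be scheduled to node" pod="mlplatform-example/nginx-c6j6c" nodeName="ip-172-25-102-62.eu-west-1.compute.internal"
E0604 11:10:54.502651       1 schedule_one.go:870] "Error scheduling pod; retrying" err="optimistic rejection in PostFilter" pod="mlplatform-example/nginx-c6j6c"
'''

# Patterns match the two message shapes seen in the excerpt (an assumption
# about the log layout, not a documented format).
WAITING = re.compile(
    r'"Pod is waiting to be scheduled to node" pod="([^"]+)" nodeName="([^"]+)"'
)
REJECTED = re.compile(r'err="optimistic rejection in PostFilter" pod="([^"]+)"')


def summarize(log_text: str) -> dict:
    """Return per-pod counts of waiting/rejected lines and the set of nodes seen."""
    stats = defaultdict(lambda: {"waiting": 0, "rejected": 0, "nodes": set()})
    for line in log_text.splitlines():
        match = WAITING.search(line)
        if match:
            pod, node = match.groups()
            stats[pod]["waiting"] += 1
            stats[pod]["nodes"].add(node)
            continue
        match = REJECTED.search(line)
        if match:
            stats[match.group(1)]["rejected"] += 1
    return dict(stats)


if __name__ == "__main__":
    for pod, s in sorted(summarize(SAMPLE).items()):
        print(pod, s["waiting"], s["rejected"], sorted(s["nodes"]))
```

In the excerpt, both nginx-282b6 and nginx-c6j6c report waiting on the same node, ip-172-25-102-62.eu-west-1.compute.internal, which may be worth checking against that node's allocatable resources.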

Kubernetes version

Server Version: version.Info{Major:"1", Minor:"27+", GitVersion:"v1.27.13-eks-3af4770", GitCommit:"4873544ec1ec7d3713084677caa6cf51f3b1ca6f", GitTreeState:"clean", BuildDate:"2024-04-30T03:31:44Z", GoVersion:"go1.21.9", Compiler:"gc", Platform:"linux/amd64"}

Scheduler Plugins version

We have installed scheduler plugins through ArgoCD. The ArgoCD application is as follows:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata: 
  name: scheduler-plugins
  namespace: argocd
spec: 
  project: default
  source: 
    path: manifests/install/charts/as-a-second-scheduler/
    repoURL: https://github.com/kubernetes-sigs/scheduler-plugins.git
    targetRevision: v0.27.8
    helm: 
      releaseName: scheduler-plugins
      parameters: 
      - name: plugins.enabled
        value: "{Coscheduling}"
      - name: pluginConfig[0].name
        value: "Coscheduling"
      - name: pluginConfig[0].args.permitWaitingTimeSeconds
        value: "300"
  destination:
    server: https://kubernetes.default.svc
    namespace: scheduler-plugins
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

roofurmston • Jun 04 '24 11:06