volcano add support for volumeBinding plugin

This submission is based on a previous PR(#2506), and fixes the issue where the volumeBinding plugin did not correctly filter nodes when performing predicates on multiple tasks of a job.

Jul 13 '23 12:07 chxk

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: To complete the pull request process, please assign qiankunli You can assign the PR to them by writing /assign @qiankunli in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

pkg/scheduler/OWNERS

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

Jul 13 '23 12:07 volcano-sh-bot

/assign @qiankunli @k82cn @william-wang

Jul 13 '23 12:07 chxk

@chxk Thanks for the pr :) Would you log a issue firstly, and add provide the following information？ What happened: What you expected to happen: How to reproduce it (as minimally and precisely as possible):

Jul 14 '23 08:07 william-wang

@chxk Thanks for the pr :) Would you log a issue firstly, and add provide the following information？ What happened: What you expected to happen: How to reproduce it (as minimally and precisely as possible):

ok @william-wang What happened: Currently Volcano can't filter nodes by PV, which can't meet our requirements. We found a PR(#2506) which added this feature support (based on the volumeBinding plugin) before, but it was reverted. After investigation, we found the problem introduced by this PR: When allocating nodes for each task in a job, it uses the ssn.cache.AllocateVolumes, which based on an already initialized SchedulerVolumeBinder, and will update its pvCache and pvcCache. However, every time a task is predicated, a new volumeBinding plugin is initialized, and the empty pvCache and pvcCache are used for Filter, instead of which in the initialized SchedulerVolumeBinder. This can potentially cause a PV to be bound to multiple different PVCs.

What you expected to happen: Add volumeBinding feature to Volcano and it should work normally. After our changes were merged and deployed, we have not observed the memory leak issue that was previously mentioned in #2555。

How to reproduce it (as minimally and precisely as possible): based on a version with PR(#2506):

Create a local PV with storageClass fast-disks.
Create a job with two pods, each requesting a different PVC(storageClass is both fast-disks) and using a nodeSelector pointing to the node of the local PV. The yaml file is as following

apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: test-binding
  namespace: default
spec:
  minMember: 2
  queue: default
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-binding-1
  namespace: default
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: fast-disks
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-binding-2
  namespace: default
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: fast-disks
---
apiVersion: v1
kind: Pod
metadata:
  name: test-binding-1
  namespace: default
  annotations:
    scheduling.k8s.io/group-name: test-binding
spec:
  schedulerName: volcano
  nodeSelector:
    testpvc: owner
  containers:
  - name: test
    image: <your image>
    resources:
      limits:
        cpu: 3
    command:
      - /bin/bash
      - -c
      - sleep 1d
    volumeMounts:
    - name: hahaha
      mountPath: /fast-disks
  volumes:
  - name: hahaha
    persistentVolumeClaim:
      claimName: test-binding-1
---
apiVersion: v1
kind: Pod
metadata:
  name: test-binding-2
  namespace: default
  annotations:
    scheduling.k8s.io/group-name: test-binding
spec:
  schedulerName: volcano
  nodeSelector:
    testpvc: owner
  containers:
  - name: test
    image: <your image>
    resources:
      limits:
        cpu: 3
    command:
      - /bin/bash
      - -c
      - sleep 1d
    volumeMounts:
    - name: hahaha
      mountPath: /fast-disks
  volumes:
  - name: hahaha
    persistentVolumeClaim:
      claimName: test-binding-2

Observe the result: the job gets scheduled, but only one pod's PVC gets bound to the local PV.

Jul 16 '23 16:07 chxk

lgtm

Jul 25 '23 06:07 hwdef

I don't know much about this area. I think the code is ok.

Thank you for review. Could you help review this pr please? @Thor-wl @william-wang

Aug 01 '23 15:08 chxk

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

Oct 15 '23 09:10 stale[bot]

/reopen

Dec 22 '23 14:12 lowang-bh

@lowang-bh: Reopened this PR.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Dec 22 '23 14:12 volcano-sh-bot

i met same problem, and i think this PR can solve it greatly

Feb 27 '24 08:02 bibibox

lgtm

Feb 27 '24 08:02 bibibox

@chxk: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Apr 09 '24 02:04 volcano-sh-bot