volcano icon indicating copy to clipboard operation
volcano copied to clipboard

pods in queue's resource capacity will case share value to 1

Open lowang-bh opened this issue 1 year ago • 10 comments

As the picture described, a queue has allocated pods will cause queue's share to be 1 always, although each of other resource has not used up.

What happened: image image

What you expected to happen:

share value should less than 1

How to reproduce it (as minimally and precisely as possible):

create a job in one queue, example yaml

apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: queue-a
spec:
  capability:
    cpu: "4"
    memory: "2Gi"
---
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: job-high
spec:
  schedulerName: volcano
  priorityClassName: high-priority
  queue: queue-a
  tasks:
    - replicas: 1
      name: "default-nginx"
      template:
        metadata:
          name: web
        spec:
          containers:
            - image: nginx:1.14.2
              imagePullPolicy: IfNotPresent
              name: nginx
              resources:
                requests:
                  cpu: "3"
                  memory: "1Gi"
          restartPolicy: OnFailure

Anything else we need to know?:

Environment:

  • Volcano Version: master
  • Kubernetes version (use kubectl version): 1.27
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
apiVersion: v1
data:
  volcano-scheduler.conf: |
    actions: "allocate, preempt, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
      - name: conformance
    - plugins:
      - name: overcommit
      - name: drf
        enablePreemptable: false
      - name: predicates
      - name: proportion
      - name: nodeorder
      - name: binpack
kind: ConfigMap
metadata:
  name: volcano-scheduler-configmap
  namespace: volcano-system

lowang-bh avatar Dec 08 '23 05:12 lowang-bh

Yep, that's a problem. It's caused by https://github.com/volcano-sh/volcano/pull/3188/files#diff-b361e185e6595bd52bd084dae3e830736c2ed7fc17fad7aef218d2a2527a02c9, and we can fix it refer to pr https://github.com/volcano-sh/volcano/pull/3223

Monokaix avatar Dec 08 '23 06:12 Monokaix

Hi @Monokaix @lowang-bh .I read through the mentioned PRs and comments and have started working on the issue. Just a few questions I have regarding the issue , in this attached screenshot , the 1 highlighted I dont seem to quite understand where and how the 1 is used and why it is required? Thank you

jignyasamishra avatar Dec 16 '23 15:12 jignyasamishra

where and how the 1 is used and why it is required?

Hi, @jignyasamishra thanks for your passion. In DRF and proportin plugin, the share value is used to sort queue/jobs/tasks in heap-sort.

https://github.com/volcano-sh/volcano/blob/7e23a7cfa70502e1519e9ceda274b3053a529ab6/pkg/scheduler/plugins/proportion/proportion.go#L391

https://github.com/volcano-sh/volcano/blob/7e23a7cfa70502e1519e9ceda274b3053a529ab6/pkg/scheduler/plugins/drf/drf.go#L502

https://github.com/volcano-sh/volcano/blob/7e23a7cfa70502e1519e9ceda274b3053a529ab6/pkg/scheduler/api/helpers/helpers.go#L76

lowang-bh avatar Dec 17 '23 07:12 lowang-bh

Sorry for not digging so deep before, according to queue's deserved resources computation logic here, every dimension resource will not be greater than its request resource, so share value 1 is correct here.

Monokaix avatar Dec 21 '23 11:12 Monokaix

value 1 is correct here.

That is not acceptable and has broken the Drf rules. If we need consider pods number in resource, then we need to have a capacity/guarantee of pods number respectively.

lowang-bh avatar Dec 21 '23 12:12 lowang-bh

value 1 is correct here.

That is not acceptable and has broken the Drf rules. If we need consider pods number in resource, then we need to have a capacity/guarantee of pods number respectively.

I think this can be resolved after https://github.com/volcano-sh/volcano/pull/3216 merged.

Monokaix avatar Jan 08 '24 03:01 Monokaix

I think this can be resolved after #3216 merged.

It is same as the merged one #3219

lowang-bh avatar Jan 09 '24 02:01 lowang-bh

I think this can be resolved after #3216 merged.

It is same as the merged one #3219

Yeah,it's a cherry-pick, we should also merge it to master branch.

Monokaix avatar Jan 09 '24 03:01 Monokaix

/good-first-issue

lowang-bh avatar May 01 '24 13:05 lowang-bh

@lowang-bh: This request has been marked as suitable for new contributors.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-good-first-issue command.

In response to this:

/good-first-issue

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

volcano-sh-bot avatar May 01 '24 13:05 volcano-sh-bot