In addition, backfill does not seem to take effect even though the backfill action is enabled.
To test backfill, I first run a 4C vcjob1 on a node with 8C in total. At this point about 3C remains free on the node. I then submit an identical 4C vcjob2, which stays Pending, and then a 1C vcjob3, which also stays Pending. My understanding is that with backfill, small jobs should be scheduled first to improve resource utilization.
The scheduler ConfigMap is configured as:

    volcano-scheduler.conf: |
      actions: "enqueue, allocate, reclaim, backfill"
      tiers:
      - plugins:
        - name: priority
        - name: gang
        - name: conformance
      - plugins:
        - name: overcommit
        - name: drf
        - name: predicates
        - name: proportion
        - name: nodeorder
        - name: binpack
    kind: ConfigMap

Volcano version: 1.6.0
K8s version: 1.24/1.21
Could you provide a ConfigMap that correctly exercises backfill, or point to a test case that covers a correct backfill scenario? Thank you!
@shmily0000 Hey! Please mind the following items if backfill is configured in your environment.
- Currently, backfill only takes effect for BestEffort Pods. Workloads with specified resource requests will not be taken into consideration.
- The scenario and reproduction steps are a little rough. A clearer description would make the problem easier to look into. For example, you can list the capacity, used resources, and existing workloads of all nodes in the cluster. Resource conditions other than CPU should also be provided to give the full picture.
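To illustrate the first point, here is a minimal sketch of a job that backfill should be able to pick up, assuming the only requirement is that its Pods are BestEffort (no resource requests or limits on any container). The job name `backfill-besteffort` is hypothetical, and the remaining fields simply mirror a typical single-task vcjob; this has not been verified against Volcano 1.6.0.

```yaml
# Hypothetical test job: the name backfill-besteffort is illustrative only.
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: backfill-besteffort
spec:
  minAvailable: 1
  schedulerName: volcano
  queue: default
  tasks:
    - replicas: 1
      name: nginx
      template:
        spec:
          containers:
            - name: nginx
              image: nginx
              imagePullPolicy: IfNotPresent
              command:
                - sleep
                - 10m
              # No resources section: with no requests or limits on any
              # container, Kubernetes classifies the Pod as BestEffort.
          restartPolicy: Never
```

Note that a LimitRange in the namespace could inject default requests and turn the Pod into a non-BestEffort one, so it is worth checking the QoS class reported by `kubectl describe pod` after submission.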
@Thor-wl OK, this test was run on my own local single-node cluster. My configuration is listed below.
~First, this is the resource usage before I created any vcjob:

    (Total limits may be over 100 percent, i.e., overcommitted.)
    Resource           Requests     Limits
    cpu                950m (11%)   100m (1%)
    memory             290Mi (10%)  390Mi (13%)
    ephemeral-storage  100Mi (0%)   0 (0%)
    hugepages-1Gi      0 (0%)       0 (0%)
    hugepages-2Mi      0 (0%)       0 (0%)
~Node resources:

    Capacity:
      cpu:                8
      ephemeral-storage:  18121Mi
      hugepages-1Gi:      0
      hugepages-2Mi:      0
      memory:             2895184Ki
      pods:               110
    Allocatable:
      cpu:                8
      ephemeral-storage:  18121Mi
      hugepages-1Gi:      0
      hugepages-2Mi:      0
      memory:             2895184Ki
      pods:               110
~YAML of job 1:

    apiVersion: batch.volcano.sh/v1alpha1
    kind: Job
    metadata:
      name: backfill-1
    spec:
      minAvailable: 1
      schedulerName: volcano
      queue: default
      priorityClassName: system-node-critical
      policies:
        - event: PodEvicted
          action: RestartJob
      tasks:
        - replicas: 1
          name: nginx
          policies:
            - event: TaskCompleted
              action: CompleteJob
          template:
            spec:
              priorityClassName: system-node-critical
              containers:
                - command:
                    - sleep
                    - 10m
                  image: nginx
                  imagePullPolicy: IfNotPresent
                  name: nginx
                  resources:
                    requests:
                      cpu: 4
                    limits:
                      cpu: 4
              restartPolicy: Never

Job 2 requests 4C, the same as job 1, and job 3 is a small 1C job.
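As a rough check of the CPU budget, assuming the 950m of existing requests shown above stays constant and job 1 is already running, the figures work out as follows (in cores):

```latex
\begin{align*}
C_{\text{free}} &= 8 - 0.95 - 4 = 3.05 \\
4 &> 3.05 \quad \text{(job 2 does not fit)} \\
1 &\le 3.05 \quad \text{(job 3 fits by CPU alone)}
\end{align*}
```

This matches the roughly 3C observed as free on the node. It covers CPU only, so it suggests, but does not prove, that job 3 staying Pending comes from scheduling order or policy rather than from a lack of capacity.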
Hello 👋 Looks like there was no activity on this issue for last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).
Closing for now as there was no activity for last 60 days after marked as stale, let us know if you need this to be reopened! 🤗