Put the queue back into the priority queue after the job's resource allocation finishes
Resolves #3407
Put the queue back into the priority queue after the job's resource allocation finishes, to ensure that the queue's priority is calculated based on the latest resource allocation.
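For context, here is a minimal, self-contained Go toy showing why re-pushing the queue only after the allocation is applied matters; the types are illustrative stand-ins (volcano's real loop uses util.NewPriorityQueue(ssn.QueueOrderFn) over *api.QueueInfo), not the actual diff:

```go
package main

import (
	"container/heap"
	"fmt"
)

// Toy model: a queue's priority depends on how much it has already been
// allocated, so the ordering must be refreshed after each job finishes.
type queue struct {
	name      string
	allocated int
}

type byAllocated []*queue

func (q byAllocated) Len() int            { return len(q) }
func (q byAllocated) Less(i, j int) bool  { return q[i].allocated < q[j].allocated }
func (q byAllocated) Swap(i, j int)       { q[i], q[j] = q[j], q[i] }
func (q *byAllocated) Push(x interface{}) { *q = append(*q, x.(*queue)) }
func (q *byAllocated) Pop() interface{} {
	old := *q
	n := len(old)
	item := old[n-1]
	*q = old[:n-1]
	return item
}

func main() {
	pq := &byAllocated{{name: "q-1"}, {name: "q-2"}}
	heap.Init(pq)

	// Allocate three jobs of 1 CPU each. Re-pushing the queue *after* the
	// job's allocation finished lets the heap rank queues by the updated
	// allocation, so the jobs spread across queues instead of draining one.
	for i := 0; i < 3; i++ {
		q := heap.Pop(pq).(*queue)
		q.allocated++ // "job's resource allocating finished"
		fmt.Printf("job %d -> %s (allocated=%d)\n", i+1, q.name, q.allocated)
		heap.Push(pq, q) // put the queue back with its fresh priority
	}
}
```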
/retest
@panoswoo: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.
In response to this:
/retest
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hi, please add some UTs to cover it. Thanks
Done, but I haven't found a simple way to check the allocation order of tasks, so I made some extensions to TestCommonStruct to enable it to record the actual allocation order. The change is compatible with the previous cases. Looking forward to your suggestions
No need to do that. Just make the total resources only enough to allocate the first few jobs, and check that the remaining jobs are pending.
I considered doing this before, but since both queues have the same weight, when I try to fill up all resources with tasks they get rejected by the proportion plugin during the Overused and Allocatable checks (we only have two queues, so filling up all resources means one queue's allocation must exceed half). But I just realized that maybe I can disable the Overused and Allocatable checks directly in the plugin's config.
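For reference, a hedged sketch of what that could look like in the scheduler configuration used by the test; the enabledOverused / enabledAllocatable option names are assumptions here, and the exact YAML keys should be verified against conf.PluginOption:

```yaml
actions: "enqueue, allocate, backfill"
tiers:
- plugins:
  - name: priority
  - name: proportion
    # assumed option names; verify against conf.PluginOption
    enabledOverused: false
    enabledAllocatable: false
```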
Added two cases (a sketch of the first one follows the list):
- total 5 CPUs: pod-small-1 uses 1 CPU, then pod-large-2 uses 3 CPUs, and then pod-large-1 is pending, indicating that q-2 is allocated first
- total 5 CPUs: pod-small-1 uses 1 CPU, then pod-large-2 uses 2 CPUs, then pod-large-1 uses 2 CPUs, and pod-small-2 is pending, indicating that q-2 is allocated first and then q-1
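A rough sketch of the first case as data; the struct is hypothetical (not the real TestCommonStruct fields), and pod-large-1's request is assumed to be 3 CPUs since it must not fit in the 1 CPU left:

```go
// Hypothetical shape, for illustration only.
type caseSpec struct {
	totalCPU    int
	podCPU      map[string]int // pod name -> CPU request (queue in comment)
	wantBound   []string
	wantPending []string
}

var case1 = caseSpec{
	totalCPU: 5,
	podCPU: map[string]int{
		"pod-small-1": 1, // q-1
		"pod-large-1": 3, // q-1 (assumed request)
		"pod-large-2": 3, // q-2
	},
	// After pod-small-1 binds, q-1's share exceeds q-2's, so the scheduler
	// picks q-2 next: pod-large-2 binds, leaving 1 CPU, and pod-large-1 pends.
	wantBound:   []string{"pod-small-1", "pod-large-2"},
	wantPending: []string{"pod-large-1"},
}
```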
Wait for #3408 to fix the failing UT.
Maybe we need to adjust the e2e test case.
let me take a look
@lowang-bh Hi, I seem to have found some issues while searching for the volcano component logs generated by e2e testing.
- when an e2e test fails, we back up the volcano component logs via https://github.com/volcano-sh/volcano/blob/c414e56be9d238efd4665ba81d06a535f55dc6b9/hack/run-e2e-kind.sh#L69-L74, but we try to get the logs from the kube-system namespace while those components are actually installed in volcano-system: https://github.com/volcano-sh/volcano/blob/c414e56be9d238efd4665ba81d06a535f55dc6b9/hack/run-e2e-kind.sh#L57
- it seems that we don't upload the log file at the end of the workflow (see the sketch below): https://github.com/volcano-sh/volcano/blob/c414e56be9d238efd4665ba81d06a535f55dc6b9/.github/workflows/e2e_scheduling_basic.yaml#L39-L41
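For the second point, one possible fix is uploading the backed-up logs as a workflow artifact on failure; this is only a sketch, and the log path here is an assumption that must match what run-e2e-kind.sh actually writes:

```yaml
# Sketch only: the path is an assumption, not taken from run-e2e-kind.sh.
- name: Upload volcano component logs
  if: failure()
  uses: actions/upload-artifact@v4
  with:
    name: volcano-e2e-logs
    path: /tmp/volcano-e2e-logs
```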
The kube-system namespace is indeed an issue; we use volcano-system now. You can file another PR and merge it first. @Monokaix
/retest
I triggered the test again and it passed. I haven't made any modifications. Have we recently merged any fixes?
No update. Maybe it failed randomly. We'd better check it.
I am unable to reproduce the problem now. I don't have detailed logs of the previous failures, and the error is a timeout in a large flow involving multiple components, so I can't determine which part went wrong without detailed logs :(
@panoswoo please make the CI succeed so we can get this PR merged.
/retest
@panoswoo: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.
In response to this:
/retest
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I will fix it ASAP
/ok-to-test
/retest
/assign @Monokaix @hwdef Please also help to review it. Thanks.
/retest
Hi @panoswoo, you can close and reopen it to trigger the CI.
/lgtm
Why did the former e2e tests fail?
New changes are detected. LGTM label has been removed.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
To complete the pull request process, please assign lowang-bh
You can assign the PR to them by writing /assign @lowang-bh in a comment when ready.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
Please squash to one commit :)
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: william-wang
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~pkg/scheduler/OWNERS~~ [william-wang]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
/hold
There is a case not covered.