volcano
volcano copied to clipboard
Fix panic in proportion plugin
We must subtract guaranteed resources from the remaining resources before starting the division. If we don't do this, it's possible to allocate more resources than we have available in the cluster. For instance, in this scenario:
Suppose we have three queues - one with a significant guarantee of resources and two other queues. All three queues have jobs, so they will be handled in 'queueOpts.' If we start dividing resources from queues with no guarantee resources, then we give 'remaining/3' resources to each of them. However, the queue with guaranteed resources may take more than 'remaining/3' resources (it takes the maximum of 'attr.guarantee' and 'deserved'), which can result in a panic in the 'remaining.Sub' function.
This behavior is undefined because the order of elements in the map is random, but I have observed and reproduced this behavior many times. So, I have added a test to capture this behavior, specifically testing the panic without the fix. I have also added the fix.
Also, I'm going to create an issue with stack trace
Welcome @svileex!
It looks like this is your first PR to volcano-sh/volcano 馃帀.
Thank you, and welcome to Volcano. :smiley:
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
To complete the pull request process, please assign thor-wl
You can assign the PR to them by writing /assign @thor-wl in a comment when ready.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
Issue: https://github.com/volcano-sh/volcano/issues/3155
/assign @thor-wl
Hey, there hasn't been a review since October. Did I forget to do something?
Hey @Thor-wl @lowang-bh see this mr please
/assign @william-wang
@william-wang @Thor-wl check this mr please
/ok-to-test
/retest
@svileex: PR needs rebase.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.