Add nodeOrder for devices API and implement idle-first fit and best-fit for vgpu

Open archlitchi opened this issue 1 year ago • 10 comments

  1. Add nodeOrder plugin for future scheduling policies (best fit, first fit, etc.)
  2. Add vgpu monitor and scheduling-policy documentation in how_to_use_vgpu.md

archlitchi avatar Nov 17 '23 08:11 archlitchi

/assign @william-wang @wangyang0616

archlitchi avatar Nov 17 '23 08:11 archlitchi

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

To complete the pull request process, please assign william-wang.

You can assign the PR to them by writing /assign @william-wang in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.

Approvers can cancel approval by writing /approve cancel in a comment.

volcano-sh-bot avatar Nov 17 '23 08:11 volcano-sh-bot

It is necessary to fix issue #3141. While there are several ways to add device-scoring capability, such as in the Device interface or in the nodeOrder plugin, there is no specific scenario yet that requires supporting device scores separately. For future use cases, we had better clarify the specific scenarios before supporting a new feature, to avoid over-design.

Monokaix avatar Nov 27 '23 12:11 Monokaix

It may be too heavy if merged into the nodeOrder plugin, because multiple scheduling policies (best fit, worst fit, NUMA-first fit, ...) may be implemented in the future for the same shareable devices (such as vgpu). Users can pick one using arguments.
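To illustrate the idea of several policies for the same shareable device, selected by an argument, here is a minimal sketch. It is not the code from this PR or Volcano's API: the `Device` type, the policy names `"best-fit"` and `"worst-fit"`, and the helpers `PolicyByName` and `PickDevice` are all hypothetical, and only device memory is considered.

```go
package main

import "fmt"

// Device is a hypothetical view of one shareable device (for example a vgpu).
type Device struct {
	TotalMem int // total device memory, in MiB
	UsedMem  int // memory already allocated to other tasks, in MiB
}

// ScoreFunc rates a device for a request. Higher is better;
// a negative score means the request does not fit.
type ScoreFunc func(d Device, requestMem int) float64

// bestFit prefers the device that ends up most fully used after placement.
func bestFit(d Device, requestMem int) float64 {
	free := d.TotalMem - d.UsedMem
	if d.TotalMem <= 0 || requestMem > free {
		return -1
	}
	return float64(d.UsedMem+requestMem) / float64(d.TotalMem)
}

// worstFit prefers the device that keeps the most free memory after placement.
func worstFit(d Device, requestMem int) float64 {
	free := d.TotalMem - d.UsedMem
	if d.TotalMem <= 0 || requestMem > free {
		return -1
	}
	return float64(free-requestMem) / float64(d.TotalMem)
}

// PolicyByName maps a user-supplied argument (hypothetical names) to a policy.
func PolicyByName(name string) (ScoreFunc, error) {
	switch name {
	case "best-fit":
		return bestFit, nil
	case "worst-fit":
		return worstFit, nil
	default:
		return nil, fmt.Errorf("unknown device scheduling policy %q", name)
	}
}

// PickDevice returns the index of the highest-scoring device, or -1 if none fits.
func PickDevice(devices []Device, requestMem int, score ScoreFunc) int {
	best, bestScore := -1, -1.0
	for i, d := range devices {
		s := score(d, requestMem)
		if s < 0 {
			continue // request does not fit on this device
		}
		if best == -1 || s > bestScore {
			best, bestScore = i, s
		}
	}
	return best
}
```

With this shape, adding another policy means adding one more `ScoreFunc` and one more `case`, which is the flexibility the comment argues for, wherever the scoring code ends up living.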

archlitchi avatar Nov 27 '23 12:11 archlitchi

Existing plugins such as binpack already provide nodeOrder scoring across all resource dimensions, and the node-scoring scenarios for devices are not yet well defined, so we can discuss and enhance this when we need it in the future :)

Monokaix avatar Nov 27 '23 12:11 Monokaix

binpack implements a best-fit policy. If I intend to implement multiple policies, shall I put them into the nodeOrder plugin?
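For readers unfamiliar with why binpack behaves like best fit, the following is a rough sketch of a binpack-style node score over several resource dimensions. It is an illustration under assumptions, not Volcano's binpack plugin: the `Resources` type, the function `BinpackScore`, and the resource names and weights in the usage note are hypothetical.

```go
package main

// Resources maps a resource name to a quantity (hypothetical representation).
type Resources map[string]float64

// BinpackScore returns a value in [0, 1]. A node scores higher the fuller it
// would be after placing the request, which is what makes the policy behave
// like best fit. It returns 0 if the request does not fit.
func BinpackScore(request, used, allocatable Resources, weights map[string]float64) float64 {
	var weighted, totalWeight float64
	for name, req := range request {
		w := weights[name]
		if req == 0 || w == 0 {
			continue // dimension not requested or not weighted
		}
		alloc := allocatable[name]
		after := used[name] + req
		if alloc == 0 || after > alloc {
			return 0 // request does not fit on this node
		}
		// Weighted utilization of this dimension after placement.
		weighted += w * after / alloc
		totalWeight += w
	}
	if totalWeight == 0 {
		return 0
	}
	return weighted / totalWeight
}
```

For example, with weights `{"cpu": 1, "gpu-mem": 3}` and a request of 2 CPUs and 4000 MiB of GPU memory, a node already using 6 of 16 CPUs and 8000 of 16000 MiB scores higher than an empty node of the same size, so the busier node is preferred.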

archlitchi avatar Nov 28 '23 05:11 archlitchi

@Monokaix i put the device-score

Adopted. I have put the device-score related logic into the nodeOrder plugin.

archlitchi avatar Nov 29 '23 10:11 archlitchi

Hi, has the NPE problem been fixed by another PR? If so, please update the PR description and squash the commits into one first.

Monokaix avatar Dec 27 '23 02:12 Monokaix

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale[bot] avatar Mar 17 '24 10:03 stale[bot]

@archlitchi: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

volcano-sh-bot avatar Apr 09 '24 02:04 volcano-sh-bot