docs: overcommit-plugin enhancements

Open googs1025 opened this issue 1 year ago • 10 comments

Fix: https://github.com/volcano-sh/volcano/issues/3635

overcommit-plugin

@googs1025; Jul. 29, 2024

Background:

The overcommit-plugin amplifies node resources by a configurable factor, so that jobs can be enqueued against more than the nodes' raw allocatable capacity.

Objective:

Use different amplification factors based on different resource types.

Introduction

Currently, the overcommit-plugin scales up the Allocatable resources of a node to implement AddJobEnqueuedFn. It applies a single overcommit-factor to every resource, which is not appropriate because different resources should have different factors.

  • For example, the binpack plugin also assigns different weights to different resources:

actions: "enqueue, reclaim, allocate, backfill, preempt"
tiers:
- plugins:
  - name: binpack
    arguments:
      binpack.weight: 10
      binpack.cpu: 5
      binpack.memory: 1
      binpack.resources: nvidia.com/gpu, example.com/foo
      binpack.resources.nvidia.com/gpu: 2
      binpack.resources.example.com/foo: 3
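
To make the motivation concrete, here is a minimal Go sketch of what a single factor amounts to. The function name `scaleAll`, the plain map representation, and the quantities are illustrative assumptions, not the plugin's actual code.

```go
package main

import "fmt"

// scaleAll multiplies every resource quantity by the same factor,
// which is what a single overcommit-factor amounts to.
// (Hypothetical helper for illustration only.)
func scaleAll(allocatable map[string]float64, factor float64) map[string]float64 {
	scaled := make(map[string]float64, len(allocatable))
	for name, qty := range allocatable {
		scaled[name] = qty * factor
	}
	return scaled
}

func main() {
	// Made-up totals: CPU in cores, memory in GiB, GPUs in devices.
	allocatable := map[string]float64{"cpu": 100, "memory": 400, "nvidia.com/gpu": 10}
	scaled := scaleAll(allocatable, 1.5)
	fmt.Println(scaled["cpu"], scaled["memory"], scaled["nvidia.com/gpu"])
}
```

With one factor of 1.5, the GPU count is inflated from 10 to 15 along with CPU and memory, even if the operator would prefer not to overcommit that resource.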

Solution

We can further break down the overcommit-factor into more granular components: overcommit-factor.<resource name>.

For example:

  • overcommit-factor.cpu
  • overcommit-factor.memory
  • overcommit-factor.pods
  • overcommit-factor.ephemeral-storage
  • overcommit-factor.nvidia.com/gpu

To maintain compatibility with the existing approach, we will keep the original overcommit-factor field and introduce an optional overcommit-factor.<resource name> field.

factors

The fields take priority in the following order, from lowest to highest:

defaultOverCommitFactor -> overcommit-factor -> overcommit-factor.<resource name>
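
The priority order can be sketched as a lookup that tries the most specific key first. This is only an illustration: `resolveFactor`, the `map[string]string` argument type, the value chosen for `defaultOverCommitFactor`, and the choice to skip unparsable values are all assumptions, not the plugin's confirmed behavior.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	// Assumed value for illustration; the plugin defines the real default.
	defaultOverCommitFactor = 1.2
	factorKey               = "overcommit-factor"
	factorPrefix            = "overcommit-factor."
)

// resolveFactor returns the factor for one resource. It checks
// overcommit-factor.<resource name> first, then overcommit-factor,
// and finally falls back to defaultOverCommitFactor.
// Values that do not parse as floats are skipped (an assumption).
func resolveFactor(args map[string]string, resource string) float64 {
	for _, key := range []string{factorPrefix + resource, factorKey} {
		raw, ok := args[key]
		if !ok {
			continue
		}
		if v, err := strconv.ParseFloat(raw, 64); err == nil {
			return v
		}
	}
	return defaultOverCommitFactor
}

func main() {
	args := map[string]string{
		"overcommit-factor.cpu": "1.5",
		"overcommit-factor":     "1.0",
	}
	// cpu has a specific entry; memory falls back to overcommit-factor.
	fmt.Println(resolveFactor(args, "cpu"), resolveFactor(args, "memory"))
	// With no arguments at all, the default applies.
	fmt.Println(resolveFactor(map[string]string{}, "cpu"))
}
```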

  • overcommitPlugin struct
// overcommitFactors defines the resource overCommit factors
type overcommitFactors struct {
    // factorMaps defines the resource overCommit factors
    // key: resource, example: "cpu", "memory", "ephemeral-storage", "nvidia.com/gpu"
    // value: overCommit factors
    factorMaps map[string]float64
}

type overcommitPlugin struct {
    // Arguments given for the plugin
    pluginArguments  framework.Arguments
    totalResource    *api.Resource
    idleResource     *api.Resource
    inqueueResource  *api.Resource
    // overCommitFactors holds the overCommit factor for each resource type
    overCommitFactors *overcommitFactors
}
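
As a rough sketch of how factorMaps might be populated and used, the code below collects every `overcommit-factor.<resource name>` argument and scales a resource list with it. The helpers `parseFactors` and `scale`, the `map[string]string` and `map[string]float64` stand-ins for `framework.Arguments` and `*api.Resource`, and the rule that invalid or non-positive values are ignored are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const factorPrefix = "overcommit-factor."

// overcommitFactors mirrors the struct proposed above.
type overcommitFactors struct {
	factorMaps map[string]float64
}

// parseFactors collects every "overcommit-factor.<resource name>" argument.
// Entries that fail to parse or are not positive are skipped (an assumption).
func parseFactors(args map[string]string) *overcommitFactors {
	f := &overcommitFactors{factorMaps: map[string]float64{}}
	for key, raw := range args {
		if !strings.HasPrefix(key, factorPrefix) {
			continue
		}
		resource := strings.TrimPrefix(key, factorPrefix)
		if resource == "" {
			continue
		}
		if v, err := strconv.ParseFloat(raw, 64); err == nil && v > 0 {
			f.factorMaps[resource] = v
		}
	}
	return f
}

// scale multiplies each quantity by its resource's factor, using fallback
// for resources that have no explicit entry in factorMaps.
func (f *overcommitFactors) scale(total map[string]float64, fallback float64) map[string]float64 {
	out := make(map[string]float64, len(total))
	for name, qty := range total {
		factor, ok := f.factorMaps[name]
		if !ok {
			factor = fallback
		}
		out[name] = qty * factor
	}
	return out
}

func main() {
	f := parseFactors(map[string]string{
		"overcommit-factor.cpu":            "1.5",
		"overcommit-factor.nvidia.com/gpu": "1.0",
	})
	// Made-up totals; 1.25 stands in for the generic or default factor.
	total := map[string]float64{"cpu": 100, "memory": 400, "nvidia.com/gpu": 10}
	scaled := f.scale(total, 1.25)
	fmt.Println(scaled["cpu"], scaled["memory"], scaled["nvidia.com/gpu"])
}
```

Keeping resource names as plain string keys means extended resources such as nvidia.com/gpu need no special handling, at the cost of one map lookup per resource.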

Example

Example 1: Explicitly specify all the overcommit factors

actions: "enqueue, allocate, backfill"
tiers:
- plugins:
  - name: overcommit
    arguments:
      overcommit-factor.cpu: 1.2
      overcommit-factor.memory: 1.0
      overcommit-factor.ephemeral-storage: 1.2
      overcommit-factor.pods: 1.2
      overcommit-factor.nvidia.com/gpu: 1.2

Example 2: Specify only overcommit-factor, so that every resource uses the same factor.

actions: "enqueue, allocate, backfill"
tiers:
- plugins:
  - name: overcommit
    arguments:
      overcommit-factor: 1.3

Example 3: Specify overcommit-factor.cpu and overcommit-factor.nvidia.com/gpu together with overcommit-factor. The named resources use their specific values, while all other resources use overcommit-factor.

actions: "enqueue, allocate, backfill"
tiers:
- plugins:
  - name: overcommit
    arguments:
      overcommit-factor.cpu: 1.2
      overcommit-factor.nvidia.com/gpu: 1.3
      overcommit-factor: 1.0

Example 4: Specify only a resource-specific factor such as overcommit-factor.cpu. That resource uses the specified value, while all other resources use defaultOverCommitFactor.

actions: "enqueue, allocate, backfill"
tiers:
- plugins:
  - name: overcommit
    arguments:
      overcommit-factor.cpu: 1.2

Example 5: Specify no factor at all, so that every resource uses defaultOverCommitFactor.

actions: "enqueue, allocate, backfill"
tiers:
- plugins:
  - name: overcommit
    arguments:

googs1025 avatar Jul 27 '24 13:07 googs1025

Why not put it in the overcommit document, but create a new document?

hwdef avatar Jul 28 '24 09:07 hwdef

> Why not put it in the overcommit document, but create a new document?

I would like to, but I can't seem to find any design documentation related to overcommit plugins

googs1025 avatar Jul 28 '24 09:07 googs1025

You can name your document overcommit-plugin.md

hwdef avatar Jul 28 '24 10:07 hwdef

> enhancements

I can modify it. I named it overcommit-plugin-enhancements because overcommit-plugin itself was not designed by me and I may not be sure of many backgrounds.

googs1025 avatar Jul 28 '24 13:07 googs1025

/kind docs /kind feature

googs1025 avatar Jul 29 '24 00:07 googs1025

> enhancements
>
> I can modify it. I named it overcommit-plugin-enhancements because overcommit-plugin itself was not designed by me and I may not be sure of many backgrounds.

Never mind, overcommit plugin is simple, and I believe you can understand it completely. This is also a supplement to the missing documentation in the community.

hwdef avatar Jul 29 '24 03:07 hwdef

> enhancements
>
> I can modify it. I named it overcommit-plugin-enhancements because overcommit-plugin itself was not designed by me and I may not be sure of many backgrounds.
>
> Never mind, overcommit plugin is simple, and I believe you can understand it completely. This is also a supplement to the missing documentation in the community.

thanks! done

googs1025 avatar Jul 29 '24 11:07 googs1025

If you have time, please take a look at this issue. @lowang-bh @Monokaix Thanks a lot.

googs1025 avatar Jul 29 '24 12:07 googs1025

Please refer to the predicate.Proportional function in the predicate plugin.

lowang-bh avatar Aug 01 '24 13:08 lowang-bh

@Monokaix @JesseStutler /PTAL thanks

googs1025 avatar Dec 08 '24 02:12 googs1025

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale[bot] avatar Apr 25 '25 23:04 stale[bot]

still need

hwdef avatar May 05 '25 07:05 hwdef

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hwdef

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

volcano-sh-bot avatar Aug 25 '25 09:08 volcano-sh-bot