Support task-level network topology constraint

Open 3sunny opened this issue 1 month ago • 3 comments

API change for https://github.com/volcano-sh/volcano/issues/4188

  1. Add a partitionPolicy spec for vc job.
  2. Add a bunchPolicy spec for podGroup.
  3. Auto-generate the API client code.

3sunny avatar Oct 14 '25 06:10 3sunny

Welcome @3sunny!

It looks like this is your first PR to volcano-sh/apis.

Thank you, and welcome to Volcano. :smiley:

volcano-sh-bot avatar Oct 14 '25 06:10 volcano-sh-bot

Summary of Changes

Hello @3sunny, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly extends Volcano's scheduling capabilities by introducing new API fields and types. It enables users to specify network topology constraints at a more granular task level and provides a flexible "bunch policy" for grouping pods within a pod group. These enhancements are crucial for optimizing job placement and resource utilization in complex, distributed environments, particularly where network locality is a key performance factor.

Highlights

  • Task-level Network Topology Constraints: Introduced a new PartitionPolicySpec within the TaskSpec to allow for defining network topology constraints at the individual task level. This includes fields for TotalPartitions, PartitionSize, and NetworkTopology.
  • Pod Grouping Policy (BunchPolicy): Added a BunchPolicy field to the PodGroupSpec in the scheduling APIs. This new policy enables more granular grouping of pods within a pod group based on MatchPolicy (using LabelKey) and BunchSize, with optional NetworkTopology settings for each defined bunch.
  • New Action for Partition Restarts: A RestartPartitionAction has been introduced in the bus API, providing the capability to restart an entire partition group of pods, which involves deleting and recreating them.
  • New Partition Identifier Label: A new constant, Partitionkey (volcano.sh/partition-id), was added to labels.go to support the new partitioning logic.
  • API Boilerplate Updates: Generated deepcopy, conversion, and apply configuration files have been updated across various API versions (v1alpha1, v1beta1) to correctly reflect and support the newly introduced API structures and fields.

[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist[bot] avatar Oct 14 '25 06:10 gemini-code-assist[bot]

Could you provide a YAML example for PodGroups and vcJob after the API changes?

Good suggestion, I think we can add some example YAMLs in the main repo

JesseStutler avatar Nov 05 '25 09:11 JesseStutler

Could you provide a YAML example for PodGroups and vcJob after the API changes?

The updated PodGroup and vcJob examples have been added to the commit for your reference.

3sunny avatar Nov 06 '25 01:11 3sunny

apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: network-topology-podgroup
spec:
  minMember: 6
  networkTopology:
    mode: hard 
    highestTierAllowed: 2
  bunchPolicy: 
    - bunchSize: 3
      name: task
      matchPolicy:
        - labelKey: volcano.sh/task-bunch-id
      networkTopology:
        mode: hard 
        highestTierAllowed: 1

In this example YAML, minMember is 6, but bunchSize is only 3. Is this example complete?

LiZhenCheng9527 avatar Nov 12 '25 02:11 LiZhenCheng9527

apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: network-topology-podgroup
spec:
  minMember: 6
  networkTopology:
    mode: hard 
    highestTierAllowed: 2
  bunchPolicy: 
    - bunchSize: 3
      name: task
      matchPolicy:
        - labelKey: volcano.sh/task-bunch-id
      networkTopology:
        mode: hard 
        highestTierAllowed: 1

In this example YAML, minMember is 6, but bunchSize is only 3. Is this example complete?

minMember is the minimum number of pods required for the PodGroup, and bunchSize is the number of pods required for one bunch. In this PodGroup example, two bunches must be started for the job to run. The pods within each bunch must satisfy the network topology constraint specified for that bunch (highestTierAllowed: 1), and the pods across the bunches must satisfy the PodGroup-level networkTopology constraint (highestTierAllowed: 2).
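
The arithmetic in the explanation above can be sketched in a few lines of Python. This is only an illustration, not Volcano's scheduler logic: it assumes that pods sharing the same value of the volcano.sh/task-bunch-id label form one bunch, and the helper names (group_into_bunches, complete_bunches) and the pod dictionaries are hypothetical.

```python
from collections import defaultdict

# labelKey taken from the PodGroup example; everything else is hypothetical.
BUNCH_LABEL = "volcano.sh/task-bunch-id"


def group_into_bunches(pods, label_key=BUNCH_LABEL):
    """Map each label value to the names of the pods that carry it."""
    bunches = defaultdict(list)
    for pod in pods:
        value = pod["labels"].get(label_key)
        if value is not None:  # pods without the label join no bunch
            bunches[value].append(pod["name"])
    return dict(bunches)


def complete_bunches(bunches, bunch_size):
    """Return the label values whose group has at least bunch_size pods."""
    return sorted(v for v, names in bunches.items() if len(names) >= bunch_size)


# Six pods, three per bunch id, matching minMember: 6 and bunchSize: 3.
pods = [
    {"name": f"worker-{i}", "labels": {BUNCH_LABEL: str(i // 3)}}
    for i in range(6)
]

min_member, bunch_size = 6, 3
bunches = group_into_bunches(pods)
ready = complete_bunches(bunches, bunch_size)

print(bunches)
print(f"complete bunches: {ready}; bunches needed: {min_member // bunch_size}")
```

With minMember: 6 and bunchSize: 3, the sketch reports two complete bunches, which is the number the PodGroup needs before it can run.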

3sunny avatar Nov 12 '25 06:11 3sunny

@3sunny Please squash your commits into one. I think I'm OK with the current API design.

JesseStutler avatar Nov 14 '25 01:11 JesseStutler

@3sunny Please squash your commits into one. I think I'm OK with the current API design.

done

3sunny avatar Nov 14 '25 01:11 3sunny

/lgtm /approve

wangyang0616 avatar Nov 14 '25 02:11 wangyang0616

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JesseStutler, wangyang0616

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.

Approvers can cancel approval by writing /approve cancel in a comment.

volcano-sh-bot avatar Nov 14 '25 02:11 volcano-sh-bot