Support identifying network topology from node labels and converting it into HyperNode resources
Which issue(s) this PR fixes:
https://github.com/volcano-sh/volcano/issues/4145
What type of PR is this?
Add a controller to manage the automatic creation of hypernodes.
The workflow of this controller can be summarized in the following steps:
- Configuration Loading: The controller loads the configuration from a specific ConfigMap named volcano-networktopologyaware-configuration, which includes the key-value pairs for the node labels to be recognized.
- Periodic Sync and HyperNode Resource Check: At regular intervals (1s by default), the controller synchronizes and checks the current state of HyperNode resources to ensure they align with the expected configuration based on the node labels and the ConfigMap.
- Automatic HyperNode Creation or Adjustment: If the controller finds that the HyperNodes do not match the expected configuration (for example, a required HyperNode is missing or misconfigured), it automatically creates, deletes, or adjusts the HyperNode resources to bring them into compliance with the desired state.
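The sync step described above amounts to diffing a desired set of HyperNodes against the set that currently exists. A minimal sketch of that comparison follows, in Python for brevity; it is not the controller's actual code. The `Plan` type, the `plan_sync` helper, and the `hn-*` and `node-*` names are all hypothetical, and each spec is reduced to a plain dict.

```python
from typing import Dict, List, NamedTuple


class Plan(NamedTuple):
    """Hypothetical result type: names of HyperNodes to create, update, delete."""
    create: List[str]
    update: List[str]
    delete: List[str]


def plan_sync(desired: Dict[str, dict], actual: Dict[str, dict]) -> Plan:
    """Compare desired and actual HyperNode specs, both keyed by name."""
    # Desired but not present: must be created.
    create = sorted(name for name in desired if name not in actual)
    # Present but no longer desired: must be deleted.
    delete = sorted(name for name in actual if name not in desired)
    # Present in both but with a different spec: must be adjusted.
    update = sorted(
        name for name in desired
        if name in actual and desired[name] != actual[name]
    )
    return Plan(create, update, delete)


# Illustrative data only; these names do not come from the PR.
desired = {
    "hn-a": {"tier": 1, "members": ["node-1", "node-2"]},
    "hn-b": {"tier": 1, "members": ["node-3"]},
}
actual = {
    "hn-a": {"tier": 1, "members": ["node-1"]},  # misconfigured: node-2 missing
    "hn-c": {"tier": 1, "members": ["node-9"]},  # no longer expected
}

plan = plan_sync(desired, actual)
print(plan)  # Plan(create=['hn-b'], update=['hn-a'], delete=['hn-c'])
```

Running this comparison on every sync interval keeps the logic idempotent: once the cluster matches the desired state, the plan is empty and nothing is changed.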
Example: Using the Controller for Automatic Hypernode Creation:
- To configure which node label keys should be recognized, define them through the node_label_based_topologies_override value in the chart's values.yaml file. The keys must always be listed in strict hierarchical order, from the lowest layer to the highest layer. For example:
  - name: topology1
    topo-level-keys:
    - network-tier1
    - network-tier2
    - network-tier3
This configuration tells the controller to look for nodes with the label keys network-tier1, network-tier2, and network-tier3.
Warning: Ensure the order of the keys is correct. Incorrect ordering of the keys will generate a topology that does not match your expectations.
- Assume you have Kubernetes nodes with the following labels, in a HyperNode network with 3 tiers:
  labels:
    network-tier1: s0
    network-tier2: s4
    network-tier3: s6
- Volcano will automatically create the following HyperNode resources:
  topology1-t1-s0-xxxx, topology1-t1-s1-xxxx, topology1-t1-s2-xxxx, topology1-t1-s3-xxxx, topology1-t2-s4-xxxx, topology1-t2-s5-xxxx, topology1-t3-s6-xxxx
Example: hypernode7
apiVersion: topology.volcano.sh/v1alpha1
kind: HyperNode
metadata:
  labels:
    volcano.io/label-based-topology-name: topology1
  name: topology1-t3-s6-xxxx
spec:
  members:
  - selector:
      exactMatch:
        name: topology1-t2-s4-xxxx
    type: HyperNode
  - selector:
      exactMatch:
        name: topology1-t2-s5-xxxx
    type: HyperNode
  tier: 3
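To make the label-to-HyperNode conversion concrete, here is a rough Python sketch of how the hierarchy in the example could be derived from node labels. It is an illustration under stated assumptions, not the controller's implementation: `build_hypernodes` is a hypothetical helper, the node names and the placement of s1, s2, and s3 under s4 and s5 are invented for the example, generated names omit the suffix shown as "xxxx", and stopping at the first missing label is an assumed policy.

```python
from collections import defaultdict
from typing import Dict, List


def build_hypernodes(topology: str, level_keys: List[str],
                     nodes: Dict[str, Dict[str, str]]) -> Dict[str, dict]:
    """Derive HyperNode specs from node labels.

    level_keys must be ordered from the lowest tier to the highest tier.
    """
    members = defaultdict(set)  # HyperNode name -> {(member type, member name)}
    tiers = {}                  # HyperNode name -> tier number
    for node_name, labels in nodes.items():
        child = ("Node", node_name)  # tier-1 HyperNodes contain plain nodes
        for tier, key in enumerate(level_keys, start=1):
            value = labels.get(key)
            if value is None:
                break  # assumption: stop climbing at the first missing label
            name = f"{topology}-t{tier}-{value}"
            members[name].add(child)
            tiers[name] = tier
            child = ("HyperNode", name)  # higher tiers contain HyperNodes
    return {
        name: {
            "tier": tiers[name],
            "members": [{"type": t, "name": n} for t, n in sorted(members[name])],
        }
        for name in sorted(members)
    }


keys = ["network-tier1", "network-tier2", "network-tier3"]
# Hypothetical nodes; only node-a's labels are taken from the example above.
nodes = {
    "node-a": {"network-tier1": "s0", "network-tier2": "s4", "network-tier3": "s6"},
    "node-b": {"network-tier1": "s1", "network-tier2": "s4", "network-tier3": "s6"},
    "node-c": {"network-tier1": "s2", "network-tier2": "s5", "network-tier3": "s6"},
    "node-d": {"network-tier1": "s3", "network-tier2": "s5", "network-tier3": "s6"},
}

result = build_hypernodes("topology1", keys, nodes)
for name, spec in result.items():
    print(name, spec["tier"], [m["name"] for m in spec["members"]])
```

With these inputs the sketch yields seven HyperNodes, and the tier-3 entry has the two tier-2 HyperNodes as members, mirroring the YAML above. Passing the keys in reverse order would instead produce a tier-1 HyperNode for s6 that contains all four nodes directly, which is why the key order matters.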
Hi, please add more details :)
Need to add a design doc
@Lily922: PR needs rebase.
The "auto discovery" semantics don't seem quite right; "auto conversion" seems like a better fit.