
[feature request] `NodePool Governance Capability`: Modify Yurt-Controller-Manager

Peeknut opened this issue 2 years ago • 3 comments

What would you like to be added: Yurt-Controller-Manager should handle two situations, node autonomy and NodePool autonomy. They can be described as follows:

  • If the cluster has node autonomy enabled and a node's status is NotReady, the Pods on that node are not evicted. This part has already been implemented.
  • If the cluster has NodePool autonomy enabled, the cloud controller applies the following policy, depending on how the node lease is renewed:
      ◦ Node lease renewed: Node is Ready; Pods are maintained; Endpoints are maintained.
      ◦ Node lease renewed with the delegate annotation: Node is NotReady; Pods are maintained; Endpoints are maintained.
      ◦ Node lease not renewed: Node is NotReady; Pods are evicted; Endpoints are updated.
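The three lease cases above map naturally onto a small decision function. This is a minimal Go sketch, not yurt-controller-manager code; the names `LeaseState`, `Policy` and `PolicyFor` are hypothetical and only illustrate the policy.

```go
package main

// LeaseState is a hypothetical enum for how a node's lease is renewed.
type LeaseState int

const (
	LeaseRenewed          LeaseState = iota // the node's own yurthub renews the lease
	LeaseRenewedDelegated                   // the pool leader renews it with the delegate annotation
	LeaseNotRenewed                         // nobody renews the lease
)

// Policy is the action the cloud controller takes for one node.
type Policy struct {
	NodeReady       bool // true = Ready, false = NotReady
	EvictPods       bool // false = maintain Pods
	UpdateEndpoints bool // false = maintain Endpoints
}

// PolicyFor returns the NodePool-autonomy policy for a lease state.
func PolicyFor(s LeaseState) Policy {
	switch s {
	case LeaseRenewed:
		return Policy{NodeReady: true}
	case LeaseRenewedDelegated:
		// The node cannot reach the cloud, but its pool leader reports its
		// heartbeat: mark it NotReady while keeping Pods and Endpoints.
		return Policy{NodeReady: false}
	default:
		// No heartbeat at all: NotReady, evict Pods, update Endpoints.
		return Policy{NodeReady: false, EvictPods: true, UpdateEndpoints: true}
	}
}
```

Keeping the node NotReady in the delegated case while suppressing eviction is what separates "unreachable but alive" from "gone".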

Note one special case: when the node pool is completely disconnected from the cloud, the cloud should adopt the same strategy as for "renew node lease with delegate annotation" and must not evict the Pods in the node pool. Ideally both situations are handled by a single, efficient check, so that the implementation of yurt-controller-manager stays concise and clear.
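One way to fold the fully disconnected case into a single check is sketched below in Go. It assumes, hypothetically, that the controller can see for each node in a pool whether its lease is currently being renewed (by its own yurthub or by the leader with the delegate annotation), passed in as a `map[string]bool`; a pool in which no lease is renewed at all is treated as fully disconnected. `ShouldEvict` is a hypothetical name, not an OpenYurt API.

```go
package main

// ShouldEvict reports whether the Pods on node `name` should be evicted.
// leaseRenewed maps each node of one pool to whether its lease is being
// renewed, directly or by delegation (hypothetical input shape).
func ShouldEvict(leaseRenewed map[string]bool, name string) bool {
	renewed, known := leaseRenewed[name]
	if !known || renewed {
		// Unknown node (no action in this sketch), or lease renewed
		// directly or via the pool leader: maintain Pods.
		return false
	}
	for _, r := range leaseRenewed {
		if r {
			// Part of the pool still reaches the cloud, yet nobody renews
			// this node's lease: apply the eviction policy.
			return true
		}
	}
	// No lease in the pool is renewed: treat the pool as fully disconnected
	// and keep its Pods, as for a delegated lease.
	return false
}
```

With this shape, the only lease state that triggers eviction is "not renewed while some other node in the same pool is still renewing". Note that, from the cloud's side, a pool that lost its network looks the same as a pool whose nodes all failed; the sketch treats both as "do not evict", matching the rule above.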

Note: for details, please refer to the proposal: https://github.com/openyurtio/openyurt/pull/772

Why is this needed: As described in the "NodePool Autonomy" proposal (https://github.com/openyurtio/openyurt/pull/772), all yurthubs in a node pool that are connected to the cloud elect a leader, and the leader yurthub acts as a heartbeat proxy, reporting the heartbeats of disconnected nodes to the cloud. This changes the logic Yurt-Controller-Manager uses to judge node status, so it needs to consider both node autonomy and NodePool autonomy.

Others:
/kind feature

Peeknut, Mar 10 '22

/assign @gnunu

gnunu, Mar 11 '22

Progress update: we will re-enable the nodelifecycle controller of the Kubernetes kube-controller-manager, to minimize modifications to vanilla Kubernetes. From the workload perspective, we will focus on Pod management rather than node management. To do this, we can add a validating webhook that checks Pod operations, especially deletions caused by API-initiated eviction.

Work done so far:

  1. CA and server certificate generation on init;
  2. Pod deletion validation for evictions on nodes annotated for autonomy;
  3. some tests related to eviction.
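Item 2 can be illustrated with a simplified admission check. The Go sketch below does not use the real Kubernetes admission types: `DeleteRequest` is a hypothetical stand-in for a decoded admission request, `FromEviction` assumes the webhook can already tell an eviction-driven delete from a user-initiated one (for example, from the requesting identity), and the annotation key `node.beta.openyurt.io/autonomy` is an assumption to verify against your OpenYurt version.

```go
package main

// autonomyAnnotation is an assumed key; verify it for your OpenYurt version.
const autonomyAnnotation = "node.beta.openyurt.io/autonomy"

// DeleteRequest is a hypothetical, trimmed-down view of a Pod admission request.
type DeleteRequest struct {
	Operation       string            // admission operation, e.g. "DELETE"
	FromEviction    bool              // assumed: true when the delete comes from eviction
	NodeAnnotations map[string]string // annotations on the Pod's node
	NodeReady       bool              // whether that node is currently Ready
}

// ValidatePodDelete returns whether the request is allowed and, if not, why.
func ValidatePodDelete(req DeleteRequest) (bool, string) {
	if req.Operation != "DELETE" || !req.FromEviction {
		return true, "" // only eviction-driven deletions are checked
	}
	if req.NodeAnnotations[autonomyAnnotation] == "true" && !req.NodeReady {
		return false, "node is autonomous and NotReady; keeping the Pod"
	}
	return true, ""
}
```

Rejecting the deletion at admission time is what lets the vanilla nodelifecycle controller run unmodified: it may still try to evict, but Pods on autonomous NotReady nodes survive.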

gnunu, May 24 '22

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot], Aug 30 '22