openyurt
[feature request]`NodePool Governance Capability`: Modify Yurt-Controller-Manager
What would you like to be added: Yurt-Controller-Manager should handle both node autonomy and NodePool autonomy. The expected behavior can be described as:
- If the cluster has node autonomy enabled and a node's status is NotReady, the Pods on that node are not evicted. This part has already been implemented.
- If the cluster has NodePool autonomy enabled, the policy of the cloud controller is as follows:
 | renew node lease | renew node lease with delegate annotation | don't renew node lease |
---|---|---|---|
Policy | Node: Ready; Pod: Maintain; Endpoints: Maintain | Node: NotReady; Pod: Maintain; Endpoints: Maintain | Node: NotReady; Pod: Evicted; Endpoints: Update |
Note the special case where the node pool is completely disconnected: the cloud needs to adopt the same strategy as for "renew node lease with delegate annotation" and must not evict any Pods in the NodePool. It is best to handle these two situations with a single, efficient check, so that the implementation in yurt-controller-manager stays concise and clear.
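The policy table and the fully-disconnected special case can be sketched as one decision function. This is only an illustrative sketch, not the yurt-controller-manager implementation: the names `LeaseState`, `Policy`, and `DecidePolicy` are hypothetical, and how the controller actually detects a delegated lease or a fully disconnected pool is left out.

```go
package main

import "fmt"

// LeaseState is a hypothetical summary of how a node's lease was last renewed.
type LeaseState int

const (
	LeaseRenewed           LeaseState = iota // renewed by the node itself
	LeaseRenewedByDelegate                   // renewed by the leader yurthub, carrying the delegate annotation
	LeaseNotRenewed                          // not renewed by anyone
)

// Policy mirrors the three cells of the table above (hypothetical type).
type Policy struct {
	NodeReady       bool // true: Node is Ready; false: Node is NotReady
	EvictPods       bool // true: Pods are evicted; false: Pods are maintained
	UpdateEndpoints bool // true: Endpoints are updated; false: Endpoints are maintained
}

// DecidePolicy folds the "pool completely disconnected" case into the same
// branch as "renew node lease with delegate annotation".
func DecidePolicy(state LeaseState, poolFullyDisconnected bool) Policy {
	switch {
	case state == LeaseRenewed:
		return Policy{NodeReady: true}
	case state == LeaseRenewedByDelegate || poolFullyDisconnected:
		// Node is NotReady, but Pods and Endpoints are kept as they are.
		return Policy{}
	default:
		// Lease expired while other nodes in the pool are still reachable.
		return Policy{EvictPods: true, UpdateEndpoints: true}
	}
}

func main() {
	fmt.Printf("%+v\n", DecidePolicy(LeaseRenewed, false))
	fmt.Printf("%+v\n", DecidePolicy(LeaseRenewedByDelegate, false))
	fmt.Printf("%+v\n", DecidePolicy(LeaseNotRenewed, true))
	fmt.Printf("%+v\n", DecidePolicy(LeaseNotRenewed, false))
}
```

Treating the fully disconnected pool as a condition on the same branch as the delegated lease keeps the special case from needing a separate code path.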
Notes: For details, please refer to the proposal: https://github.com/openyurtio/openyurt/pull/772
Why is this needed: As described in the "NodePool Autonomy" proposal (https://github.com/openyurtio/openyurt/pull/772), all yurthubs in a node pool that are connected to the cloud elect a leader, and the leader yurthub acts as a heartbeat proxy that reports the heartbeats of disconnected nodes to the cloud. This changes the logic by which Yurt-Controller-Manager judges node status, so it needs to consider both node autonomy and NodePool autonomy.
others /kind feature
/assign @gnunu
Progress update: We will re-enable the nodelifecycle controller of the Kubernetes kube-controller-manager, to minimize modifications to vanilla Kubernetes. From the workload perspective, we will focus on pod management rather than node management. For this, we can add a validating webhook to check pod operations, especially deletes in the case of API-initiated eviction.
The works done so far:
- CA and server certificate generation on init;
- pod delete validation for eviction on nodes annotated with autonomy;
- some tests related to eviction.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.