liqo
Prevent pods from being recreated on the Host/Local cluster when the Member cluster becomes inaccessible/disconnected
What happened:
When the Member/Remote cluster becomes inaccessible or is disconnected from the Host cluster, all pods scheduled on it are recreated on the Host cluster.
What you expected to happen:
If the Member cluster becomes inaccessible or disconnected, the pods scheduled on it should not be recreated on the Host cluster. How can I prevent these pods from being recreated on the Host cluster?
How to reproduce it (as minimally and precisely as possible):
- Create the Host and Member clusters
- Establish the connection (peering) between the two clusters
- Deploy an application to the Member cluster from the Host cluster
- Disconnect the Member cluster
- All pods scheduled on the Member cluster are recreated on the Host cluster
Anything else we need to know?:
Environment:
- Liqo version: v0.4.0
- Kubernetes version (use kubectl version): v1.21.9
- Cloud provider or hardware configuration: AKS
- Network plugin and version:
- Install tools:
- Others:
Hi @agulhane-tibco,
It depends on what you mean by a cluster becoming unreachable/disconnected.
- In case you disconnect the two clusters from the liqo point of view (i.e., you remove the peering, by deleting the ForeignCluster resource or editing its spec), then all pods that were offloaded to that cluster get evicted and rescheduled in another location, as the virtual node has been deleted.
- In case of a temporary disconnection due to network connectivity problems, the virtual kubelet notices that and marks the virtual node as no longer ready. This prevents the scheduling of new pods until the connectivity is restored, and triggers the standard Kubernetes policies for pod eviction (by default, after approximately 5 minutes). If you do not want the pods to be evicted, you need to modify the pods' tolerations for not-ready nodes, as you would do for a standard Kubernetes node. Alternatively, if you do not care about that information, you can completely disable the ping checks performed by the virtual kubelet through an appropriate flag.
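As an illustration of the toleration change mentioned in the second point, here is a minimal sketch that generates a patch for a Deployment's pod template. It assumes the standard Kubernetes taint keys node.kubernetes.io/not-ready and node.kubernetes.io/unreachable with the NoExecute effect; the function names are hypothetical, and whether any liqo-specific taints also need to be tolerated is not covered here. Omitting tolerationSeconds means the taint is tolerated indefinitely, so the pods are not evicted while the virtual node is not ready.

```python
import json

# Standard Kubernetes taint keys applied to nodes that are not ready or
# unreachable (assumption: the virtual node is tainted like a regular node).
NODE_FAILURE_TAINTS = (
    "node.kubernetes.io/not-ready",
    "node.kubernetes.io/unreachable",
)


def build_tolerations(toleration_seconds=None):
    """Return NoExecute tolerations for not-ready/unreachable nodes.

    With toleration_seconds=None the tolerationSeconds field is omitted,
    so the taint is tolerated forever and never causes an eviction.
    """
    tolerations = []
    for key in NODE_FAILURE_TAINTS:
        toleration = {"key": key, "operator": "Exists", "effect": "NoExecute"}
        if toleration_seconds is not None:
            toleration["tolerationSeconds"] = int(toleration_seconds)
        tolerations.append(toleration)
    return tolerations


def build_deployment_patch(toleration_seconds=None):
    """Wrap the tolerations in a patch for a Deployment's pod template."""
    return {
        "spec": {
            "template": {
                "spec": {"tolerations": build_tolerations(toleration_seconds)}
            }
        }
    }


if __name__ == "__main__":
    # Print the patch as JSON so it can be passed to kubectl.
    print(json.dumps(build_deployment_patch(), indent=2))
```

Assuming the script is saved as tolerations_patch.py (a hypothetical name), the output could be applied with something like kubectl patch deployment <name> --patch "$(python3 tolerations_patch.py)". Note that such a patch replaces the whole tolerations list of the pod template, so any tolerations already defined there would have to be included as well.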
Thank you so much, Marco, for the response. However, is there any way to stop the pods from being rescheduled on the Host cluster when I disconnect the two clusters from the liqo point of view?
At the moment, no, since all remote resources are removed upon cluster disconnection. The fact is that, even if the offloaded pods were kept there, the corresponding deployment would remain local (still, we use a custom remote representation, the ShadowPod, to ensure pods are correctly restarted if a remote node is no longer ready), and you would have no way to control it directly from the remote cluster.
Thank you so much, Marco, for the clarification.