cluster-api-provider-digitalocean Cluster migration causes LB issues

There is a bug in CAPDO that only appears once a workload cluster has been migrated from one CAPI management cluster to another.

capdo sets tags on the control plane nodes and the cloud loadbalancer to designate the nodes as backends to the LB. See code here: https://github.com/kubernetes-sigs/cluster-api-provider-digitalocean/blob/58a45f155798fe774554228af55853760fa7cd09/cloud/services/networking/loadbalancers.go#L68

The problematic part is that the tag includes the UID of the Cluster object. Although the UID is a convenient source of uniqueness, it is immutable and generated by the kube-apiserver, so it cannot be carried over when the object is recreated on another API server.

This is not an issue as long as the cluster is always managed by the same CAPI management cluster. As soon as the cluster is migrated to another one, the UID changes to a new value assigned by the new management cluster. Everything works at first, since all the CAPI and CAPDO objects are just copied over and the existing droplets and load balancer keep their old tags. However, as soon as the first control plane node is replaced, the new droplet comes up with a tag based on the new UID, which no longer matches the tag set on the cloud load balancer. As a result, traffic never reaches the new control plane nodes because the LB no longer sees them as target droplets.
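To make the failure mode concrete, here is a minimal, self-contained sketch. The tag format and the helpers `roleTag` and `isBackend` are hypothetical and only stand in for the real tag construction linked above; the UIDs are placeholders.

```go
package main

import "fmt"

// roleTag builds a tag from the cluster name, the Cluster object's UID and a role.
// Hypothetical format for illustration; not the exact string CAPDO produces.
func roleTag(clusterName, clusterUID, role string) string {
	return fmt.Sprintf("capdo:%s:%s:%s", clusterName, clusterUID, role)
}

// isBackend reports whether a droplet with dropletTags is selected by a
// load balancer that targets droplets by lbTag.
func isBackend(lbTag string, dropletTags []string) bool {
	for _, t := range dropletTags {
		if t == lbTag {
			return true
		}
	}
	return false
}

func main() {
	const name = "prod"
	oldUID := "uid-from-old-management-cluster" // placeholder value
	newUID := "uid-from-new-management-cluster" // placeholder value

	// The load balancer was created before the migration and keeps its old tag.
	lbTag := roleTag(name, oldUID, "apiserver")

	// A droplet created before the migration still carries the old tag.
	existing := []string{roleTag(name, oldUID, "apiserver")}
	// A droplet created after the migration is tagged with the new UID.
	replaced := []string{roleTag(name, newUID, "apiserver")}

	fmt.Println(isBackend(lbTag, existing)) // true
	fmt.Println(isBackend(lbTag, replaced)) // false
}
```

Once every pre-migration control plane droplet has been replaced, no droplet matches the LB tag any more.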

A possible backwards-compatible way to solve this: introduce a check that verifies whether the UID part of the LB tag and of the droplet tags still equals the current cluster UID, and update the tags if it does not. This comes with side effects and may be non-graceful if the LB tag is switched before any droplet carries the new tag. To avoid this, we could update the droplets first and only update the LB in a second reconcile iteration.
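One reading of that two-step idea, sketched under stated assumptions: the types `droplet` and `cluster` and the function `reconcileOnce` are hypothetical, operate on in-memory state only, and make no DigitalOcean API calls. The sketch also assumes the new tag is added alongside the old one, so droplets stay LB targets until the LB itself is switched.

```go
package main

import "fmt"

// droplet and cluster are hypothetical in-memory stand-ins for cloud state.
type droplet struct {
	Name string
	Tags []string
}

type cluster struct {
	LBTag    string
	Droplets []droplet
}

func hasTag(tags []string, tag string) bool {
	for _, t := range tags {
		if t == tag {
			return true
		}
	}
	return false
}

// reconcileOnce performs at most one phase per call and reports what it did.
// Phase 1 adds the wanted tag to droplets that lack it (old tags are kept).
// Phase 2 switches the LB tag, and only runs once every droplet has the new tag.
func reconcileOnce(c *cluster, want string) string {
	if c.LBTag == want {
		return "in sync"
	}
	changed := false
	for i := range c.Droplets {
		if !hasTag(c.Droplets[i].Tags, want) {
			c.Droplets[i].Tags = append(c.Droplets[i].Tags, want)
			changed = true
		}
	}
	if changed {
		return "tagged droplets"
	}
	c.LBTag = want
	return "updated load balancer"
}

func main() {
	oldTag := "capdo:prod:old-uid:apiserver" // hypothetical tag format
	newTag := "capdo:prod:new-uid:apiserver"

	c := &cluster{
		LBTag:    oldTag,
		Droplets: []droplet{{Name: "cp-0", Tags: []string{oldTag}}},
	}
	for i := 0; i < 3; i++ {
		fmt.Println(reconcileOnce(c, newTag))
	}
	// Prints:
	// tagged droplets
	// updated load balancer
	// in sync
}
```

Because the LB is only switched after every droplet already has the new tag, there is no point at which the LB targets a tag that no droplet carries.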

Jan 18 '23 14:01 gottwald

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Apr 18 '23 15:04 k8s-triage-robot

/remove-lifecycle stale

Apr 18 '23 19:04 gottwald

/lifecycle stale

Jul 17 '23 19:07 k8s-triage-robot

/remove-lifecycle stale

Jul 18 '23 10:07 gottwald

/lifecycle stale

Jan 24 '24 18:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

Feb 23 '24 18:02 k8s-triage-robot

/remove-lifecycle rotten

Feb 23 '24 23:02 timoreimann

/lifecycle stale

May 24 '24 00:05 k8s-triage-robot

/remove-lifecycle rotten

May 24 '24 07:05 timoreimann

/lifecycle rotten

Jun 23 '24 07:06 k8s-triage-robot

/remove-lifecycle rotten

Jun 23 '24 19:06 timoreimann
