cluster-api-provider-digitalocean

Cluster migration causes LB issues

Open • gottwald opened this issue 2 years ago • 13 comments

There is a bug in capdo that only appears once a cluster has been migrated from one capi cluster to another.

capdo sets tags on the control plane nodes and the cloud loadbalancer to designate the nodes as backends to the LB. See code here: https://github.com/kubernetes-sigs/cluster-api-provider-digitalocean/blob/58a45f155798fe774554228af55853760fa7cd09/cloud/services/networking/loadbalancers.go#L68

The problematic part is that the tag includes the UID of the cluster object. Although the UID is a convenient source of uniqueness, it is immutable and generated by the kube apiserver, so it cannot be carried over when the object is recreated elsewhere.

This is not an issue as long as the cluster is always managed by the same capi cluster. As soon as the cluster is migrated to another capi cluster, the UID changes to a new one created by the new capi cluster. This works at first, since all the capi and capdo objects are just copied over and handed off, and the existing droplets and LB keep their old tags. However, as soon as the first control plane node is replaced, the new node comes up with the new tag, which no longer matches the tag set on the cloud load balancer. As a result, traffic does not reach the new nodes, because the LB no longer sees them as target droplets.
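To make the failure mode concrete, here is a minimal sketch of the tag mismatch. The tag format, the `clusterTag` and `isLBBackend` helpers, and the UID values are all made up for illustration; the real tag construction is in the capdo code linked above.

```go
package main

import "fmt"

// clusterTag is a hypothetical stand-in for the tag capdo derives from the
// cluster object. The real format is defined in the capdo source.
func clusterTag(name, uid, role string) string {
	return fmt.Sprintf("%s:%s:%s", name, uid, role)
}

// isLBBackend reports whether a droplet carrying dropletTags would be
// selected by a load balancer that targets droplets by lbTag.
func isLBBackend(lbTag string, dropletTags []string) bool {
	for _, t := range dropletTags {
		if t == lbTag {
			return true
		}
	}
	return false
}

func main() {
	// Tag set on the LB while the old capi cluster owned the cluster object.
	lbTag := clusterTag("prod", "uid-old", "apiserver")

	// Droplet created before the migration: still carries the old UID.
	oldNode := []string{clusterTag("prod", "uid-old", "apiserver")}
	// Replacement droplet created after the migration: tagged with the new UID.
	newNode := []string{clusterTag("prod", "uid-new", "apiserver")}

	fmt.Println(isLBBackend(lbTag, oldNode)) // true
	fmt.Println(isLBBackend(lbTag, newNode)) // false
}
```

Once every pre-migration control plane node has been replaced, no droplet matches the LB tag any more and the API endpoint loses all backends.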

A possible backwards-compatible way to solve this: introduce a check that verifies whether the UID part of the LB tag and the droplet tags still equals the cluster UID, and update the tags if it does not. This comes with side effects and may be non-graceful if there isn't already a new node with the new tags. To avoid that, we could update the droplets first and update the LB in a second reconcile iteration.
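The two-step ordering could be sketched roughly as below. This is not capdo code: the `Action` type, `nextAction`, and the tag strings are hypothetical, and the sketch additionally assumes that droplets receive the new tag alongside the old one, so they stay LB backends until the LB itself is switched.

```go
package main

import "fmt"

// Action is a hypothetical label for the step a reconciler would take next.
type Action string

const (
	ActionNone          Action = "none"
	ActionRetagDroplets Action = "retag-droplets"
	ActionRetagLB       Action = "retag-lb"
)

func contains(tags []string, want string) bool {
	for _, t := range tags {
		if t == want {
			return true
		}
	}
	return false
}

// nextAction decides a single step per reconcile pass. wantTag is the tag
// derived from the current cluster UID, lbTag is the tag the load balancer
// currently targets, and dropletTags holds the tags of each control plane
// droplet. Droplets are handled first; the LB is only switched once every
// droplet already carries wantTag.
func nextAction(wantTag, lbTag string, dropletTags [][]string) Action {
	for _, tags := range dropletTags {
		if !contains(tags, wantTag) {
			return ActionRetagDroplets
		}
	}
	if lbTag != wantTag {
		return ActionRetagLB
	}
	return ActionNone
}

func main() {
	oldTag, newTag := "prod:uid-old:apiserver", "prod:uid-new:apiserver"

	// Pass 1: droplets still only carry the old tag.
	fmt.Println(nextAction(newTag, oldTag, [][]string{{oldTag}, {oldTag}}))
	// Pass 2: droplets carry both tags, the LB still targets the old one.
	fmt.Println(nextAction(newTag, oldTag, [][]string{{oldTag, newTag}, {oldTag, newTag}}))
	// Pass 3: everything agrees with the current UID.
	fmt.Println(nextAction(newTag, newTag, [][]string{{oldTag, newTag}, {oldTag, newTag}}))
}
```

Splitting the work across passes means the LB is never pointed at a tag that no droplet carries yet, which is the non-graceful case described above.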

gottwald • Jan 18 '23 14:01

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot • Apr 18 '23 15:04

/remove-lifecycle stale

gottwald • Apr 18 '23 19:04

/lifecycle stale

k8s-triage-robot • Jul 17 '23 19:07

/remove-lifecycle stale

gottwald • Jul 18 '23 10:07

/lifecycle stale

k8s-triage-robot • Jan 24 '24 18:01

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot • Feb 23 '24 18:02

/remove-lifecycle rotten

timoreimann • Feb 23 '24 23:02

/lifecycle stale

k8s-triage-robot • May 24 '24 00:05

/remove-lifecycle rotten

timoreimann • May 24 '24 07:05

/lifecycle rotten

k8s-triage-robot • Jun 23 '24 07:06

/remove-lifecycle rotten

timoreimann • Jun 23 '24 19:06