cluster-api-provider-openstack

Failed to associate floating IP because that fixed IP already has a floating IP

Open SimoneRoma opened this issue 2 years ago • 4 comments

/kind bug

What steps did you take and what happened: We are using the following spec to create the cluster.

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha5
kind: OpenStackCluster
metadata:
  name: test
  namespace: default
spec:
  apiServerLoadBalancer:
    enabled: true
  cloudName: openstack
  dnsNameservers:
  - 8.8.8.8
  externalNetworkId: 3194ac65-9040-4e3f-9f2d-95d704215ee7
  identityRef:
    kind: Secret
    name: test-cloud-config
  managedSecurityGroups: false
  nodeCidr: 10.6.0.0/24
```

The load balancer is properly created. When it comes to associating a floating IP, the request is satisfied by Neutron, but it takes too long and the request from CAPO hits its timeout. CAPO then issues a new request and gets this error:

```
events: Created floating IP 10.254.71.159 with id 34f8b8e7-5fc3-441b-adbc-cce602e11c12" type="Normal"
  object={Kind:OpenStackCluster Namespace:cmp-system Name:65085ecbb23e7eac2cf85380--oscl UID:3d9cc38c-fc54-4e7f-960b-04eaaffb0e91 APIVersion:infrastructure.cluster.x-k8s.io/v1alpha6 ResourceVersion:182651750 FieldPath:}
  reason="Successfulcreatefloatingip"
I0918 14:34:26.087253 1 recorder.go:103] "events: Failed to associate floating IP 10.254.71.159 with port 82684f3f-2696-4eda-a927-982dc87dbbd8:
  Expected HTTP response code [200] when accessing [PUT https://10.214.106.253:9696/v2.0/floatingips/34f8b8e7-5fc3-441b-adbc-cce602e11c12], but got 409 instead
  {"NeutronError": {"type": "FloatingIPPortAlreadyAssociated", "message": "Cannot associate floating IP 10.254.71.159 (34f8b8e7-5fc3-441b-adbc-cce602e11c12) with port 82684f3f-2696-4eda-a927-982dc87dbbd8 using fixed IP 10.7.2.220, as that fixed IP already has a floating IP on external network dd3ec40e-bc12-4289-83f2-6919abcbf7b6.", "detail": ""}}"
  type="Warning" object={Kind:OpenStackCluster Namespace:cmp-system Name:65085ecbb23e7eac2cf85380--oscl UID:3d9cc38c-fc54-4e7f-960b-04eaaffb0e91 APIVersion:infrastructure.cluster.x-k8s.io/v1alpha6 ResourceVersion:182651750 FieldPath:}
  reason="Failedassociatefloatingip"
```

The provisioning never completes, and a lot of IP addresses remain orphaned in the project.

What did you expect to happen: CAPO should check whether a floating IP is already attached to the fixed IP and, if so, reuse it.
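
For illustration (this is not CAPO's actual reconciliation code), a minimal sketch of the check-and-reuse behaviour I have in mind, written against gophercloud's networking v2 layer3/floatingips package; the helper name ensureFloatingIP and its parameters are made up for the example:

```go
package networking

import (
	"fmt"

	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/networking/v2/extensions/layer3/floatingips"
)

// ensureFloatingIP (hypothetical helper) returns a floating IP for the given
// port, reusing one that is already associated instead of creating a new one.
func ensureFloatingIP(client *gophercloud.ServiceClient, portID, externalNetworkID string) (*floatingips.FloatingIP, error) {
	// Check whether the port already has a floating IP associated with it.
	pages, err := floatingips.List(client, floatingips.ListOpts{PortID: portID}).AllPages()
	if err != nil {
		return nil, fmt.Errorf("listing floating IPs for port %s: %w", portID, err)
	}
	existing, err := floatingips.ExtractFloatingIPs(pages)
	if err != nil {
		return nil, err
	}
	if len(existing) > 0 {
		// Reuse the existing association instead of triggering Neutron's
		// 409 FloatingIPPortAlreadyAssociated and leaking another address.
		return &existing[0], nil
	}

	// No floating IP yet: create one bound directly to the port.
	return floatingips.Create(client, floatingips.CreateOpts{
		FloatingNetworkID: externalNetworkID,
		PortID:            portID,
	}).Extract()
}
```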

Environment:

  • Cluster API Provider OpenStack version (Or git rev-parse HEAD if manually built): 0.7.2 and 0.8.0
  • Cluster-API version: 1.4.5
  • OpenStack version: Zed
  • Minikube/KIND version:
  • Kubernetes version (use kubectl version):
  • OS (e.g. from /etc/os-release): Ubuntu 22.04

SimoneRoma · Nov 08 '23 10:11

> When it comes to associating a floating IP, the request is satisfied by Neutron, but it takes too long and the request from CAPO hits its timeout.

It looks like two problems exist here:

  • the default timeout may not be long enough for your case
  • we don't handle the timeout well: the floating IP is actually associated, so in this error case we should disassociate it first to ensure the fixed IP has no floating IP before retrying (rough sketch below)
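
A rough sketch of that disassociate-first idea, for illustration only (not actual CAPO code; the helper name is made up, and the empty port_id convention for disassociation is taken from gophercloud's layer3/floatingips UpdateOpts documentation, so treat it as an assumption):

```go
package networking

import (
	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/networking/v2/extensions/layer3/floatingips"
)

// disassociateFloatingIP (hypothetical helper) clears the port association of
// a floating IP so that a later associate call starts from a clean state.
func disassociateFloatingIP(client *gophercloud.ServiceClient, fipID string) error {
	// Per gophercloud's UpdateOpts docs, an empty port_id should disassociate
	// the floating IP from all ports.
	empty := ""
	_, err := floatingips.Update(client, fipID, floatingips.UpdateOpts{
		PortID: &empty,
	}).Extract()
	return err
}
```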

jichenjc · Nov 09 '23 03:11

The default timeout definitely works well most of the time. It can happen, however, that due to underlying problems in OpenStack (mainly with RabbitMQ) this issue arises. The bigger problem is that if CAPO keeps allocating IPs (sometimes precious public IPs), we will exhaust the floating IP quota in the OpenStack projects. That said, I agree with you: CAPO should be able to disassociate first to ensure the fixed IP has no floating IP.
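
As a stopgap for the quota problem, the leaked addresses could be released out of band. A hedged sketch of what that might look like with gophercloud, assuming the orphans show up as unattached floating IPs in DOWN status (note this would touch any unattached floating IP in the project, not only the ones CAPO created):

```go
package networking

import (
	"fmt"

	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/networking/v2/extensions/layer3/floatingips"
)

// releaseUnattachedFloatingIPs (hypothetical helper) deletes floating IPs in
// the current project that are not associated with any port.
func releaseUnattachedFloatingIPs(client *gophercloud.ServiceClient) error {
	pages, err := floatingips.List(client, floatingips.ListOpts{Status: "DOWN"}).AllPages()
	if err != nil {
		return fmt.Errorf("listing floating IPs: %w", err)
	}
	fips, err := floatingips.ExtractFloatingIPs(pages)
	if err != nil {
		return err
	}
	for _, fip := range fips {
		// Double-check the floating IP really has no port before deleting it.
		if fip.PortID != "" {
			continue
		}
		if err := floatingips.Delete(client, fip.ID).ExtractErr(); err != nil {
			return fmt.Errorf("deleting floating IP %s: %w", fip.FloatingIP, err)
		}
	}
	return nil
}
```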

SimoneRoma · Nov 09 '23 07:11

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · Feb 07 '24 07:02

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot · Mar 08 '24 08:03

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot · Apr 07 '24 08:04

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot · Apr 07 '24 08:04