
[octavia-ingress-controller] Failing to create load balancer using Kubernetes and Gitlab's AutoDevops

Open arthurzenika opened this issue 2 years ago • 8 comments

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

We're trying to use the auto-deploy-image helm chart with an Octavia ingress on OVHcloud, following https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/octavia-ingress-controller/using-octavia-ingress-controller.md

We're getting the following error:

Failed to create openstack resources for ingress [snip]: failed to prepare the security group for the ingress [snip]:
failed to add tag [snip] to security group [snip]: Bad request with:
[PUT https://network.compute.gra9.cloud.ovh.net/v2.0/security_groups/[snip]/tags/[snip]], error message: 
{"NeutronError": {"type": "InvalidInput", "message": "Invalid input for operation: 
'xxxxxxxxx-xxx-xxxx-xxxxxx-xxx-xxxx-x-xxxxxx_xxxxxx-xxx-xxxx-x-xxxxxx-xxxx-xxxxxx' exceeds maximum length 
of 60.", "detail": ""}}

What is the reason for this 60-character limitation?

We're trying to see whether the identifier generated by the auto-deploy-image helm chart can be stripped down to 60 characters; see the discussion at https://gitlab.com/gitlab-org/cluster-integration/auto-deploy-image/-/issues/203

What you expected to happen:

For the resource to be created.

How to reproduce it:

Once we start discussing the issue we'll see if we need to reproduce.

Anything else we need to know?:

Environment:

  • octavia-ingress-controller version: 1.23
  • OpenStack version: stein

arthurzenika avatar Apr 26 '22 15:04 arthurzenika

I think the issue comes from https://github.com/openstack/neutron-lib/blob/master/neutron_lib/api/validators/__init__.py#L240, though I didn't find where the maximum tag length is defined; I remember in nova it's 60.

So I think the error comes from https://github.com/kubernetes/cloud-provider-openstack/blob/master/pkg/ingress/controller/controller.go#L694

That means the name built from ingfullName and clusterName might be too long, and we hit the 60-character limitation.

So the workaround is to decrease the size of those parameters, or on the CPO side we could check the length and truncate it a bit, as sketched below.
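
For illustration, a minimal Go sketch of such truncation on the CPO side (the constant and function names are hypothetical, not the actual controller code):

    package tags

    // maxNeutronTagLength is the tag length limit Neutron enforced
    // before the Xena release.
    const maxNeutronTagLength = 60

    // truncateTag shortens a generated tag so Neutron accepts it.
    // A real implementation would also need to keep truncated tags
    // unique, e.g. by replacing the tail with a short hash of the
    // full name, since naive truncation can collide.
    func truncateTag(tag string) string {
        if len(tag) <= maxNeutronTagLength {
            return tag
        }
        return tag[:maxNeutronTagLength]
    }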

jichenjc avatar Apr 27 '22 02:04 jichenjc

Hi @jichenjc thanks for looking into this.

The news from auto-deploy-image is that changing the limit from 63 to 60 would be a breaking change for all current users of the solution (there would be no way to match the previous identifiers with three characters missing), so there is little chance they will change it. On our side we could work around it by forking the helm chart, but that requires a lot of maintenance, and it means that future users combining octavia-ingress-controller with GitLab's Auto DevOps will also run into this error.

The truncation to 63 characters is documented here: https://gitlab.com/gitlab-org/cluster-integration/auto-deploy-image/-/blob/master/assets/auto-deploy-app/templates/_helpers.tpl#L11 and comes from the DNS FQDN limitation described here: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-sethostnameasfqdn-field
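
For reference, the 63 comes from the RFC 1123 DNS label limit that Kubernetes applies to many object names. A rough Go equivalent of the common Helm pattern (trunc 63 followed by trimSuffix "-", an assumption about what the linked template does) would be:

    package tags

    import "strings"

    // dnsLabelMaxLength is the RFC 1123 limit on a DNS label.
    const dnsLabelMaxLength = 63

    // truncateToDNSLabel caps a name at 63 characters and drops a
    // trailing hyphen left over from the cut.
    func truncateToDNSLabel(name string) string {
        if len(name) > dnsLabelMaxLength {
            name = name[:dnsLabelMaxLength]
        }
        return strings.TrimSuffix(name, "-")
    }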

I hope this limit can be changed on the CPO side.

arthurzenika avatar Apr 27 '22 07:04 arthurzenika

I need to read more about the background here, and as I mentioned, this likely comes from the restriction on the neutron (OpenStack) tag size, not from the CPO side... from this message we can see the error is an attempt to tag a security group that failed:

failed to add tag [snip] to security group [snip]: Bad request with:
[PUT https://network.compute.gra9.cloud.ovh.net/v2.0/security_groups/[snip]/tags/[snip]], error message: 

So the least we can do from the CPO side is to provide a workaround that bypasses the tagging via a given parameter on demand; that might help, but I need to double-check and will comment later. The other option is to add a microversion to neutron to make the accepted length bigger, which takes more time... A rough illustration of that kind of opt-out follows.
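
A sketch of such an opt-out (the flag and helper are entirely hypothetical, not existing CPO options):

    package tags

    import "flag"

    // skipSecurityGroupTags would let operators disable tagging while
    // their Neutron still enforces the 60-character limit.
    var skipSecurityGroupTags = flag.Bool(
        "skip-security-group-tags", false,
        "do not tag security groups created for ingresses")

    // addSecurityGroupTag stands in for the real Neutron tagging call.
    func addSecurityGroupTag(sgID, tag string) error { return nil }

    // maybeTagSecurityGroup tags the group unless the operator opted out.
    func maybeTagSecurityGroup(sgID, tag string) error {
        if *skipSecurityGroupTags {
            return nil
        }
        return addSecurityGroupTag(sgID, tag)
    }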

jichenjc avatar Apr 27 '22 07:04 jichenjc

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jul 26 '22 08:07 k8s-triage-robot

AFAIK neutron limits tags to 60 characters. This was changed in the Xena release, which increased the limit to 255.

Neutron resource tags can now be 255 characters long; previously resource tags were limited to 60 characters.

From the Xena Release Notes
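
Since the effective limit now depends on the Neutron release (60 before Xena, 255 from Xena on), one defensive option on the client side is to validate tags before issuing the PUT, so the failure is explicit. A sketch with a configurable limit (names hypothetical):

    package tags

    import "fmt"

    // validateTag fails fast, with a clearer message than the raw
    // NeutronError, when a generated tag exceeds the deployment's limit.
    func validateTag(tag string, maxLen int) error {
        if len(tag) > maxLen {
            return fmt.Errorf("tag %q is %d characters long, exceeding the Neutron limit of %d",
                tag, len(tag), maxLen)
        }
        return nil
    }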

nikParasyr avatar Aug 01 '22 11:08 nikParasyr

OK, then we need to use the desired microversion of the API in this code path... let's use this issue to track the change.

jichenjc avatar Aug 08 '22 01:08 jichenjc

/remove-lifecycle stale

jichenjc avatar Aug 09 '22 01:08 jichenjc

OK, it seems neutron doesn't have microversions, but the tag limitation has been raised. The issue was reported on OpenStack version stein and the fix landed in Xena, so either upgrading the OpenStack version or applying the fix should solve the problem; there's nothing CPO can do here... @arthurzenika please let me know, then we can close this issue.

jichenjc avatar Aug 09 '22 01:08 jichenjc

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 07 '22 02:11 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Dec 07 '22 02:12 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Jan 06 '23 03:01 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jan 06 '23 03:01 k8s-ci-robot