[occm] Creating a fully populated LB when Octavia uses ovn-provider may leave pools in PENDING_CREATE

Open mdbooth opened this issue 3 years ago • 9 comments

/kind bug

What happened: This is primarily a placeholder and a link to https://bugzilla.redhat.com/show_bug.cgi?id=2042976

We think that OpenStack CCM may be affected by the same issue. While we will work to fix the problem in Octavia, it may be prudent to include a workaround in occm until the fix is widely deployed.

We haven't verified this at all, although it looks like occm would trigger the same issue: https://github.com/kubernetes/cloud-provider-openstack/blob/ee4ecca326219153805b09ccc95911efd93f58f0/pkg/openstack/loadbalancer.go#L569-L658
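A minimal sketch of what such a workaround could look like, assuming occm polls the pools after the create call and treats a pool that never leaves PENDING_CREATE as a failure it can react to. The names `poolStatusFunc`, `waitForPools` and `errPoolsStuck` are hypothetical and not part of occm; the status lookup is injected so the sketch stays self-contained instead of calling Octavia.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// poolStatusFunc is a hypothetical hook that returns the provisioning status
// of every pool on a load balancer, keyed by pool ID. A real implementation
// would wrap an Octavia API call; here it is injected for illustration.
type poolStatusFunc func(lbID string) (map[string]string, error)

// errPoolsStuck reports that at least one pool never left PENDING_CREATE.
var errPoolsStuck = errors.New("pools still in PENDING_CREATE")

// waitForPools polls until no pool reports PENDING_CREATE, or until the
// given number of attempts is exhausted.
func waitForPools(lbID string, getStatuses poolStatusFunc, attempts int, interval time.Duration) error {
	for i := 0; i < attempts; i++ {
		statuses, err := getStatuses(lbID)
		if err != nil {
			return fmt.Errorf("listing pools of %s: %w", lbID, err)
		}
		pending := 0
		for _, status := range statuses {
			if status == "PENDING_CREATE" {
				pending++
			}
		}
		if pending == 0 {
			return nil
		}
		time.Sleep(interval)
	}
	return errPoolsStuck
}

func main() {
	// Fake status source: the pool becomes ACTIVE on the third poll.
	calls := 0
	fake := func(string) (map[string]string, error) {
		calls++
		if calls < 3 {
			return map[string]string{"pool-a": "PENDING_CREATE"}, nil
		}
		return map[string]string{"pool-a": "ACTIVE"}, nil
	}
	err := waitForPools("lb-1", fake, 5, 10*time.Millisecond)
	fmt.Println("error:", err, "polls:", calls)
}
```

What the caller does on `errPoolsStuck` (for example deleting and recreating the pool, or falling back to creating the children one by one) is left open here, since the thread does not settle on an approach.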

We'll try to update this issue when we have more information.

mdbooth avatar Jan 20 '22 17:01 mdbooth

Thanks for reporting this issue @mdbooth !

OCCM supports the amphora Octavia driver by default, and the support policy for the OVN driver is "do the best but not guarantee", so we'd rather the issue be fixed on the Octavia side first.

lingxiankong avatar Jan 21 '22 09:01 lingxiankong

I think this might be slightly higher priority for us as I think OVN might be the default in OSP now? Somebody will correct me if I'm wrong.

We're certainly going to fix this in Octavia. My concern is that OpenStack updates tend to roll out considerably slower than K8S updates, so we will inevitably have users trying to deploy with OCCM before the Octavia fix is widely deployed. If possible we'd also like to work around the issue in OCCM until that happens.

Incidentally we're just starting to ramp up our efforts to transition OpenShift on OpenStack to use OCCM by default in the near future, and we're obviously going to back that up with development and maintenance. My hope is that we can upgrade that 'do the best but not guarantee' ourselves by working on it. We're certainly intending to do the work for this issue specifically ourselves.

mdbooth avatar Jan 21 '22 13:01 mdbooth

> I think this might be slightly higher priority for us as I think OVN might be the default in OSP now? Somebody will correct me if I'm wrong.

OCCM doesn't autodetect the available providers, so by default it will use Amphora regardless. You need to configure OVN manually. That being said, the OVN provider is superior in terms of resource conservation and the time it takes to bring the LB up.
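For reference, a hedged sketch of that manual configuration, assuming the `[LoadBalancer]` section of occm's cloud.conf and its `lb-provider` and `lb-method` options; check the documentation for your occm release before relying on it. As far as I know the OVN provider only supports the SOURCE_IP_PORT algorithm, which is why `lb-method` is set alongside the provider.

```ini
[LoadBalancer]
# Select the OVN provider explicitly; without this occm uses amphora.
lb-provider = ovn
# Assumption: the OVN provider requires SOURCE_IP_PORT as the LB algorithm.
lb-method = SOURCE_IP_PORT
```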

> We're certainly going to fix this in Octavia. My concern is OpenStack updates tend to roll out considerably slower than K8S updates, so we will inevitably have users trying to deploy with OCCM before the Octavia fix is widely deployed. If possible we'd also like to work around the issue in OCCM until that happens.

The fix is upstream now: https://review.opendev.org/c/openstack/ovn-octavia-provider/+/826257/, but I agree with @mdbooth's view here: it'll be a while before we see it in the field.

> Incidentally we're just starting to ramp up our efforts to transition OpenShift on OpenStack to use OCCM by default in the near future, and we're obviously going to back that up with development and maintenance. My hope is that we can upgrade that 'do the best but not guarantee' ourselves by working on it. We're certainly intending to do the work for this issue specifically ourselves.

dulek avatar Jan 25 '22 15:01 dulek

> If possible we'd also like to work around the issue in OCCM until that happens.

I know a lot of OpenStack vendors/users upgrade infrequently because of private patches and code, so I think it's likely that changes to the latest OpenStack won't help them much.

> My hope is that we can upgrade that 'do the best but not guarantee' ourselves by working on it.

+1 to this if OCCM can help fix or at least mitigate the problem

jichenjc avatar Feb 07 '22 06:02 jichenjc

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar May 08 '22 06:05 k8s-triage-robot

/remove-lifecycle stale

dulek avatar May 09 '22 08:05 dulek

I have the same problem on OpenStack Xena with the Octavia amphora provider.

  • NOK: Kubernetes 1.23, OCCM 1.23, OpenStack Xena
  • OK: Kubernetes 1.23, OCCM 1.23, OpenStack Victoria

I am currently checking the Octavia logs. If I create the LB manually from the OpenStack GUI, it creates the LB and then sends the creation requests for the pool, pool members and listener afterwards. If I launch the LB creation from K8s, the logs only show the request for the LB.
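A single request in the logs would be consistent with a fully populated create, where the listeners, pools and members travel inside the load balancer body instead of as separate calls. The sketch below builds such a body, assuming the Octavia v2 field names (`listeners`, `default_pool`, `members`, `lb_algorithm`, `vip_subnet_id`, `provider`). The Go types and `buildFullyPopulated` are hypothetical and are not occm's own code; the IDs and addresses are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// The types below mirror only the subset of Octavia v2 fields used here.
type member struct {
	Address      string `json:"address"`
	ProtocolPort int    `json:"protocol_port"`
}

type pool struct {
	Name        string   `json:"name"`
	Protocol    string   `json:"protocol"`
	LBAlgorithm string   `json:"lb_algorithm"`
	Members     []member `json:"members"`
}

type listener struct {
	Name         string `json:"name"`
	Protocol     string `json:"protocol"`
	ProtocolPort int    `json:"protocol_port"`
	DefaultPool  *pool  `json:"default_pool,omitempty"`
}

type loadBalancer struct {
	Name        string     `json:"name"`
	VipSubnetID string     `json:"vip_subnet_id"`
	Provider    string     `json:"provider,omitempty"`
	Listeners   []listener `json:"listeners,omitempty"`
}

type createRequest struct {
	LoadBalancer loadBalancer `json:"loadbalancer"`
}

// buildFullyPopulated nests one listener, its default pool and one member per
// node address inside a single load balancer create request.
func buildFullyPopulated(name, subnetID, provider, algorithm string, port, nodePort int, nodeIPs []string) createRequest {
	members := make([]member, 0, len(nodeIPs))
	for _, ip := range nodeIPs {
		members = append(members, member{Address: ip, ProtocolPort: nodePort})
	}
	return createRequest{LoadBalancer: loadBalancer{
		Name:        name,
		VipSubnetID: subnetID,
		Provider:    provider,
		Listeners: []listener{{
			Name:         name + "-listener",
			Protocol:     "TCP",
			ProtocolPort: port,
			DefaultPool: &pool{
				Name:        name + "-pool",
				Protocol:    "TCP",
				LBAlgorithm: algorithm,
				Members:     members,
			},
		}},
	}}
}

func main() {
	req := buildFullyPopulated("svc-lb", "subnet-id-placeholder", "ovn",
		"SOURCE_IP_PORT", 80, 30080, []string{"10.0.0.11", "10.0.0.12"})
	body, err := json.MarshalIndent(req, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

With the sequential approach, the same listener, pool and members would each show up as their own request, which matches what the GUI path produces in the logs.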

Shugotekfr avatar May 10 '22 20:05 Shugotekfr

/lifecycle stale

k8s-triage-robot avatar Aug 08 '22 20:08 k8s-triage-robot

/lifecycle rotten

k8s-triage-robot avatar Sep 07 '22 21:09 k8s-triage-robot

/close not-planned

k8s-triage-robot avatar Oct 07 '22 21:10 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

k8s-ci-robot avatar Oct 07 '22 21:10 k8s-ci-robot