cluster-api-provider-openstack

Creation of health monitors for Octavia backends that don't support this feature

Open · jfcavalcante opened this issue 2 years ago · 3 comments

/kind bug

What steps did you take and what happened:

Provision a cluster with OVN as the backend for Octavia.

A load balancer with the OVN provider was created for the API server, and a health monitor was provisioned for it, pushing the load balancer's provisioning status to ERROR.

What did you expect to happen:

CAPO should skip monitor creation for OVN provider.

Anything else you would like to add:

The OVN provider doesn't support the creation of health monitors (see the docs for further info). The current implementation of the getOrCreateMonitor method sends a request to create a monitor and then checks whether monitors are implemented by the provider, which causes unwanted behavior on the OpenStack side.

The LBaaS API health monitor endpoint doesn't return the status code that CAPO checks for in IsNotImplementedError(), which is 501.

I couldn't find a way to check in the API itself whether an Octavia backend supports health monitors. I think a way to go is to enhance this check, or to check against a list of known providers and skip monitor creation when the provider is "ovn", for example (see the sketch after the code excerpt below).

func (s *Service) getOrCreateMonitor(openStackCluster *infrav1.OpenStackCluster, monitorName, poolID, lbID string) error {

...
	monitor, err = s.loadbalancerClient.CreateMonitor(monitorCreateOpts)
	// Skip creating monitor if it is not supported by Octavia provider
	if capoerrors.IsNotImplementedError(err) {
		record.Warnf(openStackCluster, "SkippedCreateMonitor", "Health Monitor is not created as it's not implemented with the current Octavia provider.")
		return nil
	}
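
For illustration, here is a minimal, self-contained sketch of the provider-list approach suggested above. The list contents and the shouldSkipMonitor helper are hypothetical and not part of CAPO; where the provider name would come from (e.g. the cluster's load balancer configuration) is deliberately left out:

package main

import "fmt"

// Hypothetical list of Octavia providers assumed not to support health
// monitors. "ovn" is the example from this issue; newer OVN releases may
// actually support monitors, so such a list would need maintenance.
var monitorUnsupportedProviders = map[string]bool{
	"ovn": true,
}

// shouldSkipMonitor reports whether monitor creation should be skipped
// for the given Octavia provider name.
func shouldSkipMonitor(provider string) bool {
	return monitorUnsupportedProviders[provider]
}

func main() {
	for _, p := range []string{"amphora", "ovn"} {
		fmt.Printf("provider %q: skip monitor creation = %v\n", p, shouldSkipMonitor(p))
	}
}

The obvious downside of a hard-coded list is that it goes stale (as noted below, newer OVN versions do support health monitors), so a check based on the 501 response is probably the more robust option.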


Environment:

  • Cluster API Provider OpenStack version (Or git rev-parse HEAD if manually built): v0.7.3
  • Cluster-API version: v1.4.4
  • Minikube/KIND version: N/A
  • Kubernetes version (use kubectl version): v1.23.10

jfcavalcante · Aug 14 '23 03:08

Let me unpack this a bit; see inline:

What did you expect to happen:

CAPO should skip monitor creation for OVN provider.

True. The real bug here is that, on 501 Not Implemented, CAPO should just skip creating the health monitor and only log a warning.

Anything else you would like to add:

The OVN provider doesn't support the creation of health monitors (see the docs for further info). The current implementation of the getOrCreateMonitor method sends a request to create a monitor and then checks whether monitors are implemented by the provider, which causes unwanted behavior on the OpenStack side.

It doesn't invalidate your issue, but newer OVN versions do support health monitors. The docs you link to are pretty old.

The LBaaS API health monitor endpoint doesn't return the status code that CAPO checks for in IsNotImplementedError(), which is 501.

I've just tested that it does:

mdulko:openshift-clusters/ $ curl -g -i --cacert "/home/mdulko/.config/openstack/standalone-ca.crt" -X POST https://10.1.8.104:13876/v2.0/lbaas/healthmonitors -H "Accept: application/json" -H "Content-Type: application/json" -H "User-Agent: openstacksdk/0.61.0 keystoneauth1/4.5.0 python-requests/2.28.1 CPython/3.10.12" -H "X-Auth-Token: <snip>" -d '{"healthmonitor": {"pool_id": "dfc549d0-1e0d-4777-a7f4-84393707cab7", "delay": 1, "timeout": 2, "max_retries": 3, "type": "TCP", "admin_state_up": true}}'
HTTP/1.1 501 Not Implemented
Date: Thu, 17 Aug 2023 14:58:03 GMT
Server: Apache
Content-Length: 169
x-openstack-request-id: req-3ee09fff-30fb-40bd-962c-7b4d56d1e82a
Content-Type: application/json

{"faultcode": "Server", "faultstring": "Provider 'ovn' does not support a requested action: This provider does not support creating health monitors.", "debuginfo": null}%   

I think I'm using the Train version of the OVN Octavia provider. Let's make sure we're on the same page here; maybe it's just that the code comparing the exception is wrong. Have you checked that?

I couldn't find a way to check in the API itself whether an Octavia backend supports health monitors. I think a way to go is to enhance this check, or to check against a list of known providers and skip monitor creation when the provider is "ovn", for example.

Yep, there's no safe way to check that other than just making a call and checking if it's 501.
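
For completeness, here is a hedged sketch of what such a 501-based check could look like. It assumes gophercloud's ErrUnexpectedResponseCode wrapping of unexpected status codes; CAPO's real IsNotImplementedError may be implemented differently:

package main

import (
	"errors"
	"fmt"
	"net/http"

	"github.com/gophercloud/gophercloud"
)

// isNotImplemented reports whether err carries an HTTP 501 from the
// OpenStack API. gophercloud wraps unexpected status codes in
// ErrUnexpectedResponseCode, which exposes the actual code.
func isNotImplemented(err error) bool {
	var unexpected gophercloud.ErrUnexpectedResponseCode
	if errors.As(err, &unexpected) {
		return unexpected.Actual == http.StatusNotImplemented
	}
	return false
}

func main() {
	// Simulated error, as if CreateMonitor had returned it against an OVN backend.
	err := gophercloud.ErrUnexpectedResponseCode{Actual: http.StatusNotImplemented}
	fmt.Println("treat as not implemented:", isNotImplemented(err))
}

The point is simply that a 501 surfaces as an unexpected response code whose Actual field can be compared against http.StatusNotImplemented; if CAPO's current check compares against something else, that would explain the behavior reported above.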

dulek · Aug 17 '23 15:08

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · Jan 26 '24 14:01

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot · Feb 25 '24 15:02

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot · Mar 26 '24 16:03

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot · Mar 26 '24 16:03