cloud-provider-openstack

[occm] Improve route controller reconciling to ensure the cluster's nodes can access each other

Open jeffyjf opened this issue 1 year ago • 18 comments

What this PR does / why we need it:

For the nodes of a cluster to reach each other, the route controller needs to check and set three things:

  1. The OpenStack router's route rules, so that packets are forwarded to the correct nodes.
  2. The node port's AllowAddressPair, so that the node permits packets destined for its pods/services to pass through.
  3. The OpenStack security group's rules, so that nodes bound to the security group accept packets from other nodes.

The current code only checks the router's route rules when the controller calls ListRoutes, and only sets the route and AllowAddressPair when the controller calls CreateRoute. This PR fills in the remaining checks and updates.
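
The three checks above can be sketched as a pure "what is missing" computation. This is only an illustrative sketch, not the code in this PR: the types `Route`, `NodeState`, `Plan` and the function `buildPlan` are hypothetical, no OpenStack API is called, and it assumes one security group ingress rule per pod CIDR, which may differ from what the PR actually creates.

```go
package main

import "fmt"

// Route is a hypothetical, simplified router static route: traffic for
// DestinationCIDR (a node's pod CIDR) is forwarded to NextHop (the node IP).
type Route struct {
	DestinationCIDR string
	NextHop         string
}

// NodeState is a hypothetical snapshot of one node as read from OpenStack.
type NodeState struct {
	Name         string
	IP           string
	PodCIDR      string
	AllowedPairs []string // CIDRs already present in the port's allowed address pairs
}

// Plan lists everything that still has to be created.
type Plan struct {
	MissingRoutes  []Route
	MissingPairs   map[string]string // node name -> pod CIDR to allow on its port
	MissingSGRules []string          // pod CIDRs lacking an ingress rule (assumed model)
}

func contains(items []string, want string) bool {
	for _, it := range items {
		if it == want {
			return true
		}
	}
	return false
}

// buildPlan compares the desired state derived from the nodes against the
// routes, allowed address pairs and security group rules that already exist.
func buildPlan(nodes []NodeState, routerRoutes []Route, sgIngressCIDRs []string) Plan {
	existing := make(map[Route]bool, len(routerRoutes))
	for _, r := range routerRoutes {
		existing[r] = true
	}
	plan := Plan{MissingPairs: map[string]string{}}
	for _, n := range nodes {
		want := Route{DestinationCIDR: n.PodCIDR, NextHop: n.IP}
		if !existing[want] {
			plan.MissingRoutes = append(plan.MissingRoutes, want)
		}
		if !contains(n.AllowedPairs, n.PodCIDR) {
			plan.MissingPairs[n.Name] = n.PodCIDR
		}
		if !contains(sgIngressCIDRs, n.PodCIDR) {
			plan.MissingSGRules = append(plan.MissingSGRules, n.PodCIDR)
		}
	}
	return plan
}

func main() {
	nodes := []NodeState{
		{Name: "node-a", IP: "10.0.0.1", PodCIDR: "10.244.0.0/24", AllowedPairs: []string{"10.244.0.0/24"}},
		{Name: "node-b", IP: "10.0.0.2", PodCIDR: "10.244.1.0/24"},
	}
	routes := []Route{{DestinationCIDR: "10.244.0.0/24", NextHop: "10.0.0.1"}}
	fmt.Printf("%+v\n", buildPlan(nodes, routes, []string{"10.244.0.0/24"}))
}
```

Separating the diff from the API calls keeps the reconcile step idempotent: running it again after the missing items are created yields an empty plan.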

Which issue this PR fixes (if applicable): fixes #2482

Special notes for reviewers:

This PR lacks unit tests because I found that occm has no mechanism to mock the OpenStack client. Furthermore, occm as a whole is missing many unit tests for the same reason. I plan to dig deeper into gophercloud over the next several days to see whether it can be mocked. If it can, I'll submit another PR to add the unit tests.
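
One common Go approach to this kind of testing gap is to hide the client behind a narrow interface and substitute an in-memory fake in tests. The sketch below assumes that approach. `RouterClient`, `fakeRouterClient` and `ensureRoute` are hypothetical names and are not part of occm or gophercloud.

```go
package main

import (
	"errors"
	"fmt"
)

// RouterClient is a hypothetical, narrow interface covering only the router
// operations a route controller would need.
type RouterClient interface {
	ListRoutes(routerID string) ([]string, error)
	AddRoute(routerID, cidr string) error
}

// fakeRouterClient is an in-memory stand-in used instead of a real client.
type fakeRouterClient struct {
	routes  map[string][]string
	failAdd bool // when true, AddRoute simulates an API failure
}

func (f *fakeRouterClient) ListRoutes(routerID string) ([]string, error) {
	// Return a copy so callers cannot mutate the fake's internal state.
	return append([]string(nil), f.routes[routerID]...), nil
}

func (f *fakeRouterClient) AddRoute(routerID, cidr string) error {
	if f.failAdd {
		return errors.New("simulated API failure")
	}
	f.routes[routerID] = append(f.routes[routerID], cidr)
	return nil
}

// ensureRoute adds cidr to the router if it is not already present and
// reports whether a route was added.
func ensureRoute(c RouterClient, routerID, cidr string) (bool, error) {
	routes, err := c.ListRoutes(routerID)
	if err != nil {
		return false, err
	}
	for _, r := range routes {
		if r == cidr {
			return false, nil
		}
	}
	if err := c.AddRoute(routerID, cidr); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	fake := &fakeRouterClient{routes: map[string][]string{}}
	added, err := ensureRoute(fake, "router-1", "10.244.0.0/24")
	fmt.Println(added, err)
}
```

Because the logic under test depends only on the interface, unit tests can cover both the success path and the failure path without a running OpenStack deployment.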

Release note:

NONE

jeffyjf avatar Nov 28 '23 10:11 jeffyjf

Hi @jeffyjf. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Nov 28 '23 10:11 k8s-ci-robot

Why did this end up in the Routes controller? The issue description suggests this is a general problem in cases where CAPO is missing SG rules to allow interconnectivity between nodes.

dulek avatar Nov 28 '23 16:11 dulek

Why did this end up in the Routes controller? The issue description suggests this is a general problem in cases where CAPO is missing SG rules to allow interconnectivity between nodes.

According to the official documentation:

The route controller is responsible for configuring routes in the cloud appropriately so that containers on different nodes in your Kubernetes cluster can communicate with each other.

I think this is the route controller's duty. If the user doesn't enable the route controller, pod interconnectivity should be ensured by other mechanisms.

jeffyjf avatar Nov 29 '23 00:11 jeffyjf

/ok-to-test

jichenjc avatar Nov 29 '23 01:11 jichenjc

@dulek @jichenjc @kayrus @zetaab asking for review

jeffyjf avatar Dec 13 '23 02:12 jeffyjf

The PR needs a rebase. However, #2499 is on the way, and I adopted your getSecurityGroupRules change there as well.

kayrus avatar Dec 13 '23 12:12 kayrus

/hold

jeffyjf avatar Dec 14 '23 01:12 jeffyjf

/remove hold

jeffyjf avatar Dec 14 '23 03:12 jeffyjf

/remove-hold

jeffyjf avatar Dec 14 '23 03:12 jeffyjf

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kayrus for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Jan 15 '24 05:01 k8s-ci-robot

/retest pull-cloud-provider-openstack-test

jeffyjf avatar Jan 15 '24 07:01 jeffyjf

@jeffyjf: The /retest command does not accept any targets. The following commands are available to trigger required jobs:

  • /test openstack-cloud-controller-manager-e2e-test
  • /test openstack-cloud-controller-manager-ovn-e2e-test
  • /test openstack-cloud-csi-cinder-e2e-test
  • /test openstack-cloud-csi-cinder-sanity-test
  • /test openstack-cloud-csi-manila-e2e-test
  • /test openstack-cloud-csi-manila-sanity-test
  • /test openstack-cloud-keystone-authentication-authorization-test
  • /test pull-cloud-provider-openstack-check
  • /test pull-cloud-provider-openstack-test

Use /test all to run the following jobs that were automatically triggered:

  • openstack-cloud-controller-manager-e2e-test
  • openstack-cloud-controller-manager-ovn-e2e-test
  • pull-cloud-provider-openstack-check
  • pull-cloud-provider-openstack-test

In response to this:

/retest pull-cloud-provider-openstack-test

k8s-ci-robot avatar Jan 15 '24 07:01 k8s-ci-robot

/test pull-cloud-provider-openstack-test

jeffyjf avatar Jan 15 '24 07:01 jeffyjf

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Apr 14 '24 07:04 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar May 14 '24 07:05 k8s-triage-robot

/remove-lifecycle rotten
/lgtm

dulek avatar May 23 '24 09:05 dulek

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Jul 03 '24 00:07 k8s-ci-robot