Fix calico vxlan tunnel resilience on ansible run

Open MatthieuFin opened this issue 1 year ago • 6 comments
What type of PR is this? /kind bug

What this PR does / why we need it: When I run kubespray on an existing cluster with the calico CNI, bird backend and vxlan tunnels, the vxlan tunnels are dropped because calicoctl apply

Which issue(s) this PR fixes:

Fixes #11096

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

MatthieuFin avatar Apr 18 '24 13:04 MatthieuFin

Hi @MatthieuFin. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Apr 18 '24 13:04 k8s-ci-robot

/ok-to-test

yankay avatar Apr 19 '24 04:04 yankay

Hello, I usually manage the calico deployment with kubespray. I suffered two outages due to this issue: the first when changing the requested resources of the calico deployment, and the second when upgrading calico.

Running kubespray to upgrade calico and the kubespray version makes it possible to manage, for example, the RBAC deployment, in particular the RBAC split that introduced the clusterrole "calico-cni-plugin" in calico version 3.26.

I want to prevent the case where someone runs kubespray with the network tag and breaks my vxlan network.

The goal of this PR is to ensure that the task is idempotent.

MatthieuFin avatar Apr 26 '24 20:04 MatthieuFin

@MatthieuFin thanks for the details. The changes look good to me. Have you tested your changes?

cyclinder avatar Apr 30 '24 07:04 cyclinder

Hi, yes, I tested the changes; this is the workaround that I use on my production clusters. I have also tested them on fresh cluster creation and they seem fully backward compatible.

MatthieuFin avatar Apr 30 '24 21:04 MatthieuFin

Thanks @MatthieuFin /approve

yankay avatar May 06 '24 03:05 yankay

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cyclinder, MatthieuFin, yankay

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

k8s-ci-robot avatar May 06 '24 03:05 k8s-ci-robot