calico icon indicating copy to clipboard operation
calico copied to clipboard

Match full interface names to exclude

Open neoaggelos opened this issue 2 years ago • 11 comments

Description

Fix bug where Calico first-found IP autodetection method ignores valid interfaces.

DEFAULT_INTERFACES_TO_EXCLUDE is a list of regular expressions for interface names to ignore when the IP autodetection method is set to first-found. These are mostly virtual interfaces, which are safe to ignore.

A lurking issue is that regexp.MatchString() returns true even for partial string matches. This means that any interface including these string parts will be matched and ignored. This is problematic for ppc64le, since a standard interface naming scheme is ibmveth{a,b...}.

In my attempt to preserve the spirit of the original author, I've changed the regular expressions to match at the start of the interface name only. This PR targets calico-node, and fixed the issue for me (test image for ppc64le is cdkbot/calico-node:v3.23.3-dev1-ppc64le)

Other options:

  • Add $ at the end of all strings to make the intent clear.
  • Only change veth.* to ^veth.*$
  • Only change veth.* to ^!(ibm)veth.*$ (not sure about this regex, basically match veth.* but not ibmveth.*).

Related issues/PRs

References https://github.com/canonical/microk8s/issues/3458

Todos

  • [ ] Tests (it is not straightforward
  • [ ] Documentation
  • [ ] Release note

Release Note

Match full interface names in IP auto-detection default exclude list.

Reminder for the reviewer

Make sure that this PR has the correct labels and milestone set.

Every PR needs one docs-* label.

  • docs-pr-required: This change requires a change to the documentation that has not been completed yet.
  • docs-completed: This change has all necessary documentation completed.
  • docs-not-required: This change has no user-facing impact and requires no docs.

Every PR needs one release-note-* label.

  • release-note-required: This PR has user-facing changes. Most PRs should have this label.
  • release-note-not-required: This PR has no user-facing changes.

Other optional labels:

  • cherry-pick-candidate: This PR should be cherry-picked to an earlier release. For bug fixes only.
  • needs-operator-pr: This PR is related to install and requires a corresponding change to the operator.

neoaggelos avatar Sep 24 '22 15:09 neoaggelos

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Sep 24 '22 15:09 CLAassistant

Hi, I would appreciate some feedback on this.

neoaggelos avatar Sep 27 '22 14:09 neoaggelos

pinging @caseydavenport for feedback?

lwr20 avatar Sep 27 '22 14:09 lwr20

bump; @caseydavenport any comments?

neoaggelos avatar Oct 04 '22 12:10 neoaggelos

Sorry for the delay - I missed the notifications.

This seems reasonable to me. It has a small chance of breaking someone relying on the current behavior, but I think this classifies properly as a bug and is unlikely to cause much harm.

caseydavenport avatar Oct 10 '22 21:10 caseydavenport

/sem-approve

caseydavenport avatar Oct 10 '22 21:10 caseydavenport

It would be great to see a unit test to go with this change that would fail with the old code but passes with this commit.

caseydavenport avatar Oct 10 '22 21:10 caseydavenport

@caseydavenport Added some unit tests, I've only slightly refactored autodetection.GetInterfaces to accomodate for this, since the hardcoded net.Interfaces() call there made things quite difficult to test.

Hope the test case is sufficient, happy to add more if needed.

neoaggelos avatar Oct 13 '22 14:10 neoaggelos

/sem-approve

caseydavenport avatar Oct 18 '22 18:10 caseydavenport

Looks like a couple build failures in the latest commit:

pkg/lifecycle/startup/startup.go:606:55: not enough arguments in call to autodetection.GetInterfaces
	have (nil, nil, int)
	want (func() ([]"net".Interface, error), []string, []string, int)

caseydavenport avatar Oct 18 '22 22:10 caseydavenport

Removing "merge-when-ready" label due to new commits

marvin-tigera avatar Oct 19 '22 08:10 marvin-tigera

@caseydavenport Sorry for missing this, PR should be fine now

neoaggelos avatar Oct 19 '22 14:10 neoaggelos

/sem-approve

caseydavenport avatar Oct 19 '22 17:10 caseydavenport

Thank you for merging! Apologies if this is documented somewhere, but is there a process to get this patch backported to older supported tracks as well?

neoaggelos avatar Oct 19 '22 21:10 neoaggelos

Its probably not documented. Process is to cherry-pick to the relevant release-v3.X branch. Once in there, the next patch release for that minor version will have the fix.

Obviously if you back port to more than just the last release, it also needs to be in every release after that point too, to avoid upgrade issues.

lwr20 avatar Oct 20 '22 08:10 lwr20

Thanks @lwr20, I've created two PRs for 3.23 and 3.24 respectively.

neoaggelos avatar Oct 20 '22 16:10 neoaggelos

This seems reasonable to me. It has a small chance of breaking someone relying on the current behavior, but I think this classifies properly as a bug and is unlikely to cause much harm.

@caseydavenport This broke us because the nodelocaldns interface that was skipped incorrectly without this change is now being picked as the node primary interface. https://github.com/projectcalico/calico/pull/6942 fixed this but there are several 3.23 and 3.24 releases in this state. Can we backport https://github.com/projectcalico/calico/pull/6942 to those release branches?.

skmatti avatar Dec 13 '22 23:12 skmatti

@skmatti sorry about that - yes let's backport that fix.

CC @mgleung for release context

caseydavenport avatar Dec 19 '22 17:12 caseydavenport