calico
calico copied to clipboard
Match full interface names to exclude
Description
Fix bug where Calico first-found
IP autodetection method ignores valid interfaces.
DEFAULT_INTERFACES_TO_EXCLUDE
is a list of regular expressions for interface names to ignore when the IP autodetection method is set to first-found
. These are mostly virtual interfaces, which are safe to ignore.
A lurking issue is that regexp.MatchString()
returns true even for partial string matches. This means that any interface including these string parts will be matched and ignored. This is problematic for ppc64le, since a standard interface naming scheme is ibmveth{a,b...}
.
In my attempt to preserve the spirit of the original author, I've changed the regular expressions to match at the start of the interface name only. This PR targets calico-node, and fixed the issue for me (test image for ppc64le
is cdkbot/calico-node:v3.23.3-dev1-ppc64le
)
Other options:
- Add
$
at the end of all strings to make the intent clear. - Only change
veth.*
to^veth.*$
- Only change
veth.*
to^!(ibm)veth.*$
(not sure about this regex, basically matchveth.*
but notibmveth.*
).
Related issues/PRs
References https://github.com/canonical/microk8s/issues/3458
Todos
- [ ] Tests (it is not straightforward
- [ ] Documentation
- [ ] Release note
Release Note
Match full interface names in IP auto-detection default exclude list.
Reminder for the reviewer
Make sure that this PR has the correct labels and milestone set.
Every PR needs one docs-*
label.
-
docs-pr-required
: This change requires a change to the documentation that has not been completed yet. -
docs-completed
: This change has all necessary documentation completed. -
docs-not-required
: This change has no user-facing impact and requires no docs.
Every PR needs one release-note-*
label.
-
release-note-required
: This PR has user-facing changes. Most PRs should have this label. -
release-note-not-required
: This PR has no user-facing changes.
Other optional labels:
-
cherry-pick-candidate
: This PR should be cherry-picked to an earlier release. For bug fixes only. -
needs-operator-pr
: This PR is related to install and requires a corresponding change to the operator.
Hi, I would appreciate some feedback on this.
pinging @caseydavenport for feedback?
bump; @caseydavenport any comments?
Sorry for the delay - I missed the notifications.
This seems reasonable to me. It has a small chance of breaking someone relying on the current behavior, but I think this classifies properly as a bug and is unlikely to cause much harm.
/sem-approve
It would be great to see a unit test to go with this change that would fail with the old code but passes with this commit.
@caseydavenport Added some unit tests, I've only slightly refactored autodetection.GetInterfaces
to accomodate for this, since the hardcoded net.Interfaces()
call there made things quite difficult to test.
Hope the test case is sufficient, happy to add more if needed.
/sem-approve
Looks like a couple build failures in the latest commit:
pkg/lifecycle/startup/startup.go:606:55: not enough arguments in call to autodetection.GetInterfaces
have (nil, nil, int)
want (func() ([]"net".Interface, error), []string, []string, int)
Removing "merge-when-ready" label due to new commits
@caseydavenport Sorry for missing this, PR should be fine now
/sem-approve
Thank you for merging! Apologies if this is documented somewhere, but is there a process to get this patch backported to older supported tracks as well?
Its probably not documented. Process is to cherry-pick to the relevant release-v3.X
branch. Once in there, the next patch release for that minor version will have the fix.
Obviously if you back port to more than just the last release, it also needs to be in every release after that point too, to avoid upgrade issues.
Thanks @lwr20, I've created two PRs for 3.23 and 3.24 respectively.
This seems reasonable to me. It has a small chance of breaking someone relying on the current behavior, but I think this classifies properly as a bug and is unlikely to cause much harm.
@caseydavenport This broke us because the nodelocaldns interface that was skipped incorrectly without this change is now being picked as the node primary interface. https://github.com/projectcalico/calico/pull/6942 fixed this but there are several 3.23 and 3.24 releases in this state. Can we backport https://github.com/projectcalico/calico/pull/6942 to those release branches?.
@skmatti sorry about that - yes let's backport that fix.
CC @mgleung for release context