node-feature-discovery
NFD worker fails to communicate with NFD master after the worker node rejoins the cluster
What happened:
After a node (an NFD worker) was deleted with the kubectl command and then rejoined the cluster, the NFD worker failed to communicate with the NFD master. There are error logs in the nfd-worker pod:
I0526 15:13:24.461238 1 component.go:36] [core]parsed scheme: ""
I0526 15:13:24.461249 1 component.go:36] [core]scheme "" not registered, fallback to default scheme
I0526 15:13:24.461364 1 component.go:36] [core]ccResolverWrapper: sending update to cc: {[{nfd-node-feature-discovery-master:8080
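These log lines come from the gRPC resolver in nfd-worker as it tries to reach the master service. As a first check, it may help to see whether the master Service still has valid endpoints; a minimal sketch, assuming the default Helm chart names and namespace (adjust to your install):
kubectl -n node-feature-discovery get svc,endpoints nfd-node-feature-discovery-master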
What you expected to happen:
The NFD worker should communicate with the NFD master normally after the worker node rejoins the cluster.
How to reproduce it (as minimally and precisely as possible):
After NFD has been deployed successfully through Helm, delete one worker node; once the NFD worker pod is gone, rejoin the node to the cluster (command sketch below).
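Roughly, the steps look like this; the node name, release name, and chart reference are illustrative, and the rejoin mechanism depends on how the cluster was bootstrapped:
helm repo add nfd https://kubernetes-sigs.github.io/node-feature-discovery/charts
helm install nfd nfd/node-feature-discovery -n node-feature-discovery --create-namespace
kubectl delete node worker-1
# rejoin the node, e.g. by running kubeadm join (or the Tanzu equivalent) on the node itself
kubectl -n node-feature-discovery logs <nfd-worker-pod-on-the-rejoined-node>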
Anything else we need to know?:
The NFD worker's gateway address is left behind after the node is deleted, and after the node rejoins there are two gateway addresses.
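If you want to verify this on your side, a hypothetical check on the rejoined node is to look for stale addresses or routes left over from the previous CNI setup (interface names vary by CNI plugin):
ip addr show
ip route show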
Environment:
- Kubernetes version (use kubectl version): client v1.22.0; server v1.17.8+vmware.1
- Cloud provider or hardware configuration: Tanzu
- OS (e.g. cat /etc/os-release): Debian GNU/Linux 10 (buster)
- Kernel (e.g. uname -a): 5.4.115
- Install tools: helm
Looks unlikely that it's anything NFD-related. I suspect your pod network is not working correctly. Check DNS and CNI on the node.
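As one sketch of such a check, assuming the cluster domain is cluster.local and the default chart naming (the image, node name, and the overrides trick are just one way to pin a test pod to the affected node):
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 \
  --overrides='{"apiVersion": "v1", "spec": {"nodeName": "<worker-node>"}}' -- \
  nslookup nfd-node-feature-discovery-master.node-feature-discovery.svc.cluster.local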
I get this when running Minikube from time to time; my workaround is
kubectl -n kube-system rollout restart deployment coredns
It always fixes the network issue, and the NFD workers communicate with the master again :)
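To confirm the restart has completed before retesting, standard kubectl works here (nothing NFD-specific):
kubectl -n kube-system rollout status deployment coredns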
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.