Interface "nodelocaldns" is not up
Context:
There is a new addon for EKS clusters: https://aws.amazon.com/about-aws/whats-new/2024/12/node-health-monitoring-auto-repair-amazon-eks/
One of its tasks is to check whether any network interface is in the DOWN state:
{"level":"info","ts":"2024-12-23T10:04:50Z","msg":"handling export request","source":"networking","condition":{"Reason":"InterfaceNotUp","Message":"Interface \"nodelocaldns\" is not up","Severity":"Fatal","MinOccurrences":0}}
The condition is also shown in the AWS console.
I think the heuristic is correct: a network interface being down is an unusual situation that should be reported.
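For illustration, here is a minimal sketch of this kind of check in Go, using the vishvananda/netlink package. The agent's actual implementation is not public here, so the structure (and the use of OperState rather than interface flags) is an assumption:

package main

import (
	"fmt"

	"github.com/vishvananda/netlink"
)

func main() {
	// List every link on the node, the way `ip link` would.
	links, err := netlink.LinkList()
	if err != nil {
		panic(err)
	}
	for _, link := range links {
		attrs := link.Attrs()
		// OperState mirrors /sys/class/net/<dev>/operstate; a dummy
		// device that was created but never set up reports "down".
		if attrs.OperState == netlink.OperDown {
			fmt.Printf("Interface %q is not up\n", attrs.Name)
		}
	}
}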
I worked around it with the following patch for the node-local-dns DaemonSet:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-local-dns
  namespace: kube-system
spec:
  template:
    spec:
      initContainers:
        - name: interface-up
          image: public.ecr.aws/docker/library/alpine:latest
          restartPolicy: Always
          command:
            - /bin/sh
            - -c
            - |
              while :; do
                while :; do
                  ip link set dev nodelocaldns up && break
                  sleep 1
                done
                sleep 30
              done
          resources:
            requests:
              cpu: 10m
              memory: 16Mi
            limits:
              cpu: 10m
              memory: 16Mi
          securityContext:
            capabilities:
              add:
                - NET_ADMIN
This is a sidecar that sets the interface up. Strictly speaking, the resulting state is not UP but UNKNOWN (a dummy interface has no carrier to report), but eks-node-monitoring-agent no longer reports the condition.
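For reference, the patch can be applied with kubectl patch daemonset node-local-dns -n kube-system --patch-file <file> (a strategic merge patch, the default), and the result can be checked on a node with ip -br link show nodelocaldns, which should then report state UNKNOWN.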
I think the proper fix would be to call LinkSetUp in the AddDummyDevice function.
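A minimal sketch of what that fix could look like, assuming the vishvananda/netlink API that LinkSetUp belongs to; this is not the upstream function body (the real AddDummyDevice also assigns the node-local DNS addresses to the link, omitted here):

package main

import (
	"fmt"

	"github.com/vishvananda/netlink"
)

// addDummyDevice creates the dummy link and, as proposed above,
// immediately brings it up so its operstate is no longer DOWN.
func addDummyDevice(name string) error {
	dummy := &netlink.Dummy{LinkAttrs: netlink.LinkAttrs{Name: name}}
	if err := netlink.LinkAdd(dummy); err != nil {
		return fmt.Errorf("adding dummy device %q: %w", name, err)
	}
	link, err := netlink.LinkByName(name)
	if err != nil {
		return fmt.Errorf("looking up %q: %w", name, err)
	}
	// The proposed one-line change: netlink.LinkSetUp is the
	// equivalent of `ip link set dev <name> up`.
	if err := netlink.LinkSetUp(link); err != nil {
		return fmt.Errorf("setting %q up: %w", name, err)
	}
	return nil
}

func main() {
	// Requires CAP_NET_ADMIN, like the workaround above.
	if err := addDummyDevice("nodelocaldns"); err != nil {
		fmt.Println(err)
	}
}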
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
Some day maybe I'll create a proper PR...
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
I swear I will fix it myself.