Interface "nodelocaldns" is not up
Context:
There is a new addon for EKS clusters: https://aws.amazon.com/about-aws/whats-new/2024/12/node-health-monitoring-auto-repair-amazon-eks/
One of its tasks is to check whether any network interface is in the DOWN state:
{"level":"info","ts":"2024-12-23T10:04:50Z","msg":"handling export request","source":"networking","condition":{"Reason":"InterfaceNotUp","Message":"Interface \"nodelocaldns\" is not up","Severity":"Fatal","MinOccurrences":0}}
The condition is also shown in the AWS console.
I think the heuristic is correct: a network interface being down is an unusual situation that should be reported.
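For illustration, here is a minimal sketch of this kind of check in Go, using the vishvananda/netlink package. The agent's actual implementation is not public here, so the structure (and the use of OperState rather than interface flags) is an assumption:

package main

import (
	"fmt"

	"github.com/vishvananda/netlink"
)

func main() {
	// List every link on the node, the way `ip link` would.
	links, err := netlink.LinkList()
	if err != nil {
		panic(err)
	}
	for _, link := range links {
		attrs := link.Attrs()
		// OperState mirrors /sys/class/net/<dev>/operstate; a dummy
		// device that was created but never set up reports "down".
		if attrs.OperState == netlink.OperDown {
			fmt.Printf("Interface %q is not up\n", attrs.Name)
		}
	}
}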
I worked around it with the following patch for the node-local-dns DaemonSet:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-local-dns
  namespace: kube-system
spec:
  template:
    spec:
      initContainers:
        - name: interface-up
          image: public.ecr.aws/docker/library/alpine:latest
          restartPolicy: Always
          command:
            - /bin/sh
            - -c
            - |
              while :; do
                while :; do
                  ip link set dev nodelocaldns up && break
                  sleep 1
                done
                sleep 30
              done
          resources:
            requests:
              cpu: 10m
              memory: 16Mi
            limits:
              cpu: 10m
              memory: 16Mi
          securityContext:
            capabilities:
              add:
                - NET_ADMIN
This is a sidecar that sets the interface up. Strictly speaking, the resulting state is not UP but UNKNOWN (a dummy interface has no carrier to report), but eks-node-monitoring-agent no longer reports the condition.
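For reference, the patch can be applied with kubectl patch daemonset node-local-dns -n kube-system --patch-file <file> (a strategic merge patch, the default), and the result can be checked on a node with ip -br link show nodelocaldns, which should then report state UNKNOWN.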
I think the proper fix would be to call LinkSetUp in the AddDummyDevice function.
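A minimal sketch of what that fix could look like, assuming the vishvananda/netlink API that LinkSetUp belongs to; this is not the upstream function body (the real AddDummyDevice also assigns the node-local DNS addresses to the link, omitted here):

package main

import (
	"fmt"

	"github.com/vishvananda/netlink"
)

// addDummyDevice creates the dummy link and, as proposed above,
// immediately brings it up so its operstate is no longer DOWN.
func addDummyDevice(name string) error {
	dummy := &netlink.Dummy{LinkAttrs: netlink.LinkAttrs{Name: name}}
	if err := netlink.LinkAdd(dummy); err != nil {
		return fmt.Errorf("adding dummy device %q: %w", name, err)
	}
	link, err := netlink.LinkByName(name)
	if err != nil {
		return fmt.Errorf("looking up %q: %w", name, err)
	}
	// The proposed one-line change: netlink.LinkSetUp is the
	// equivalent of `ip link set dev <name> up`.
	if err := netlink.LinkSetUp(link); err != nil {
		return fmt.Errorf("setting %q up: %w", name, err)
	}
	return nil
}

func main() {
	// Requires CAP_NET_ADMIN, like the workaround above.
	if err := addDummyDevice("nodelocaldns"); err != nil {
		fmt.Println(err)
	}
}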
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
Some day maybe I'll create a proper PR...
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
I swear I will fix it myself.