Node-local-dns doesn't work with cilium CNI on kops 1.29.0

Open nikita-nazemtsev opened this issue 1 year ago • 16 comments

/kind bug

1. What kops version are you running? The command kops version will display this information. Client version: 1.29.0 (git-v1.29.0)

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag. 1.28.7

3. What cloud provider are you using? AWS

4. What commands did you run? What is the simplest way to reproduce this issue? Update Kops from 1.28.4 to 1.29.0, or create a new cluster using Kops 1.29.0 with Node Local DNS and Cilium CNI.

5. What happened after the commands executed? Pods on the updated nodes cannot access the node-local-dns pods (screenshot attached).

6. What did you expect to happen? Pods can access node-local-dns pods.

7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2024-05-31T07:47:47Z"
  name: k8s.tmp-test.example
spec:
  additionalSans:
  - api-internal.k8s.tmp-test.example
  - api.internal.k8s.tmp-test.example
  api:
    loadBalancer:
      type: Internal
      useForInternalApi: true
  authentication: {}
  authorization:
    rbac: {}
  certManager:
    enabled: true
    managed: false
  cloudConfig:
    awsEBSCSIDriver:
      enabled: true
  cloudProvider: aws
  clusterAutoscaler:
    enabled: true
  configBase: s3://example/k8s.tmp-test.example
  containerd:
    registryMirrors:
      '*':
      - https://nexus-proxy.example.io
      docker.io:
      - https://nexus-proxy.example.io
      k8s.gcr.io:
      - https://nexus-proxy.example.io
      public.ecr.aws:
      - https://nexus-proxy.example.io
      quay.io:
      - https://nexus-proxy.example.io
      registry.example.io:
      - https://registry.example.io
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - instanceGroup: master-1a
      name: a
    - instanceGroup: master-1b
      name: b
    - instanceGroup: master-1c
      name: c
    manager:
      backupRetentionDays: 90
      env:
      - name: ETCD_LISTEN_METRICS_URLS
        value: http://0.0.0.0:2379
      - name: ETCD_METRICS
        value: basic
      - name: ETCD_MANAGER_HOURLY_BACKUPS_RETENTION
        value: 1d
      - name: ETCD_MANAGER_DAILY_BACKUPS_RETENTION
        value: 1d
      - name: ETCD_MAX_REQUEST_BYTES
        value: "1572864"
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - instanceGroup: master-1a
      name: a
    - instanceGroup: master-1b
      name: b
    - instanceGroup: master-1c
      name: c
    manager:
      backupRetentionDays: 90
      env:
      - name: ETCD_MANAGER_HOURLY_BACKUPS_RETENTION
        value: 1d
      - name: ETCD_MANAGER_DAILY_BACKUPS_RETENTION
        value: 1d
      - name: ETCD_MAX_REQUEST_BYTES
        value: "1572864"
    memoryRequest: 100Mi
    name: events
  fileAssets:
  - content: |
      apiVersion: audit.k8s.io/v1
      kind: Policy
      rules:
      - level: RequestResponse
        userGroups:
        - "/devops"
        - "/developers"
        - "/teamleads"
        - "/k8s-full"
        - "/sre"
        - "/support"
        - "/qa"
        - "system:serviceaccounts"
    name: audit-policy-config
    path: /etc/kubernetes/audit/policy-config.yaml
    roles:
    - ControlPlane
  iam:
    legacy: false
  kubeAPIServer:
    auditLogMaxAge: 10
    auditLogMaxBackups: 1
    auditLogMaxSize: 100
    auditLogPath: /var/log/kube-apiserver-audit.log
    auditPolicyFile: /etc/kubernetes/audit/policy-config.yaml
    oidcClientID: kubernetes
    oidcGroupsClaim: groups
    oidcIssuerURL: https://sso.example.io/auth/realms/example
    serviceAccountIssuer: https://api-internal.k8s.tmp-test.example
    serviceAccountJWKSURI: https://api-internal.k8s.tmp-test.example/openid/v1/jwks
  kubeDNS:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kops.k8s.io/instancegroup
              operator: In
              values:
              - infra-nodes
    nodeLocalDNS:
      cpuRequest: 25m
      enabled: true
      memoryRequest: 5Mi
    provider: CoreDNS
    tolerations:
    - effect: NoSchedule
      key: dedicated/infra
      operator: Exists
  kubeProxy:
    enabled: false
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
    evictionHard: memory.available<7%,nodefs.available<3%,nodefs.inodesFree<5%,imagefs.available<10%,imagefs.inodesFree<5%
    evictionMaxPodGracePeriod: 30
    evictionSoft: memory.available<12%
    evictionSoftGracePeriod: memory.available=200s
  kubernetesApiAccess:
  - 10.170.0.0/16
  kubernetesVersion: 1.28.7
  masterPublicName: api.k8s.tmp-test.example
  metricsServer:
    enabled: false
    insecure: true
  networkCIDR: 10.170.0.0/16
  networkID: vpc-xxxx
  networking:
    cilium:
      enableNodePort: true
      enablePrometheusMetrics: true
      ipam: eni
  nodeProblemDetector:
    enabled: true
  nodeTerminationHandler:
    enableSQSTerminationDraining: false
    enabled: true
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 10.170.0.0/16
  subnets:
  - cidr: 10.170.140.0/24
    name: kops-k8s-1a
    type: Private
    zone: eu-central-1a
  - cidr: 10.170.142.0/24
    name: kops-k8s-eni-1a
    type: Private
    zone: eu-central-1a
  - cidr: 10.170.141.0/24
    name: kops-k8s-utility-1a
    type: Utility
    zone: eu-central-1a
  - cidr: 10.170.143.0/24
    name: kops-k8s-1b
    type: Private
    zone: eu-central-1b
  - cidr: 10.170.145.0/24
    name: kops-k8s-eni-1b
    type: Private
    zone: eu-central-1b
  - cidr: 10.170.144.0/24
    name: kops-k8s-utility-1b
    type: Utility
    zone: eu-central-1b
  - cidr: 10.170.146.0/24
    name: kops-k8s-1c
    type: Private
    zone: eu-central-1c
  - cidr: 10.170.148.0/24
    name: kops-k8s-eni-1c
    type: Private
    zone: eu-central-1c
  - cidr: 10.170.147.0/24
    name: kops-k8s-utility-1c
    type: Utility
    zone: eu-central-1c
  topology:
    dns:
      type: Private

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-05-31T07:47:48Z"
  labels:
    kops.k8s.io/cluster: k8s.tmp-test.example
  name: graviton-nodes
spec:
  autoscale: false
  image: ami-0192de4261c8ff06a
  machineType: t4g.small
  maxSize: 1
  minSize: 1
  mixedInstancesPolicy:
    instances:
    - t4g.small
    onDemandAboveBase: 0
    onDemandBase: 0
    spotAllocationStrategy: lowest-price
    spotInstancePools: 3
  nodeLabels:
    kops.k8s.io/instancegroup: graviton-nodes
  role: Node
  rootVolumeSize: 25
  subnets:
  - kops-k8s-1a
  - kops-k8s-eni-1a
  sysctlParameters:
  - net.netfilter.nf_conntrack_max = 1048576
  - net.core.netdev_max_backlog = 30000
  - net.core.rmem_max = 134217728
  - net.core.wmem_max = 134217728
  - net.ipv4.tcp_wmem = 4096 87380 67108864
  - net.ipv4.tcp_rmem = 4096 87380 67108864
  - net.ipv4.tcp_mem = 187143 249527 1874286
  - net.ipv4.tcp_max_syn_backlog = 8192
  - net.ipv4.ip_local_port_range = 10240 65535

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-05-31T07:47:48Z"
  labels:
    kops.k8s.io/cluster: k8s.tmp-test.example
  name: infra-nodes
spec:
  autoscale: false
  image: ami-035f7f826413ac489
  machineType: t3.small
  maxSize: 1
  minSize: 1
  mixedInstancesPolicy:
    instances:
    - t3.small
    onDemandAboveBase: 0
    onDemandBase: 0
    spotAllocationStrategy: lowest-price
    spotInstancePools: 3
  nodeLabels:
    kops.k8s.io/instancegroup: infra-nodes
  role: Node
  rootVolumeSize: 25
  subnets:
  - kops-k8s-1a
  - kops-k8s-eni-1a
  sysctlParameters:
  - net.netfilter.nf_conntrack_max = 1048576
  - net.core.netdev_max_backlog = 30000
  - net.core.rmem_max = 134217728
  - net.core.wmem_max = 134217728
  - net.ipv4.tcp_wmem = 4096 12582912 16777216
  - net.ipv4.tcp_rmem = 4096 12582912 16777216
  - net.ipv4.tcp_mem = 187143 249527 1874286
  - net.ipv4.tcp_max_syn_backlog = 8192
  - net.ipv4.ip_local_port_range = 10240 65535

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-05-31T07:47:47Z"
  labels:
    kops.k8s.io/cluster: k8s.tmp-test.example
  name: master-1a
spec:
  image: ami-035f7f826413ac489
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-1a
  role: Master
  rootVolumeSize: 25
  subnets:
  - kops-k8s-1a
  - kops-k8s-eni-1a

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-05-31T07:47:47Z"
  labels:
    kops.k8s.io/cluster: k8s.tmp-test.example
  name: master-1b
spec:
  image: ami-035f7f826413ac489
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-1b
  role: Master
  rootVolumeSize: 25
  subnets:
  - kops-k8s-1b
  - kops-k8s-eni-1b

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-05-31T07:47:48Z"
  labels:
    kops.k8s.io/cluster: k8s.tmp-test.example
  name: master-1c
spec:
  image: ami-035f7f826413ac489
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-1c
  role: Master
  rootVolumeSize: 25
  subnets:
  - kops-k8s-1c
  - kops-k8s-eni-1c

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-05-31T07:47:48Z"
  labels:
    kops.k8s.io/cluster: k8s.tmp-test.example
  name: nodes
spec:
  image: ami-035f7f826413ac489
  machineType: t3.small
  maxSize: 1
  minSize: 1
  mixedInstancesPolicy:
    instances:
    - t3.small
    onDemandAboveBase: 0
    onDemandBase: 0
    spotAllocationStrategy: lowest-price
    spotInstancePools: 3
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  rootVolumeSize: 25
  subnets:
  - kops-k8s-1a
  - kops-k8s-eni-1a
  sysctlParameters:
  - net.netfilter.nf_conntrack_max = 1048576
  - net.core.netdev_max_backlog = 30000
  - net.core.rmem_max = 134217728
  - net.core.wmem_max = 134217728
  - net.ipv4.tcp_wmem = 4096 87380 67108864
  - net.ipv4.tcp_rmem = 4096 87380 67108864
  - net.ipv4.tcp_mem = 187143 249527 1874286
  - net.ipv4.tcp_max_syn_backlog = 8192
  - net.ipv4.ip_local_port_range = 10240 65535

8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else we need to know?

We found a workaround that fixes this issue on a single node. We noticed that the nodelocaldns interface is in a DOWN state on the nodes (although we observe the same on older kops versions, where node-local-dns works fine; screenshot attached). After executing ip link set dev nodelocaldns up, the nodelocaldns interface comes up (screenshot attached). In the cilium agent logs on this node we can see:

time="2024-05-31T09:23:08Z" level=info msg="Node addresses updated" device=nodelocaldns node-addresses="169.254.20.10 (nodelocaldns)" subsys=node-address

After these actions, all pods on this node can access node-local-dns without any problems.
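
As a stop-gap until a proper fix lands, this per-node step could be automated with a small privileged DaemonSet that keeps the interface up. The sketch below is only an illustration of that idea, not anything kops ships: the name, namespace, image, and check interval are assumptions, and it relies on node-local-dns running with hostNetwork: true so the dummy interface lives in the node's network namespace.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nodelocaldns-link-up
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: nodelocaldns-link-up
  template:
    metadata:
      labels:
        app: nodelocaldns-link-up
    spec:
      hostNetwork: true              # must run in the node's network namespace to see the nodelocaldns interface
      tolerations:
      - operator: Exists             # run on every node, including tainted ones
      containers:
      - name: link-up
        image: public.ecr.aws/docker/library/busybox:1.36   # illustrative image only
        securityContext:
          capabilities:
            add: ["NET_ADMIN"]       # required to change link state
        command:
        - /bin/sh
        - -c
        - |
          # Equivalent to the manual "ip link set dev nodelocaldns up" above;
          # re-check periodically in case the interface is recreated in a DOWN state.
          while true; do
            ip link set nodelocaldns up 2>/dev/null || true
            sleep 60
          done
        resources:
          requests:
            cpu: 5m
            memory: 8Mi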

nikita-nazemtsev avatar May 31 '24 09:05 nikita-nazemtsev

Hi, I have got the same issue while upgrading kops from 1.28 to 1.29. This is quite a critical bug/regression in kops 1.29 and it blocks us from upgrading. Are there any known workarounds for this?

DmytroKozlovskyi avatar Jun 20 '24 12:06 DmytroKozlovskyi

/reopen I wasn't able to repro the issue but I did upgrade Cilium to the latest 1.15 patch version. If you're able to build kops from source, can you build the kops CLI from this branch, run kops update cluster --yes and see if the issue is fixed?

rifelpet avatar Jun 22 '24 16:06 rifelpet

@rifelpet: Reopened this issue.

In response to this:

/reopen I wasn't able to repro the issue but I did upgrade Cilium to the latest 1.15 patch version. If you're able to build kops from source, can you build the kops CLI from this branch, run kops update cluster --yes and see if the issue is fixed?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Jun 22 '24 16:06 k8s-ci-robot

I recreated the cluster using kops from a branch, but it didn't solve the issue.

nikita-nazemtsev avatar Jun 23 '24 11:06 nikita-nazemtsev

I'm not sure whether it's connected or not, but in addition to nodelocaldns not working, I have an experimental IPv6-only cluster with Cilium: I tried upgrading it from kops v1.28 to v1.29, and the Cilium endpoints became unreachable on the nodes.

I looked at what changed in the Cilium setup and found that hostNetwork: true was added to both the cilium-operator Deployment and the cilium DaemonSet. I suspect this is somehow connected to both issues, but I couldn't pin down the exact cause.

kruczjak avatar Jun 24 '24 11:06 kruczjak

Is the kube-dns service created in kube-system?

math3vz avatar Jul 15 '24 17:07 math3vz

Hi, yes, the kube-dns service is created in kube-system. There is also Cilium documentation on how to configure node-local-dns with Cilium: https://docs.cilium.io/en/v1.10/gettingstarted/local-redirect-policy/#node-local-dns-cache. One interesting part is that node-local-dns must run as a regular pod with hostNetwork: false, which is not the case in the current kops deployment. A CiliumLocalRedirectPolicy must also be added. I took this from this issue: https://github.com/cilium/cilium/issues/16906
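
For reference, a minimal sketch of such a CiliumLocalRedirectPolicy, following the pattern in the Cilium documentation linked above. The backend label selector and port names are assumptions and would need to match the node-local-dns manifest that kops actually deploys, and Cilium itself must have local redirect policies enabled.

apiVersion: cilium.io/v2
kind: CiliumLocalRedirectPolicy
metadata:
  name: nodelocaldns
  namespace: kube-system
spec:
  redirectFrontend:
    serviceMatcher:
      # Redirect traffic addressed to the kube-dns ClusterIP service...
      serviceName: kube-dns
      namespace: kube-system
  redirectBackend:
    # ...to the node-local-dns pod running on the same node.
    localEndpointSelector:
      matchLabels:
        k8s-app: node-local-dns
    toPorts:
    - port: "53"
      name: dns
      protocol: UDP
    - port: "53"
      name: dns-tcp
      protocol: TCP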

DmytroKozlovskyi avatar Jul 17 '24 12:07 DmytroKozlovskyi

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Oct 15 '24 12:10 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Nov 14 '24 12:11 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Dec 14 '24 13:12 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Dec 14 '24 13:12 k8s-ci-robot

Can we re-open this? It seems to be caused by https://github.com/cilium/cilium/pull/35098. To reproduce this scenario, just enable host BPF routing and tunneling on the cluster. The problem was introduced in cilium 1.16.5+. cc @rifelpet
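
To make the repro conditions concrete, here is a hedged sketch of the relevant upstream cilium-config ConfigMap keys. These are Cilium agent settings, not kops API fields, and the exact keys depend on the Cilium version in use.

apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  routing-mode: tunnel                  # tunnel mode (VXLAN/Geneve) instead of native routing
  tunnel-protocol: vxlan
  enable-host-legacy-routing: "false"   # "false" means BPF host routing is enabled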

rsafonseca avatar Mar 18 '25 19:03 rsafonseca

/reopen /remove-lifecycle rotten

This should be fixed by https://github.com/kubernetes/kops/pull/17266

rifelpet avatar Mar 20 '25 00:03 rifelpet

@rifelpet: Reopened this issue.

In response to this:

/reopen /remove-lifecycle rotten

This should be fixed by https://github.com/kubernetes/kops/pull/17266

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Mar 20 '25 00:03 k8s-ci-robot

Possibly related: https://docs.cilium.io/en/latest/network/kubernetes/local-redirect-policy/#node-local-dns-cache

I've run into this same issue with a new, from-scratch cluster using kops 1.31.1 and k8s 1.31.8.

The above article clearly states that node-local-dns isn't going to work when it's in hostNetwork mode. Special steps are required to make it work when Cilium is (as far as I understand) acting as a kube-proxy replacement.

Simply updating cilium from 1.16.5 to 1.16.9 did not solve it. I will post an update if/when I try the steps from the article linked above.

shapirus avatar May 01 '25 19:05 shapirus

If you can confirm the changes needed for cilium and NLD to work together, we can update their manifests here:

https://github.com/kubernetes/kops/blob/3aa4bc3c714a3a972c70677f79d75d07c3b6e9c2/upup/models/cloudup/resources/addons/networking.cilium.io/k8s-1.16-v1.15.yaml.template

https://github.com/kubernetes/kops/blob/3aa4bc3c714a3a972c70677f79d75d07c3b6e9c2/upup/models/cloudup/resources/addons/nodelocaldns.addons.k8s.io/k8s-1.12.yaml.template

rifelpet avatar May 07 '25 02:05 rifelpet

I've opened a fix PR for this issue, @rifelpet. It should address the issue with Cilium without changing the behaviour when other CNIs are used.

rsafonseca avatar Jul 10 '25 18:07 rsafonseca

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Oct 08 '25 19:10 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Nov 07 '25 19:11 k8s-triage-robot

This is apparently working fine in 1.34.0.

shapirus avatar Nov 08 '25 00:11 shapirus

Does Cilium network policy with node-local-dns cache work properly in kops 1.34.0?

czapajew avatar Nov 25 '25 07:11 czapajew