
CoreDNS issue: checkip.amazonaws.com doesn't resolve

Open NinoSkopac opened this issue 1 year ago • 11 comments

Description

I can't resolve https://checkip.amazonaws.com/ inside my pods, but I can resolve other domains (google.com etc).

The error I'm seeing in coredns pod is

[ERROR] plugin/errors: 2 checkip.amazonaws.com. A: dns: buffer size too small.

I'm guessing it's because checkip.amazonaws.com has a long DNS response?

% nslookup checkip.amazonaws.com               
Server:		2405:9800:a:1::10
Address:	2405:9800:a:1::10#53

Non-authoritative answer:
checkip.amazonaws.com	canonical name = checkip.check-ip.aws.a2z.com.
checkip.check-ip.aws.a2z.com	canonical name = checkip.eu-west-1.prod.check-ip.aws.a2z.com.
Name:	checkip.eu-west-1.prod.check-ip.aws.a2z.com
Address: 34.248.243.6
Name:	checkip.eu-west-1.prod.check-ip.aws.a2z.com
Address: 54.228.134.168
Name:	checkip.eu-west-1.prod.check-ip.aws.a2z.com
Address: 52.18.177.193
Name:	checkip.eu-west-1.prod.check-ip.aws.a2z.com
Address: 52.19.72.113
Name:	checkip.eu-west-1.prod.check-ip.aws.a2z.com
Address: 52.215.107.174
Name:	checkip.eu-west-1.prod.check-ip.aws.a2z.com
Address: 52.16.170.255
Name:	checkip.eu-west-1.prod.check-ip.aws.a2z.com
Address: 34.248.85.86
Name:	checkip.eu-west-1.prod.check-ip.aws.a2z.com
Address: 52.17.99.84
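
To put a rough number on the "long DNS response" guess, here is a minimal Python sketch that rebuilds an answer message from the records in the nslookup output above and measures its wire size with and without name compression. The TTL of 30, the header flags, and the helper names (`encode_name`, `build_response`) are assumptions made up for illustration; this is not what CoreDNS or the upstream resolver actually runs.

```python
import socket
import struct
from typing import Dict, List, Optional


def encode_name(name: str, msg: bytearray, table: Optional[Dict[str, int]]) -> None:
    """Append a domain name to msg; reuse earlier suffixes via pointers if table is given."""
    labels = name.rstrip(".").lower().split(".")
    for i, label in enumerate(labels):
        suffix = ".".join(labels[i:])
        if table is not None and suffix in table:
            msg += struct.pack("!H", 0xC000 | table[suffix])  # compression pointer
            return
        if table is not None:
            table[suffix] = len(msg)  # remember where this suffix starts
        raw = label.encode("ascii")
        msg.append(len(raw))
        msg += raw
    msg.append(0)  # root label ends a name that was written out in full


def build_response(qname: str, cname_chain: List[str], addresses: List[str],
                   compress: bool) -> bytes:
    """Build a DNS answer: one question, the CNAME chain, then one A record per address."""
    table: Optional[Dict[str, int]] = {} if compress else None
    answers = len(cname_chain) + len(addresses)
    msg = bytearray(struct.pack("!HHHHHH", 0, 0x8180, 1, answers, 0, 0))
    encode_name(qname, msg, table)
    msg += struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    owner = qname
    for target in cname_chain:
        encode_name(owner, msg, table)
        msg += struct.pack("!HHI", 5, 1, 30)  # TYPE=CNAME, CLASS=IN, TTL=30 (assumed)
        rdlength_pos = len(msg)
        msg += b"\x00\x00"  # placeholder for RDLENGTH
        encode_name(target, msg, table)
        struct.pack_into("!H", msg, rdlength_pos, len(msg) - rdlength_pos - 2)
        owner = target
    for address in addresses:
        encode_name(owner, msg, table)
        msg += struct.pack("!HHIH", 1, 1, 30, 4)  # TYPE=A, CLASS=IN, TTL=30, RDLENGTH=4
        msg += socket.inet_aton(address)
    return bytes(msg)


QNAME = "checkip.amazonaws.com"
CNAME_CHAIN = [
    "checkip.check-ip.aws.a2z.com",
    "checkip.eu-west-1.prod.check-ip.aws.a2z.com",
]
ADDRESSES = [
    "34.248.243.6", "54.228.134.168", "52.18.177.193", "52.19.72.113",
    "52.215.107.174", "52.16.170.255", "34.248.85.86", "52.17.99.84",
]

compressed = build_response(QNAME, CNAME_CHAIN, ADDRESSES, compress=True)
uncompressed = build_response(QNAME, CNAME_CHAIN, ADDRESSES, compress=False)
print(len(compressed), len(uncompressed))
```

With these records the compressed form stays under 512 bytes and the uncompressed form goes over it, so whether the classic UDP limit is hit may depend on how the upstream resolver encodes its answer. That last part is a guess; nothing in this issue confirms what the Docker Desktop resolver actually sends.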

My CoreDNS config

% kubectl get configmap coredns -n kube-system -o yaml
apiVersion: v1
data:
  Corefile: |
    .:53 {
        bufsize 4096
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
           max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap

I tried increasing its buffer size by adding bufsize 4096 inside:

  1. the main :53 block (like this) - no effect
  2. forward block - plugin/forward: /etc/coredns/Corefile:14 - Error during parsing: unknown property 'bufsize'

I'm running k8s 1.27 with CoreDNS v1.10.1 linux/arm64 started with Docker Desktop for Mac.

Actually Docker-Desktop says I'm using Kubernetes v1.28.2 whereas kubectl version says:

% kubectl version
Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.27.2

It resolves fine on EKS with the same k8s and CoreDNS versions (built for EKS). The EKS CoreDNS configmap doesn't specify bufsize.

Reproduce

Tail CoreDNS logs:

kubectl logs --follow coredns-5d78c9869d-8ljr8 -n kube-system
kubectl run curl-test --image=curlimages/curl --restart=Never -- sleep 3600
kubectl exec -it curl-test -- curl https://checkip.amazonaws.com
# curl: (6) Could not resolve host: checkip.amazonaws.com
# command terminated with exit code 6

# you'll see `[ERROR] plugin/errors: 2 checkip.amazonaws.com. A: dns: buffer size too small` in CoreDNS logs.
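
If the logs are noisy, a small Python filter can list which names hit the error. This is only a sketch: the regex is modelled on the log lines quoted in this issue, and `count_failures` is a hypothetical helper name. The numeric field after `plugin/errors:` is captured but not interpreted.

```python
import re
from collections import Counter
from typing import Iterable

# Shape taken from the log lines quoted in this issue; other CoreDNS versions may differ.
PATTERN = re.compile(
    r"\[ERROR\] plugin/errors: (?P<code>\d+) (?P<name>\S+) (?P<qtype>\w+): "
    r"dns: buffer size too small"
)


def count_failures(lines: Iterable[str]) -> Counter:
    """Count 'buffer size too small' errors per (hostname, query type)."""
    counts: Counter = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            # Drop the trailing dot of the fully qualified name for readability.
            key = (match.group("name").rstrip("."), match.group("qtype"))
            counts[key] += 1
    return counts
```

Feeding it the lines from `kubectl logs` (for example by iterating over `sys.stdin`) gives a per-host tally, which makes it easier to see whether only hosts with large answers are affected.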

Expected behavior

Should resolve.

docker version

Client:
 Cloud integration: v1.0.35+desktop.5
 Version:           24.0.6
 API version:       1.43
 Go version:        go1.20.7
 Git commit:        ed223bc
 Built:             Mon Sep  4 12:28:49 2023
 OS/Arch:           darwin/arm64
 Context:           desktop-linux

Server: Docker Desktop 4.25.2 (129061)
 Engine:
  Version:          24.0.6
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.7
  Git commit:       1a79695
  Built:            Mon Sep  4 12:31:36 2023
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.22
  GitCommit:        8165feabfdfe38c65b599c4993d227328c231fca
 runc:
  Version:          1.1.8
  GitCommit:        v1.1.8-0-g82f18fe
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client:
 Version:    24.0.6
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.11.2-desktop.5
    Path:     /Users/onin/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.23.0-desktop.1
    Path:     /Users/onin/.docker/cli-plugins/docker-compose
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.0
    Path:     /Users/onin/.docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.20
    Path:     /Users/onin/.docker/cli-plugins/docker-extension
  init: Creates Docker-related starter files for your project (Docker Inc.)
    Version:  v0.1.0-beta.9
    Path:     /Users/onin/.docker/cli-plugins/docker-init
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
    Version:  0.6.0
    Path:     /Users/onin/.docker/cli-plugins/docker-sbom
  scan: Docker Scan (Docker Inc.)
    Version:  v0.26.0
    Path:     /Users/onin/.docker/cli-plugins/docker-scan
  scout: Docker Scout (Docker Inc.)
    Version:  v1.0.9
    Path:     /Users/onin/.docker/cli-plugins/docker-scout

Server:
 Containers: 44
  Running: 22
  Paused: 0
  Stopped: 22
 Images: 56
 Server Version: 24.0.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8165feabfdfe38c65b599c4993d227328c231fca
 runc version: v1.1.8-0-g82f18fe
 init version: de40ad0
 Security Options:
  seccomp
   Profile: unconfined
  cgroupns
 Kernel Version: 6.4.16-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: aarch64
 CPUs: 8
 Total Memory: 7.661GiB
 Name: linuxkit-e2912c52f35f
 ID: 118d9442-576c-4be1-bba9-751691395f3a
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5555
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: daemon is not using the default seccomp profile

Diagnostics ID

2DCD1A30-24AE-4554-97A8-78D78F6D713E/20231126162837

Additional Info

No response

NinoSkopac avatar Nov 26 '23 16:11 NinoSkopac

I am having a similar problem.

From my pod I can ping google.com or s3.amazonaws.com, but I can't ping a bucket hostname like dev-p2p-inputs.s3.amazonaws.com, which worked in previous versions.

docker version

Client:
 Cloud integration: v1.0.35+desktop.5
 Version:           24.0.6
 API version:       1.43
 Go version:        go1.20.7
 Git commit:        ed223bc
 Built:             Mon Sep  4 12:28:49 2023
 OS/Arch:           darwin/amd64
 Context:           desktop-linux

Server: Docker Desktop 4.25.2 (129061)
 Engine:
  Version:          24.0.6
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.7
  Git commit:       1a79695
  Built:            Mon Sep  4 12:32:16 2023
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          1.6.22
  GitCommit:        8165feabfdfe38c65b599c4993d227328c231fca
 runc:
  Version:          1.1.8
  GitCommit:        v1.1.8-0-g82f18fe
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

soniaCejas avatar Nov 30 '23 14:11 soniaCejas

I'm experiencing exactly the same problem. From the logs from coredns pod, I'm seeing

[ERROR] plugin/errors: 2 checkip.amazonaws.com. A: dns: buffer size too small
[ERROR] plugin/errors: 2 jataware-world-modelers.s3.amazonaws.com. A: dns: buffer size too small

Docker version

Client:
 Cloud integration: v1.0.35+desktop.5
 Version:           24.0.6
 API version:       1.43
 Go version:        go1.20.7
 Git commit:        ed223bc
 Built:             Mon Sep  4 12:28:49 2023
 OS/Arch:           darwin/arm64
 Context:           desktop-linux

Server: Docker Desktop 4.25.2 (129061)
 Engine:
  Version:          24.0.6
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.7
  Git commit:       1a79695
  Built:            Mon Sep  4 12:31:36 2023
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.22
  GitCommit:        8165feabfdfe38c65b599c4993d227328c231fca
 runc:
  Version:          1.1.8
  GitCommit:        v1.1.8-0-g82f18fe
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

kubectl version

Client Version: v1.28.4
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.2

jryu01 avatar Dec 06 '23 16:12 jryu01

I was experiencing the same issue, but I was able to come up with a workaround. Hopefully it helps someone else. I changed the forward configuration to use Google DNS rather than /etc/resolv.conf, and that solved the issue:

In configmap kube-system/coredns, I changed this:

      ...
      forward . /etc/resolv.conf {
          max_concurrent 1000
      }
      ...

to

      ...
      forward . 8.8.8.8
      ...

jpelkonen avatar Dec 07 '23 17:12 jpelkonen

For me this seems to be related to your upstream DNS provider.

I was able to resolve it by forcing DNS over TCP (which points to the issue being DNS-over-UDP packets larger than 512 bytes) in configmap kube-system/coredns:

      ...
      forward . /etc/resolv.conf {
          force_tcp
          max_concurrent 1000
      }
      ...
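
For context on why TCP sidesteps the size limit: DNS over TCP prefixes every message with a two-byte length (RFC 1035, section 4.2.2), so a message can be up to 65535 bytes instead of the classic 512-byte UDP payload. Here is a minimal Python sketch of that framing. The helper names (`build_query`, `frame_for_tcp`, `recv_exact`, `query_over_tcp`) and the query ID are made up for illustration, and `query_over_tcp` needs a reachable DNS server on port 53.

```python
import socket
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (no EDNS0)."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)  # RD=1, one question
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def frame_for_tcp(message: bytes) -> bytes:
    """Prefix the message with its length as a 16-bit big-endian integer."""
    return struct.pack("!H", len(message)) + message


def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly size bytes; TCP may deliver them in several chunks."""
    data = bytearray()
    while len(data) < size:
        chunk = sock.recv(size - len(data))
        if not chunk:
            raise ConnectionError("connection closed before the full message arrived")
        data += chunk
    return bytes(data)


def query_over_tcp(server: str, name: str, timeout: float = 3.0) -> bytes:
    """Send one A query over TCP and return the raw response message."""
    with socket.create_connection((server, 53), timeout=timeout) as sock:
        sock.sendall(frame_for_tcp(build_query(name)))
        (length,) = struct.unpack("!H", recv_exact(sock, 2))
        return recv_exact(sock, length)
```

Calling `len(query_over_tcp("8.8.8.8", "checkip.amazonaws.com"))` from a machine with network access would show the size of the answer the resolver returns over TCP; the server address here is just an example.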

mixja avatar Jan 08 '24 07:01 mixja

Interesting

NinoSkopac avatar Jan 08 '24 09:01 NinoSkopac

This is definitely a recent regression. It hit me as soon as I updated.

beergoat avatar Jan 10 '24 17:01 beergoat

This looks similar to #7110

MihaelaStoica avatar Jan 19 '24 10:01 MihaelaStoica

Rolled back to 4.18.0 and issue is not present

beergoat avatar Jan 24 '24 20:01 beergoat

I have the same issue with login.windows.net, perhaps to do with the response length as well?

Non-authoritative answer:
Name:    www.tm.a.prd.aadg.akadns.net
Addresses:  2603:1037:1:128::9
          2603:1036:3000:138::5
          2603:1037:1:130::3
          2603:1037:1:130::6
          2603:1037:1:128::8
          2603:1037:1:130::4
          2603:1036:3000:138::4
          2603:1036:3000:138::6
          20.190.190.132
          20.190.190.129
          40.126.62.132
          20.190.190.196
          40.126.62.131
          20.190.190.130
          20.190.190.195
          20.190.190.194
Aliases:  login.windows.net
          a.privatelink.msidentity.com
          prda.aadg.msidentity.com

ryanwinter avatar Jan 30 '24 23:01 ryanwinter

Same issue on Docker Desktop 4.27.2 (137060).

kubectl version:

Client Version: v1.29.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.1

docker info:

Client Version: 25.0.3
Server Version: 25.0.3

Logs of coredns:

2024-02-29 10:12:31 [ERROR] plugin/errors: 2 <MY_BUCKET_NAME>.s3.eu-central-1.amazonaws.com. A: dns: buffer size too small
2024-02-29 10:12:32 [ERROR] plugin/errors: 2 <MY_BUCKET_NAME>.s3.eu-central-1.amazonaws.com. A: dns: buffer size too small

The response for this host is really big, because AWS S3 has a lot of endpoints for its resources, so it doesn't fit into the buffer.

atomicloopzilla avatar Feb 29 '24 09:02 atomicloopzilla

I also have the same issue on Docker Desktop 4.28.0 (139021).

The force_tcp workaround mentioned above works for me.
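
For anyone who wants to script that change instead of editing the configmap by hand, here is a rough Python sketch that adds `force_tcp` to the first `forward` block of a Corefile passed in as text. `add_force_tcp` is a hypothetical helper, it only handles the brace-style block shown in this issue, and fetching and re-applying the configmap is left out.

```python
import re

# Matches a line such as "forward . /etc/resolv.conf {" that opens a block.
FORWARD_OPEN = re.compile(r"^\s*forward\s+\S+\s+.*\{\s*$")


def leading_ws(line: str) -> str:
    """Return the leading whitespace of a line."""
    return line[: len(line) - len(line.lstrip())]


def add_force_tcp(corefile: str) -> str:
    """Insert force_tcp into the first forward block, unless it is already there."""
    lines = corefile.split("\n")
    for i, line in enumerate(lines):
        if not FORWARD_OPEN.match(line):
            continue
        # Locate the closing brace of this forward block.
        end = next(
            (j for j in range(i + 1, len(lines)) if lines[j].strip() == "}"),
            None,
        )
        if end is None:
            return corefile  # malformed block, leave the text untouched
        body = lines[i + 1:end]
        if any(entry.strip() == "force_tcp" for entry in body):
            return corefile  # nothing to do
        # Reuse the indentation of the existing options, or guess one.
        indent = leading_ws(body[0]) if body else leading_ws(line) + "   "
        lines.insert(i + 1, indent + "force_tcp")
        return "\n".join(lines)
    return corefile  # no brace-style forward block found
```

A `forward . 8.8.8.8` line without braces is returned unchanged, so the function is safe to run more than once.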

mbertelsen-kryptowire avatar Mar 28 '24 13:03 mbertelsen-kryptowire

I ran into the issue in Docker Desktop for Mac 4.30.0 with a Google API request. The force_tcp workaround works for me.

brandondoran avatar May 16 '24 22:05 brandondoran