
Delay in network setup for Pod to Service communication

Open simingweng opened this issue 3 years ago • 18 comments

What happened: I have a client Pod that wants to communicate with an HTTP server Pod (via a Service in front of a Deployment). The HTTP server Deployment is triggered programmatically, and the client Pod uses the Kubernetes client API to watch the Available condition of the Deployment resource; when the condition is met, it issues an HTTP request against the Service DNS name.

The request fails with a connection refused error. I have to wait for quite a while and try the request again; eventually it works.

I even adjusted the client-side waiting logic to inspect the Endpoints and EndpointSlice resources and make sure the HTTP server Pod IP address is populated in both before it initiates the HTTP request, but neither helps.
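
For reference, a kubectl-level sketch of that client-side wait (using the httpbin names from the reproduction steps below; the real client uses the Kubernetes client API, so treat this only as an illustration):

# wait for the Deployment to report Available
kubectl wait --for=condition=Available deployment/httpbin --timeout=120s
# additionally wait until the Endpoints object lists at least one Pod IP
until [ -n "$(kubectl get endpoints httpbin -o jsonpath='{.subsets[*].addresses[*].ip}')" ]; do
  sleep 1
done
# even after both checks, the first request can still be refused (the behavior reported here)
curl http://httpbin:8000/get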

What you expected to happen:

After the Endpoints or EndpointSlice is correctly populated, the Pod -> Service -> Pod communication should succeed, because that is the behavior on non-container-based Kubernetes distributions.

How to reproduce it (as minimally and precisely as possible):

  1. Create a kind cluster: kind create cluster
  2. Deploy a Pod with kubectl capability by applying the following YAML to the created cluster:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kubectl
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kubectl
spec:
  selector:
    matchLabels:
      app: kubectl
  template:
    metadata:
      labels:
        app: kubectl
    spec:
      containers:
        - name: kubectl
          image: bitnami/kubectl
          command:
            - sleep
            - infinity
      serviceAccountName: kubectl
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kubectl
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: kubectl
    namespace: default
  3. Open a shell into the kubectl Pod: kubectl exec -it kubectl-78878b75b5-wl5vm -- bash
  4. Inside the shell of the kubectl Pod, execute the following command: kubectl apply -f https://raw.githubusercontent.com/istio/istio/master/samples/httpbin/httpbin.yaml;kubectl wait --for=condition=Available deployment/httpbin;until time curl http://httpbin:8000/get;do echo "try again";done

This command simulates the scenario where a client waits for the Deployment to be ready before sending an HTTP request to the server Pod. I keep retrying the curl command until it succeeds, so I can see how long it takes for the networking to be set up completely.

Here is the output of my testing:

serviceaccount/httpbin created
service/httpbin created
deployment.apps/httpbin created
deployment.apps/httpbin condition met
curl: (7) Failed to connect to httpbin port 8000: Connection refused

real	0m7.099s
user	0m0.005s
sys	0m0.006s
try again
curl: (7) Failed to connect to httpbin port 8000: Connection refused

real	0m0.010s
user	0m0.003s
sys	0m0.006s
try again
curl: (7) Failed to connect to httpbin port 8000: Connection refused

real	0m0.009s
user	0m0.003s
sys	0m0.006s
try again
curl: (7) Failed to connect to httpbin port 8000: Connection refused

real	0m0.009s
user	0m0.002s
sys	0m0.007s
try again
curl: (7) Failed to connect to httpbin port 8000: Connection refused

real	0m0.009s
user	0m0.003s
sys	0m0.006s
try again
curl: (7) Failed to connect to httpbin port 8000: Connection refused

real	0m0.010s
user	0m0.004s
sys	0m0.006s
try again
curl: (7) Failed to connect to httpbin port 8000: Connection refused

real	0m1.041s
user	0m0.004s
sys	0m0.007s
try again
curl: (7) Failed to connect to httpbin port 8000: Connection refused

real	0m1.023s
user	0m0.007s
sys	0m0.006s
try again
curl: (7) Failed to connect to httpbin port 8000: Connection refused

real	0m1.025s
user	0m0.006s
sys	0m0.006s
try again
curl: (7) Failed to connect to httpbin port 8000: Connection refused

real	0m1.023s
user	0m0.004s
sys	0m0.006s
try again
curl: (7) Failed to connect to httpbin port 8000: Connection refused

real	1m17.621s
user	0m0.011s
sys	0m0.015s
try again
{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Host": "httpbin:8000", 
    "User-Agent": "curl/7.64.0"
  }, 
  "origin": "10.244.0.2", 
  "url": "http://httpbin:8000/get"
}

real	0m0.017s
user	0m0.007s
sys	0m0.006s

Looking at the last failure right before the final success, the curl command hung there for more than a minute.

Anything else we need to know?:

This issue surfaces both in KinD and also in Minikube with Docker driver. If I switch Minikube to hyperkit or virtualbox driver, the issue disappears. I'm also not seeing this issue on EKS or AKS, but GKE does exhibit the same issue.

Environment:

  • kind version:
kind v0.10.0 go1.15.7 darwin/amd64
  • Kubernetes version:
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"clean", BuildDate:"2021-04-08T21:16:14Z", GoVersion:"go1.16.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-21T01:11:42Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
  • Docker version:
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.5.1-docker)
  compose: Docker Compose (Docker Inc., 2.0.0-beta.1)
  scan: Docker Scan (Docker Inc., v0.8.0)

Server:
 Containers: 6
  Running: 6
  Paused: 0
  Stopped: 0
 Images: 266
 Server Version: 20.10.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
 runc version: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.10.25-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 6
 Total Memory: 7.775GiB
 Name: docker-desktop
 ID: 2JZY:DWSA:55MA:ZJ43:XRID:XUWW:NCOI:HUDK:PHUA:NZUB:62XR:6STK
 Docker Root Dir: /var/lib/docker
 Debug Mode: true
  File Descriptors: 90
  Goroutines: 81
  System Time: 2021-05-25T23:17:10.5021701Z
  EventsListeners: 3
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
  • OS:
ProductName:	macOS
ProductVersion:	11.3.1
BuildVersion:	20E241

simingweng avatar May 25 '21 23:05 simingweng

Please try upgrading to v0.11 and one of the images released with it. We made some improvements to the networking stack since v0.10

BenTheElder avatar May 25 '21 23:05 BenTheElder

Please try upgrading to v0.11 and one of the images released with it. We made some improvements to the networking stack since v0.10

Just tried with kind v0.11.0 go1.16.3 darwin/amd64; still the same behavior. The last failed curl execution took more than a minute to error out, and then the subsequent attempt eventually succeeded.

simingweng avatar May 25 '21 23:05 simingweng

Also, my colleague didn't see this issue when his Docker Desktop app was a couple of versions older than mine; after he upgraded to the same version as mine, he started seeing the same issue. But we lost track of which exact Docker Desktop version he had previously.

simingweng avatar May 25 '21 23:05 simingweng

Without digging into a reproducer (doing other things atm), I would guess it's most likely the docker embedded DNS acting up.

That would explain it affecting both kind and the derivative Docker driver in minikube, and only showing up after you upgraded Docker.

If it is the case though, there's not much we can do about it here.
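
(For anyone wanting to check this hypothesis, a rough sketch, assuming the default node name kind-control-plane and that getent is available in the client image:)

# see which resolver the kind node itself is using (Docker Desktop injects its own DNS here)
docker exec kind-control-plane cat /etc/resolv.conf
# confirm the Service name resolves from inside the client pod
kubectl exec deploy/kubectl -- getent hosts httpbin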

BenTheElder avatar May 26 '21 01:05 BenTheElder

I dug a bit deeper by taking snapshots of the iptables rules on the single node.
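
(Presumably something like the following, run against the default node name kind-control-plane; the node name is an assumption:)

docker exec kind-control-plane iptables-save > snapshot-1.txt
# later, take a second snapshot, then diff or grep for the Service name
docker exec kind-control-plane iptables-save | grep httpbin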

The first snapshot was taken right after the httpbin Deployment became Available, which means it had a Ready Pod.

# Generated by iptables-save v1.8.5 on Wed May 26 01:15:18 2021
*mangle
:PREROUTING ACCEPT [1186166:435334751]
:INPUT ACCEPT [1185936:435297997]
:FORWARD ACCEPT [230:36754]
:OUTPUT ACCEPT [1021400:172352482]
:POSTROUTING ACCEPT [1021610:172388056]
:KUBE-KUBELET-CANARY - [0:0]
:KUBE-PROXY-CANARY - [0:0]
COMMIT
# Completed on Wed May 26 01:15:18 2021
# Generated by iptables-save v1.8.5 on Wed May 26 01:15:18 2021
*filter
:INPUT ACCEPT [51772:8549330]
:FORWARD ACCEPT [9:744]
:OUTPUT ACCEPT [51821:9422272]
:KUBE-EXTERNAL-SERVICES - [0:0]
:KUBE-FIREWALL - [0:0]
:KUBE-FORWARD - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
:KUBE-NODEPORTS - [0:0]
:KUBE-PROXY-CANARY - [0:0]
:KUBE-SERVICES - [0:0]
-A INPUT -m comment --comment "kubernetes health check service ports" -j KUBE-NODEPORTS
-A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES
-A INPUT -j KUBE-FIREWALL
-A FORWARD -m comment --comment "kubernetes forwarding rules" -j KUBE-FORWARD
-A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES
-A OUTPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -j KUBE-FIREWALL
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
-A KUBE-FIREWALL ! -s 127.0.0.0/8 -d 127.0.0.0/8 -m comment --comment "block incoming localnet connections" -m conntrack ! --ctstate RELATED,ESTABLISHED,DNAT -j DROP
-A KUBE-FORWARD -m conntrack --ctstate INVALID -j DROP
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -j ACCEPT
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
COMMIT
# Completed on Wed May 26 01:15:18 2021
# Generated by iptables-save v1.8.5 on Wed May 26 01:15:18 2021
*nat
:PREROUTING ACCEPT [3:180]
:INPUT ACCEPT [6:404]
:OUTPUT ACCEPT [561:33704]
:POSTROUTING ACCEPT [567:34224]
:DOCKER_OUTPUT - [0:0]
:DOCKER_POSTROUTING - [0:0]
:KIND-MASQ-AGENT - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
:KUBE-MARK-DROP - [0:0]
:KUBE-MARK-MASQ - [0:0]
:KUBE-NODEPORTS - [0:0]
:KUBE-POSTROUTING - [0:0]
:KUBE-PROXY-CANARY - [0:0]
:KUBE-SEP-6E7XQMQ4RAYOWTTM - [0:0]
:KUBE-SEP-FVQSBIWR5JTECIVC - [0:0]
:KUBE-SEP-LASJGFFJP3UOS6RQ - [0:0]
:KUBE-SEP-LPGSDLJ3FDW46N4W - [0:0]
:KUBE-SEP-QKX4QX54UKWK6JIY - [0:0]
:KUBE-SEP-ZP3FB6NMPNCO4VBJ - [0:0]
:KUBE-SEP-ZXMNUKOKXUTL2MK2 - [0:0]
:KUBE-SERVICES - [0:0]
:KUBE-SVC-ERIFXISQEP7F7OF4 - [0:0]
:KUBE-SVC-JD5MR3NA4I4DYORP - [0:0]
:KUBE-SVC-NPX46M4PTMTKRN6Y - [0:0]
:KUBE-SVC-TCOU7JCQXEZGVUNU - [0:0]
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -d 192.168.65.2/32 -j DOCKER_OUTPUT
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -d 192.168.65.2/32 -j DOCKER_OUTPUT
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -d 192.168.65.2/32 -j DOCKER_POSTROUTING
-A POSTROUTING -m addrtype ! --dst-type LOCAL -m comment --comment "kind-masq-agent: ensure nat POSTROUTING directs all non-LOCAL destination traffic to our custom KIND-MASQ-AGENT chain" -j KIND-MASQ-AGENT
-A DOCKER_OUTPUT -d 192.168.65.2/32 -p tcp -m tcp --dport 53 -j DNAT --to-destination 127.0.0.11:38695
-A DOCKER_OUTPUT -d 192.168.65.2/32 -p udp -m udp --dport 53 -j DNAT --to-destination 127.0.0.11:50472
-A DOCKER_POSTROUTING -s 127.0.0.11/32 -p tcp -m tcp --sport 38695 -j SNAT --to-source 192.168.65.2:53
-A DOCKER_POSTROUTING -s 127.0.0.11/32 -p udp -m udp --sport 50472 -j SNAT --to-source 192.168.65.2:53
-A KIND-MASQ-AGENT -d 10.244.0.0/16 -m comment --comment "kind-masq-agent: local traffic is not subject to MASQUERADE" -j RETURN
-A KIND-MASQ-AGENT -m comment --comment "kind-masq-agent: outbound traffic is subject to MASQUERADE (must be last in chain)" -j MASQUERADE
-A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-POSTROUTING -m mark ! --mark 0x4000/0x4000 -j RETURN
-A KUBE-POSTROUTING -j MARK --set-xmark 0x4000/0x0
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -j MASQUERADE --random-fully
-A KUBE-SEP-6E7XQMQ4RAYOWTTM -s 10.244.0.3/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ
-A KUBE-SEP-6E7XQMQ4RAYOWTTM -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 10.244.0.3:53
-A KUBE-SEP-FVQSBIWR5JTECIVC -s 10.244.0.5/32 -m comment --comment "kube-system/kube-dns:metrics" -j KUBE-MARK-MASQ
-A KUBE-SEP-FVQSBIWR5JTECIVC -p tcp -m comment --comment "kube-system/kube-dns:metrics" -m tcp -j DNAT --to-destination 10.244.0.5:9153
-A KUBE-SEP-LASJGFFJP3UOS6RQ -s 10.244.0.5/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ
-A KUBE-SEP-LASJGFFJP3UOS6RQ -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 10.244.0.5:53
-A KUBE-SEP-LPGSDLJ3FDW46N4W -s 10.244.0.5/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ
-A KUBE-SEP-LPGSDLJ3FDW46N4W -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 10.244.0.5:53
-A KUBE-SEP-QKX4QX54UKWK6JIY -s 172.18.0.3/32 -m comment --comment "default/kubernetes:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-QKX4QX54UKWK6JIY -p tcp -m comment --comment "default/kubernetes:https" -m tcp -j DNAT --to-destination 172.18.0.3:6443
-A KUBE-SEP-ZP3FB6NMPNCO4VBJ -s 10.244.0.3/32 -m comment --comment "kube-system/kube-dns:metrics" -j KUBE-MARK-MASQ
-A KUBE-SEP-ZP3FB6NMPNCO4VBJ -p tcp -m comment --comment "kube-system/kube-dns:metrics" -m tcp -j DNAT --to-destination 10.244.0.3:9153
-A KUBE-SEP-ZXMNUKOKXUTL2MK2 -s 10.244.0.3/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ
-A KUBE-SEP-ZXMNUKOKXUTL2MK2 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 10.244.0.3:53
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:metrics cluster IP" -m tcp --dport 9153 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:metrics cluster IP" -m tcp --dport 9153 -j KUBE-SVC-JD5MR3NA4I4DYORP
-A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-ZXMNUKOKXUTL2MK2
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-SEP-LASJGFFJP3UOS6RQ
-A KUBE-SVC-JD5MR3NA4I4DYORP -m comment --comment "kube-system/kube-dns:metrics" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-ZP3FB6NMPNCO4VBJ
-A KUBE-SVC-JD5MR3NA4I4DYORP -m comment --comment "kube-system/kube-dns:metrics" -j KUBE-SEP-FVQSBIWR5JTECIVC
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -j KUBE-SEP-QKX4QX54UKWK6JIY
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-6E7XQMQ4RAYOWTTM
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -j KUBE-SEP-LPGSDLJ3FDW46N4W
COMMIT
# Completed on Wed May 26 01:15:19 2021

The second snapshot is taken a couple of minutes later.

# Generated by iptables-save v1.8.5 on Wed May 26 01:18:24 2021
*mangle
:PREROUTING ACCEPT [1217024:440238923]
:INPUT ACCEPT [1216794:440202169]
:FORWARD ACCEPT [230:36754]
:OUTPUT ACCEPT [1052216:177448310]
:POSTROUTING ACCEPT [1052426:177483884]
:KUBE-KUBELET-CANARY - [0:0]
:KUBE-PROXY-CANARY - [0:0]
COMMIT
# Completed on Wed May 26 01:18:24 2021
# Generated by iptables-save v1.8.5 on Wed May 26 01:18:24 2021
*filter
:INPUT ACCEPT [29625:4711246]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [29584:4894699]
:KUBE-EXTERNAL-SERVICES - [0:0]
:KUBE-FIREWALL - [0:0]
:KUBE-FORWARD - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
:KUBE-NODEPORTS - [0:0]
:KUBE-PROXY-CANARY - [0:0]
:KUBE-SERVICES - [0:0]
-A INPUT -m comment --comment "kubernetes health check service ports" -j KUBE-NODEPORTS
-A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES
-A INPUT -j KUBE-FIREWALL
-A FORWARD -m comment --comment "kubernetes forwarding rules" -j KUBE-FORWARD
-A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A FORWARD -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES
-A OUTPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -j KUBE-FIREWALL
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
-A KUBE-FIREWALL ! -s 127.0.0.0/8 -d 127.0.0.0/8 -m comment --comment "block incoming localnet connections" -m conntrack ! --ctstate RELATED,ESTABLISHED,DNAT -j DROP
-A KUBE-FORWARD -m conntrack --ctstate INVALID -j DROP
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -j ACCEPT
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
COMMIT
# Completed on Wed May 26 01:18:25 2021
# Generated by iptables-save v1.8.5 on Wed May 26 01:18:25 2021
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [325:19500]
:POSTROUTING ACCEPT [325:19500]
:DOCKER_OUTPUT - [0:0]
:DOCKER_POSTROUTING - [0:0]
:KIND-MASQ-AGENT - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
:KUBE-MARK-DROP - [0:0]
:KUBE-MARK-MASQ - [0:0]
:KUBE-NODEPORTS - [0:0]
:KUBE-POSTROUTING - [0:0]
:KUBE-PROXY-CANARY - [0:0]
:KUBE-SEP-6E7XQMQ4RAYOWTTM - [0:0]
:KUBE-SEP-FVQSBIWR5JTECIVC - [0:0]
:KUBE-SEP-LASJGFFJP3UOS6RQ - [0:0]
:KUBE-SEP-LPGSDLJ3FDW46N4W - [0:0]
:KUBE-SEP-QKX4QX54UKWK6JIY - [0:0]
:KUBE-SEP-XT3VQGQ2EKT4GEUU - [0:0]
:KUBE-SEP-ZP3FB6NMPNCO4VBJ - [0:0]
:KUBE-SEP-ZXMNUKOKXUTL2MK2 - [0:0]
:KUBE-SERVICES - [0:0]
:KUBE-SVC-ERIFXISQEP7F7OF4 - [0:0]
:KUBE-SVC-FREKB6WNWYJLKTHC - [0:0]
:KUBE-SVC-JD5MR3NA4I4DYORP - [0:0]
:KUBE-SVC-NPX46M4PTMTKRN6Y - [0:0]
:KUBE-SVC-TCOU7JCQXEZGVUNU - [0:0]
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -d 192.168.65.2/32 -j DOCKER_OUTPUT
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -d 192.168.65.2/32 -j DOCKER_OUTPUT
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -d 192.168.65.2/32 -j DOCKER_POSTROUTING
-A POSTROUTING -m addrtype ! --dst-type LOCAL -m comment --comment "kind-masq-agent: ensure nat POSTROUTING directs all non-LOCAL destination traffic to our custom KIND-MASQ-AGENT chain" -j KIND-MASQ-AGENT
-A DOCKER_OUTPUT -d 192.168.65.2/32 -p tcp -m tcp --dport 53 -j DNAT --to-destination 127.0.0.11:38695
-A DOCKER_OUTPUT -d 192.168.65.2/32 -p udp -m udp --dport 53 -j DNAT --to-destination 127.0.0.11:50472
-A DOCKER_POSTROUTING -s 127.0.0.11/32 -p tcp -m tcp --sport 38695 -j SNAT --to-source 192.168.65.2:53
-A DOCKER_POSTROUTING -s 127.0.0.11/32 -p udp -m udp --sport 50472 -j SNAT --to-source 192.168.65.2:53
-A KIND-MASQ-AGENT -d 10.244.0.0/16 -m comment --comment "kind-masq-agent: local traffic is not subject to MASQUERADE" -j RETURN
-A KIND-MASQ-AGENT -m comment --comment "kind-masq-agent: outbound traffic is subject to MASQUERADE (must be last in chain)" -j MASQUERADE
-A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-POSTROUTING -m mark ! --mark 0x4000/0x4000 -j RETURN
-A KUBE-POSTROUTING -j MARK --set-xmark 0x4000/0x0
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -j MASQUERADE --random-fully
-A KUBE-SEP-6E7XQMQ4RAYOWTTM -s 10.244.0.3/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ
-A KUBE-SEP-6E7XQMQ4RAYOWTTM -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 10.244.0.3:53
-A KUBE-SEP-FVQSBIWR5JTECIVC -s 10.244.0.5/32 -m comment --comment "kube-system/kube-dns:metrics" -j KUBE-MARK-MASQ
-A KUBE-SEP-FVQSBIWR5JTECIVC -p tcp -m comment --comment "kube-system/kube-dns:metrics" -m tcp -j DNAT --to-destination 10.244.0.5:9153
-A KUBE-SEP-LASJGFFJP3UOS6RQ -s 10.244.0.5/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ
-A KUBE-SEP-LASJGFFJP3UOS6RQ -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 10.244.0.5:53
-A KUBE-SEP-LPGSDLJ3FDW46N4W -s 10.244.0.5/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ
-A KUBE-SEP-LPGSDLJ3FDW46N4W -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 10.244.0.5:53
-A KUBE-SEP-QKX4QX54UKWK6JIY -s 172.18.0.3/32 -m comment --comment "default/kubernetes:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-QKX4QX54UKWK6JIY -p tcp -m comment --comment "default/kubernetes:https" -m tcp -j DNAT --to-destination 172.18.0.3:6443
-A KUBE-SEP-XT3VQGQ2EKT4GEUU -s 10.244.0.7/32 -m comment --comment "default/httpbin:http" -j KUBE-MARK-MASQ
-A KUBE-SEP-XT3VQGQ2EKT4GEUU -p tcp -m comment --comment "default/httpbin:http" -m tcp -j DNAT --to-destination 10.244.0.7:80
-A KUBE-SEP-ZP3FB6NMPNCO4VBJ -s 10.244.0.3/32 -m comment --comment "kube-system/kube-dns:metrics" -j KUBE-MARK-MASQ
-A KUBE-SEP-ZP3FB6NMPNCO4VBJ -p tcp -m comment --comment "kube-system/kube-dns:metrics" -m tcp -j DNAT --to-destination 10.244.0.3:9153
-A KUBE-SEP-ZXMNUKOKXUTL2MK2 -s 10.244.0.3/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ
-A KUBE-SEP-ZXMNUKOKXUTL2MK2 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 10.244.0.3:53
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:metrics cluster IP" -m tcp --dport 9153 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:metrics cluster IP" -m tcp --dport 9153 -j KUBE-SVC-JD5MR3NA4I4DYORP
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.136.177/32 -p tcp -m comment --comment "default/httpbin:http cluster IP" -m tcp --dport 8000 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.96.136.177/32 -p tcp -m comment --comment "default/httpbin:http cluster IP" -m tcp --dport 8000 -j KUBE-SVC-FREKB6WNWYJLKTHC
-A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-ZXMNUKOKXUTL2MK2
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-SEP-LASJGFFJP3UOS6RQ
-A KUBE-SVC-FREKB6WNWYJLKTHC -m comment --comment "default/httpbin:http" -j KUBE-SEP-XT3VQGQ2EKT4GEUU
-A KUBE-SVC-JD5MR3NA4I4DYORP -m comment --comment "kube-system/kube-dns:metrics" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-ZP3FB6NMPNCO4VBJ
-A KUBE-SVC-JD5MR3NA4I4DYORP -m comment --comment "kube-system/kube-dns:metrics" -j KUBE-SEP-FVQSBIWR5JTECIVC
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -j KUBE-SEP-QKX4QX54UKWK6JIY
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-6E7XQMQ4RAYOWTTM
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -j KUBE-SEP-LPGSDLJ3FDW46N4W
COMMIT
# Completed on Wed May 26 01:18:26 2021

Searching for the keyword httpbin in these two snapshots shows that kube-proxy was not updating the iptables rules instantly. This delay is causing the curl connection failures, isn't it?

simingweng avatar May 26 '21 02:05 simingweng

By clocking the iptables-save dumps more carefully, it turns out I was wrong: kube-proxy managed to update the iptables rules within a second.

But even with the iptables rules properly updated, the curl command still kept failing for a minute or so.

@BenTheElder if I add the -v to my curl command, I can see the httpbin service DNS name got resolved to its ClusterIP correctly. Does it rule out the docker DNS problem as you suggested?

simingweng avatar May 26 '21 03:05 simingweng

You have Prometheus metrics in kube-proxy that measure exactly that: the time from when the endpoint is created to when the iptables rules are programmed. https://github.com/kubernetes/kubernetes/blob/b496238dd65d86c65183ac7ffa128c5de46705b4/pkg/proxy/metrics/metrics.go#L52

An example of how to install Prometheus in kind is here: https://github.com/aojea/kind-networking-plugins/tree/main/monitoring

That will give you better insight.
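
(A lighter-weight alternative sketch, without a full Prometheus install, is to scrape kube-proxy's metrics endpoint directly; this assumes curl exists in the node image and kube-proxy's default metrics bind of 127.0.0.1:10249:)

docker exec kind-control-plane curl -s http://127.0.0.1:10249/metrics | grep network_programming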

aojea avatar May 26 '21 05:05 aojea

@BenTheElder if I add the -v to my curl command, I can see the httpbin service DNS name got resolved to its ClusterIP correctly. Does it rule out the docker DNS problem as you suggested?

Sounds like it.

But even with the iptables properly updated, the curl command still kept failing for a minute or so.

Is the pod ready and listening?

You may just have fewer resources available / slow startup in your Docker setup. Docker Desktop doesn't allocate much out of the box.

Even if it were iptables programming times, that's in kube-proxy upstream, kind isn't really responsible for that ...

BenTheElder avatar May 26 '21 19:05 BenTheElder

@aojea Thanks for the pointer. I patched the kube-proxy ConfigMap and deployed Prometheus, and I can see 3 time series related to the metric network_programming_duration_seconds.
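
(Roughly, the patching step is to expose kube-proxy's metrics beyond the node's loopback and restart the DaemonSet; the relevant field is metricsBindAddress in the kube-proxy ConfigMap. A sketch:)

# set metricsBindAddress: 0.0.0.0:10249 in the config.conf key
kubectl -n kube-system edit configmap kube-proxy
# roll the proxy pods so they pick up the new config
kubectl -n kube-system rollout restart daemonset kube-proxy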

[Screenshot: Screen Shot 2021-05-26 at 3.39.11 PM]

I'm not sure exactly which one to look at, but I assume we're talking about network_programming_duration_seconds_bucket; the interesting part of the table view is:

[Screenshot: Screen Shot 2021-05-26 at 3.51.46 PM]

If I read the table correctly, does it mean the network programming duration is mostly in the range of 10 to 20 seconds?
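
(One way to turn the buckets into a single number is histogram_quantile over the bucket rate, e.g. via the Prometheus HTTP API; a sketch assuming Prometheus is port-forwarded to localhost:9090 and that the metric carries the kubeproxy_ prefix:)

curl -sG 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=histogram_quantile(0.99, sum(rate(kubeproxy_network_programming_duration_seconds_bucket[5m])) by (le))'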

simingweng avatar May 26 '21 19:05 simingweng

@BenTheElder if I add the -v to my curl command, I can see the httpbin service DNS name got resolved to its ClusterIP correctly. Does it rule out the docker DNS problem as you suggested?

Sounds like it.

But even with the iptables properly updated, the curl command still kept failing for a minute or so.

Is the pod ready and listening?

The Pod is definitely in Ready status and listening; otherwise the Deployment couldn't be Available. And in the real product testing setup, I was programmatically waiting for the Endpoints resource to be populated with the Pod IP address before initiating the communication via the Service.

You may just have less resources available / slow startup in your docker setup. Docker desktop doesn't allocate much OOTB.

I assigned 6 CPUs, 8 GB of memory, and 184 GB of disk to my Docker Desktop. If you're talking about slow startup of the httpbin Pod, I don't believe that's it; as mentioned above, I even tried programmatically waiting for the Endpoints resource to include the Pod IP address, which wouldn't happen unless the Pod is completely Ready.

If there's slowness in kube-proxy, I guess @aojea's advice may provide more insight. I've pasted the metric values above; the question is, do the metrics explain my observation?

Even if it were iptables programming times, that's in kube-proxy upstream, kind isn't really responsible for that ...

simingweng avatar May 26 '21 20:05 simingweng

Oops, I thought the fix for those metrics was in 1.21 but it is not; it is fixed in 1.22, which has not been released. However, it seems you are using 1.20. Check sync_proxy_rules_duration_seconds instead; you can graph it and see the trend, so we can rule out that the problem is in iptables programming.
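
(For the trend, a rough query is the per-interval average, sum over count; a sketch assuming Prometheus is port-forwarded to localhost:9090 and the kubeproxy_ metric prefix:)

curl -sG 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=rate(kubeproxy_sync_proxy_rules_duration_seconds_sum[5m]) / rate(kubeproxy_sync_proxy_rules_duration_seconds_count[5m])'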

aojea avatar May 26 '21 20:05 aojea

I'm actually on v1.21.1. Here's what you are looking for, I guess. It is still a histogram metric, so we only have bucket/count/sum.

[Screenshot: Screen Shot 2021-05-26 at 6.58.55 PM]

simingweng avatar May 26 '21 23:05 simingweng

We want to know why the delay happens. There are multiple factors here, and we want to rule out kube-proxy programming the rules as the problem, so you have to correlate the metrics with your tests; there is no baseline, every deployment is different.

If we know that the delay is caused by kube-proxy taking 6 seconds to flush the rules, we have found the problem.

To do that, you have to graph the time it took to program the rules while you are running the test; this time is also recorded in the kube-proxy logs if it runs with verbosity level 4.
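
(On a kind cluster that could look roughly like the following; the flag wiring and the exact log wording vary by version, so treat this as a sketch:)

# append --v=4 to the kube-proxy container command and let the DaemonSet roll
kubectl -n kube-system patch daemonset kube-proxy --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/command/-","value":"--v=4"}]'
# then watch for the rule-sync timing lines while rerunning the test
kubectl -n kube-system logs -l k8s-app=kube-proxy -f | grep -i syncProxyRules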

aojea avatar May 27 '21 07:05 aojea

I think this is probably https://github.com/kubernetes-sigs/kind/issues/2208#issuecomment-857064037 which was a bug in docker desktop that should be fixed now, according to the user there.

BenTheElder avatar Jun 11 '21 16:06 BenTheElder

@simingweng can you try with up to date docker desktop as referenced in https://github.com/kubernetes-sigs/kind/issues/2280#issuecomment-859683436?

I'm still not seeing that this is anything kind has done; this seems most likely to be a problem with Docker Desktop or possibly kube-proxy.

BenTheElder avatar Jun 29 '21 18:06 BenTheElder

Given https://github.com/kubernetes-sigs/kind/issues/2280#issuecomment-848341309 plus the inactivity since May, I'm inclined to call this https://github.com/kubernetes-sigs/kind/issues/2208#issuecomment-857064037 for now.

BenTheElder avatar Jul 29 '21 22:07 BenTheElder

Hi, I think this issue needs to be reopened; here is my case:

I am still seeing the issue on the latest (as of Feb 9, 2022) Docker for macOS 4.4.2 and kind 0.11.1 on an M1 MacBook Pro with macOS 12.0.1.

I can reproduce it consistently with our integration test suite, which redeploys DBs and microservice applications on each test.

Tests often fail with networking errors where app Deployments try to reach DB Deployments via Services during the first ~10-15 seconds after a successful rollout of the DB Deployments.

For example, we have:

apiVersion: v1
kind: Service
metadata:
  name: postgresql
  namespace: postgresql
  labels:
    app: postgresql
spec:
  ports:
    - port: 5432
      name: postgresql
  clusterIP: None
  selector:
    app: postgresql

The DB Deployment of course has a startupProbe, and we wait for its rollout before deploying the app:

DB Deployment:
          startupProbe:
            exec:
              command:
                - pg_isready
            initialDelaySeconds: 2
            periodSeconds: 1
            failureThreshold: 30

Then we wait for DB rollout: kubectl rollout -n postgresql status deployment postgresql
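
(A stricter gate would be to also wait for the headless Service's Endpoints to list the Pod IP, e.g. the sketch below, though the original report earlier in this thread found that even an Endpoints-based wait did not close the window:)

kubectl rollout -n postgresql status deployment postgresql
until [ -n "$(kubectl get endpoints postgresql -n postgresql -o jsonpath='{.subsets[*].addresses[*].ip}')" ]; do
  sleep 1
done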

Then the app Deployment tries to connect to the postgresql Service. What we see is that the app Deployment frequently crashes with network errors trying to reach the Service, so I had to add an initContainer like this:

spec:
      # No need for this check in Production, it's a Kind/Docker bug — DNS Unavailable for few seconds after deploy
      # See https://github.com/kubernetes-sigs/kind/issues/2280
      initContainers:
        - name: verify-pgsql-reachable
          image: postgres:14-bullseye
          command: [
            '/bin/bash',
            '-c',
            'for i in {1..15}; do if pg_isready --host postgresql.postgresql.svc.cluster.local --timeout 1; then exit 0; else sleep 1; fi; done;'
          ]

And here is sample output from tests:

10:26:32.936 [ne#52] INFO - Init container logs
    postgresql.postgresql.svc.cluster.local:5432 - no response
    postgresql.postgresql.svc.cluster.local:5432 - no response
    postgresql.postgresql.svc.cluster.local:5432 - no response
    postgresql.postgresql.svc.cluster.local:5432 - no response
    postgresql.postgresql.svc.cluster.local:5432 - no response
    postgresql.postgresql.svc.cluster.local:5432 - no response
    postgresql.postgresql.svc.cluster.local:5432 - accepting connections

As you can see, for a significant time, ~12 seconds(!) (6 x (1s timeout + 1s delay)), there is no networking between the app Deployment and the DB Service, and the check is the same pg_isready as in the startupProbe of the DB Deployment.

Thus, I think this issue needs to be reopened and investigated further. kind is often used in integration tests, and when two services can't talk to each other, that's an issue :/


What I also found is that using Service.spec.clusterIP: None (headless mode, where the Service DNS entry points directly to the selected Pods) results in a longer period of networking unavailability than Service.spec.type: ClusterIP (where Kubernetes routes traffic through a stable IP to the actual Pods) with kind and Docker on macOS.

So for now I'm switching Services to type: ClusterIP, but it's not a fix, just a workaround, and not a stable one either, because I still see unavailability and I'd actually like to use clusterIP: None:

("better" than clusterIP: None but still bad)

10:53:35.583 [ne#52] INFO  - Init container logs
    postgresql.postgresql.svc.cluster.local:5432 - no response
    postgresql.postgresql.svc.cluster.local:5432 - no response
    postgresql.postgresql.svc.cluster.local:5432 - accepting connections

artem-zinnatullin avatar Feb 09 '22 07:02 artem-zinnatullin

I no longer have Docker Desktop / kind on macOS (due to the license change), so I can't reproduce this to investigate. My co-maintainer @aojea never had access to this environment.

Based on prior comments:

This issue surfaces both in KinD and also in Minikube with Docker driver. If I switch Minikube to hyperkit or virtualbox driver, the issue disappears. I'm also not seeing this issue on EKS or AKS, but GKE does exhibit the same issue.

It's not clear that this is even kind-specific, though.

I can reopen this, but I can't work on it.

BenTheElder avatar Feb 09 '22 07:02 BenTheElder