
DNS lookup timeouts due to races in conntrack

Open dcowden opened this issue 6 years ago • 137 comments

What happened?

We are experiencing random 5 second DNS timeouts in our kubernetes cluster.

How to reproduce it?

It is reproducible by requesting just about any in-cluster service and observing that periodically (in our case, 1 out of 50 or 100 times) we get a 5 second delay. It always happens in the DNS lookup.

Anything else we need to know?

We believe this is a result of a kernel level SNAT race condition that is described quite well here:

https://tech.xing.com/a-reason-for-unexplained-connection-timeouts-on-kubernetes-docker-abd041cf7e02

The problem also happens with non-weave CNI implementations, and is (ironically) not really a weave issue at all. However, it becomes a weave issue, because the solution is to set a flag on the masquerading rules that are created, and those rules are not in anyone's control except weave's.

What we need is the ability to apply the NF_NAT_RANGE_PROTO_RANDOM_FULLY flag to the masquerading rules that weave sets up. In the above post, Flannel was in use, and the fix was applied there instead.
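To illustrate (a rough sketch only; the chain name and subnet here are assumptions based on a default Weave install, and --random-fully needs iptables >= 1.6.2 plus kernel >= 3.13):

# illustrative sketch -- chain name and subnet are assumed, not taken from a real cluster
iptables -t nat -A WEAVE -s 10.32.0.0/12 ! -d 10.32.0.0/12 -j MASQUERADE --random-fully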

We searched for this issue and didn't see that anyone had asked for it. We're also unaware of any setting that allows this flag to be applied today; if that's possible, please let us know.

dcowden avatar Apr 26 '18 21:04 dcowden

Whoa! Good job for finding that.

However:

The iptables tool doesn't support setting this flag

this might be an issue.

bboreham avatar Apr 26 '18 22:04 bboreham

@bboreham my kernel networking fu is weak, so I'm not even able to suggest any workarounds. I'm hoping others here have stronger fu... Challenge proposed!

Naysayers frequently make scary, hand-wavy stability arguments against container stacks. Usually I laugh in the face of danger, but this appears to be the first case I've ever seen in which a little-known kernel-level gotcha actually does create issues for containers that would otherwise be unlikely to surface.

dcowden avatar Apr 27 '18 00:04 dcowden

I just spent several hours troubleshooting this problem, ran into the same XING blog post, and then found this issue report, which was opened while I was troubleshooting!

Anyway, I'm seeing the same issues reported in the XING blog: 5 second DNS delays and a lot of insert_failed counts from conntrack, using weave 2.3.0.

cpu=0 found=8089 invalid=353025 ignore=1249480 insert=0 insert_failed=8042 drop=8042 early_drop=0 error=0 search_restart=591166
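(For anyone trying to reproduce this: counters in that shape can be pulled with something like the following, assuming conntrack-tools is installed.)

# per-CPU conntrack statistics, including the insert_failed and drop counters above
sudo conntrack -S
# or read the raw (hex) counters without conntrack-tools
cat /proc/net/stat/nf_conntrack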

More details can be provided if needed.

btalbot avatar Apr 27 '18 00:04 btalbot

@btalbot one workaround you might try is to set this option in resolv.conf:

options single-request-reopen

It is a workaround that will basically make glibc retry the lookup, which will work most of the time.

Another band-aid that helps is to change ndots from 5 (the default) to 3, which will generate far fewer requests to your DNS servers and lessen the frequency.
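(To spell out why ndots matters: with the default ndots:5, a name like example.com has fewer dots than the threshold, so the resolver walks the whole search path before trying the name as-is. A sketch of the typical expansion for a pod in the default namespace follows; this is illustrative, not output from our cluster.)

# each candidate is tried in order; only the last one answers
example.com.default.svc.cluster.local    NXDOMAIN
example.com.svc.cluster.local            NXDOMAIN
example.com.cluster.local                NXDOMAIN
example.com                              answer
# glibc also sends the A and AAAA queries for each candidate in parallel, which is where the conntrack race bites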

The problem is that it's kind of a pain to force changes into resolv.conf. It's done with the kubelet --resolv-conf option, but then you have to create the whole file yourself, which stinks.

dcowden avatar Apr 27 '18 00:04 dcowden

@bboreham it does appear that the patched iptables is available. Can weave use a patched iptables?

dcowden avatar Apr 27 '18 00:04 dcowden

The easiest thing is to use an iptables from a released Alpine package. From there it gets progressively harder.

(Sorry for closing/reopening - finger slipped)

bboreham avatar Apr 27 '18 06:04 bboreham

BTW, my top tip to reduce DNS requests is to put a dot at the end when you know the full address, e.g. instead of example.com put example.com. (with a trailing dot). This means it will not go through the search path, reducing lookups by 5x in a typical Kubernetes install.

For an in-cluster address if you know the namespace you can construct the fqdn, e.g. servicename.namespacename.svc.cluster.local.
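A quick way to see the difference from inside a pod (just a sketch; the names are examples):

# with a trailing dot the name is treated as absolute, so there is no search-path expansion
nslookup example.com.
nslookup servicename.namespacename.svc.cluster.local.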

bboreham avatar Apr 27 '18 06:04 bboreham

@bboreham great tip, I didn't know that one! Thanks

dcowden avatar Apr 27 '18 10:04 dcowden

I did a little investigation on netfilter.org. It appears that the iptables patch that adds --random-fully is in iptables v1.6.2, released on 2/22/2018.

alpine:latest packages v1.6.1; however, alpine:edge packages v1.6.2.

dcowden avatar Apr 27 '18 12:04 dcowden

For an in-cluster address if you know the namespace you can construct the fqdn, e.g. servicename.namespacename.svc.cluster.local.

This only works for some apps or resolvers. The bind tools honor that, of course, since that is a decades-old syntax for bind's zone files. But for apps that fix up the address or use a different resolver, that trick doesn't work. Curl is a good example of it not working.

From inside an alpine container, curl https://kubernetes/ will hit the API server of course, but so does curl https://kubernetes./

btalbot avatar Apr 27 '18 19:04 btalbot

In our testing, we have found that only the options single-request-reopen change actually addresses this issue. It's a band-aid, but DNS lookups are fast, so we get aberrations of about 100ms, not 5 seconds, which is acceptable for us.

Now we're trying to figure out how to inject that into resolv.conf on all the pods. Anyone know how to do that?

dcowden avatar Apr 27 '18 19:04 dcowden

I found this hack in some other related GitHub issues and it's working for me:

apiVersion: v1
data:
  resolv.conf: |
    nameserver 1.2.3.4
    search default.svc.cluster.local svc.cluster.local cluster.local ec2.internal
    options ndots:3 single-request-reopen
kind: ConfigMap
metadata:
  name: resolvconf

Then in your affected pods and containers:

        volumeMounts:
        - name: resolv-conf
          mountPath: /etc/resolv.conf
          subPath: resolv.conf
...

      volumes:
      - name: resolv-conf
        configMap:
          name: resolvconf
          items:
          - key: resolv.conf
            path: resolv.conf

btalbot avatar Apr 27 '18 19:04 btalbot

@btalbot thanks for posting that. That would definitely work in a pinch!

We use kops for our cluster, and this seems promising, but I'm still learning how it works.

dcowden avatar Apr 27 '18 20:04 dcowden

Experiencing the same issue here: 5s delays on every single DNS lookup, 100% of the time. Similarly, insert_failed increases with each DNS query. The AAAA query, which happens a few cycles after the A query, gets dropped systematically (tcpdump: https://hastebin.com/banulayire.swift).

Mounting a resolv.conf by hand in every single pod of our infrastructure is untenable. https://github.com/kubernetes/kubernetes/pull/62764 attempts to add the workaround as a default in Kubernetes, but the PR is unlikely to land, and even if it does, it won't be released for a good while.

Here is the flannel patch: https://gist.github.com/maxlaverse/1fb3bfdd2509e317194280f530158c98

Quentin-M avatar May 01 '18 03:05 Quentin-M

@quentin-m what k8s version are you using? I'm curious why it's 100% repeatable for some but intermittent for others.

Another method to inject resolv.conf changes would be a deployment initializer. I've been trying to avoid creating one, but it's beginning to seem inevitable that in an enterprise environment you need a way to enforce various things on every launched workload in a central way.

I'm still investigating the use of kubelet --resolv-conf, but what I'm really worried about is that all of this is just a band-aid.

The only actual fix is the iptables flag.

dcowden avatar May 01 '18 10:05 dcowden

Has anyone tried installing and running iptables-1.6.2 from the alpine packages for edge on Alpine 3.7?

brb avatar May 01 '18 11:05 brb

@brb I was wondering the same thing. It would be nice to make progress and get a PR ready in anticipation of 1.6.2 becoming available. My Go fu is too weak to take a shot at making the fix, but I'm guessing it goes somewhere around expose.go?

If it were possible to create a frankenversion that has this fix, we could test it out.

dcowden avatar May 01 '18 11:05 dcowden

Has anyone tried installing and running iptables-1.6.2 from the alpine packages for edge on Alpine 3.7?

Just installed it with apk add iptables --update-cache --repository http://dl-3.alpinelinux.org/alpine/edge/main/. However, I can't guarantee that we aren't missing anything with iptables from edge on 3.7.

the fix goes somewhere around expose.go

Yes, you are right.

If it were possible to create a frankenversion that has this fix, we could test it out.

I've just created a weave-kube image with the fix, for the amd64 arch only and kernel >= 3.13 (https://github.com/weaveworks/weave/tree/issues/3287-iptables-random-fully). To use it, please change the image name of weave-kube to "brb0/weave-kube:iptables-random-fully" in the Weave DaemonSet.
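If it helps, the image swap can be done with one command (assuming the default layout, i.e. a DaemonSet named weave-net in kube-system with the container named "weave"):

# assumes the DaemonSet is weave-net in kube-system and the container is named "weave"
kubectl -n kube-system set image daemonset/weave-net weave=brb0/weave-kube:iptables-random-fully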

brb avatar May 01 '18 14:05 brb

@brb Score! That's awesome! We'll try this out ASAP!
We're currently using the image weaveworks/weave-kube:2.2.0, via a kops cluster. Would this image interoperate OK with those?

dcowden avatar May 01 '18 15:05 dcowden

I can't think of anything which would prevent it from working.

Please let us know whether it works, thanks!

brb avatar May 01 '18 15:05 brb

@dcowden Kubernetes 1.10.1, Container Linux 1688.5.3-1758.0.0, AWS VPCs, Weave 2.3.0, kube-proxy in IPVS mode. My guess is that it depends on how fast/stable your network is?

Quentin-M avatar May 01 '18 16:05 Quentin-M

@dcowden

I'm still investigating the use of kubelet --resolve-conf, but what I'm really worried about is that all this is just a bandaid..

I tried it the other day; while it changed the resolv.conf of my static pods, all the other pods (with the default dnsPolicy) still got what dns.go constructs. Note that the DNS options are written as a constant there, so there is no way to get single-request-reopen without running your own compiled version of the kubelet.

Quentin-M avatar May 01 '18 16:05 Quentin-M

@brb Thanks! I hadn't realized yesterday that the patched iptables was already in an Alpine release. My issue is definitely still present, and both insert_failed and drop are still increasing. I note, however, that there are two other MASQUERADE rules in place that do not have --random-fully, so that might be why? I am no network expert by any means, unfortunately.

# Setup by WEAVE too.
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE

# Setup by both kubelet and kube-proxy, used to SNAT ports when querying services.
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE

-A WEAVE ! -s 172.16.0.0/16 -d 172.16.0.0/16 -j MASQUERADE --random-fully
-A WEAVE -s 172.16.0.0/16 ! -d 172.16.0.0/16 -j MASQUERADE --random-fully
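(If someone wants a quick manual test, the Docker rule could be swapped by hand for a fully-random version; a rough sketch only, and I have no idea whether that alone would be sufficient:)

# rough sketch for manual testing only; these are nat-table rules, hence -t nat
iptables -t nat -D POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
iptables -t nat -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE --random-fully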

Quentin-M avatar May 01 '18 18:05 Quentin-M

@brb, I tried this out. I was able to upgrade successfully, but it didn't help my problem.

I think maybe I don't have it installed correctly, because my iptables rules do not show the fully-random flag anywhere.

Here's my daemonset (annotations and stuff after the image omitted):

dcowden@ubuntu:~/gitwork/kubernetes$ kc get ds weave-net -n kube-system -o yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  ...omitted annotations...
  creationTimestamp: 2017-12-21T16:37:59Z
  generation: 4
  labels:
    name: weave-net
    role.kubernetes.io/networking: "1"
  name: weave-net
  namespace: kube-system
  resourceVersion: "21973562"
  selfLink: /apis/extensions/v1beta1/namespaces/kube-system/daemonsets/weave-net
  uid: 4dd96bf2-e66d-11e7-8b61-069a0a6ccd8c
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      name: weave-net
      role.kubernetes.io/networking: "1"
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      creationTimestamp: null
      labels:
        name: weave-net
        role.kubernetes.io/networking: "1"
    spec:
      containers:
      - command:
        - /home/weave/launch.sh
        env:
        - name: WEAVE_PASSWORD
          valueFrom:
            secretKeyRef:
              key: weave-passwd
              name: weave-passwd
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: IPALLOC_RANGE
          value: 100.96.0.0/11
        - name: WEAVE_MTU
          value: "8912"
        image: brb0/weave-kube:iptables-random-fully
        ...more stuff...

The daemonset was updated OK. Here are the iptables rules I see on a host. I don't see --random-fully anywhere:

[root@ip-172-25-19-92 ~]# iptables --list-rules
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-N KUBE-FIREWALL
-N KUBE-FORWARD
-N KUBE-SERVICES
-N WEAVE-IPSEC-IN
-N WEAVE-NPC
-N WEAVE-NPC-DEFAULT
-N WEAVE-NPC-INGRESS
-A INPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A INPUT -j KUBE-FIREWALL
-A INPUT -j WEAVE-IPSEC-IN
-A FORWARD -o weave -m comment --comment "NOTE: this must go before \'-j KUBE-FORWARD\'" -j WEAVE-NPC
-A FORWARD -o weave -m state --state NEW -j NFLOG --nflog-group 86
-A FORWARD -o weave -j DROP
-A FORWARD -i weave ! -o weave -j ACCEPT
-A FORWARD -o weave -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -m comment --comment "kubernetes forward rules" -j KUBE-FORWARD
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -j KUBE-FIREWALL
-A OUTPUT ! -p esp -m policy --dir out --pol none -m mark --mark 0x20000/0x20000 -j DROP
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -j ACCEPT
-A KUBE-FORWARD -s 100.96.0.0/11 -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A KUBE-FORWARD -d 100.96.0.0/11 -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A KUBE-SERVICES -d 100.65.65.105/32 -p tcp -m comment --comment "default/schaeffler-logstash:http has no endpoints" -m tcp --dport 9600 -j REJECT --reject-with icmp-port-unreachable
-A KUBE-SERVICES -p tcp -m comment --comment "ops/echoheaders:http has no endpoints" -m addrtype --dst-type LOCAL -m tcp --dport 31436 -j REJECT --reject-with icmp-port-unreachable
-A KUBE-SERVICES -d 100.69.172.111/32 -p tcp -m comment --comment "ops/echoheaders:http has no endpoints" -m tcp --dport 80 -j REJECT --reject-with icmp-port-unreachable
-A WEAVE-IPSEC-IN -s 172.25.83.126/32 -d 172.25.19.92/32 -p udp -m udp --dport 6784 -m mark ! --mark 0x20000/0x20000 -j DROP
-A WEAVE-IPSEC-IN -s 172.25.83.234/32 -d 172.25.19.92/32 -p udp -m udp --dport 6784 -m mark ! --mark 0x20000/0x20000 -j DROP
-A WEAVE-IPSEC-IN -s 172.25.83.40/32 -d 172.25.19.92/32 -p udp -m udp --dport 6784 -m mark ! --mark 0x20000/0x20000 -j DROP
-A WEAVE-IPSEC-IN -s 172.25.51.21/32 -d 172.25.19.92/32 -p udp -m udp --dport 6784 -m mark ! --mark 0x20000/0x20000 -j DROP
-A WEAVE-IPSEC-IN -s 172.25.51.170/32 -d 172.25.19.92/32 -p udp -m udp --dport 6784 -m mark ! --mark 0x20000/0x20000 -j DROP
-A WEAVE-IPSEC-IN -s 172.25.51.29/32 -d 172.25.19.92/32 -p udp -m udp --dport 6784 -m mark ! --mark 0x20000/0x20000 -j DROP
-A WEAVE-IPSEC-IN -s 172.25.19.130/32 -d 172.25.19.92/32 -p udp -m udp --dport 6784 -m mark ! --mark 0x20000/0x20000 -j DROP
-A WEAVE-NPC -m state --state RELATED,ESTABLISHED -j ACCEPT
-A WEAVE-NPC -d 224.0.0.0/4 -j ACCEPT
-A WEAVE-NPC -m state --state NEW -j WEAVE-NPC-DEFAULT
-A WEAVE-NPC -m state --state NEW -j WEAVE-NPC-INGRESS
-A WEAVE-NPC -m set ! --match-set weave-local-pods dst -j ACCEPT
-A WEAVE-NPC-DEFAULT -m set --match-set weave-f(09:Q6gzJb~LE_pU4n:@416L dst -m comment --comment "DefaultAllow isolation for namespace: ops" -j ACCEPT
-A WEAVE-NPC-DEFAULT -m set --match-set weave-jXXXW48#WnolRYPFUalO(fLpK dst -m comment --comment "DefaultAllow isolation for namespace: troubleshooting" -j ACCEPT
-A WEAVE-NPC-DEFAULT -m set --match-set weave-E.1.0W^NGSp]0_t5WwH/]gX@L dst -m comment --comment "DefaultAllow isolation for namespace: default" -j ACCEPT
-A WEAVE-NPC-DEFAULT -m set --match-set weave-0EHD/vdN#O4]V?o4Tx7kS;APH dst -m comment --comment "DefaultAllow isolation for namespace: kube-public" -j ACCEPT
-A WEAVE-NPC-DEFAULT -m set --match-set weave-?b%zl9GIe0AET1(QI^7NWe*fO dst -m comment --comment "DefaultAllow isolation for namespace: kube-system" -j ACCEPT

I don't know what to try next.

dcowden avatar May 01 '18 20:05 dcowden

@dcowden You need to make sure you are calling iptables 1.6.2, otherwise you will not see the flag. One solution is to run iptables from within the weave container. As with you, it did not help my issue; the first AAAA query still appears to be dropped. I am compiling kube-proxy/kubelet to add the fully-random flag there as well, but this is going to take a while.
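To check from within the weave container, something like this should work (the pod name is a placeholder; note that the MASQUERADE rules live in the nat table, so -t nat is needed to see them):

# <weave-pod-name> is a placeholder for one of the weave-net pods
kubectl -n kube-system exec <weave-pod-name> -c weave -- iptables --version
kubectl -n kube-system exec <weave-pod-name> -c weave -- iptables -t nat -S WEAVE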

Quentin-M avatar May 01 '18 20:05 Quentin-M

@Quentin-M ah, ok right. I'll try that.

I have the same behavior; I most commonly see the dropped packet on the first request, which is really odd.

dcowden avatar May 02 '18 00:05 dcowden

@Quentin-M since you are using k8s 1.10, it appears you could use dnsPolicy: None and then provide the values yourself. Are you trying to avoid that?

We're still using 1.8, so that's not an option for us.

dcowden avatar May 02 '18 00:05 dcowden

@Quentin-M You can also customize DNS settings with:

apiVersion: v1
kind: Pod
metadata:
  namespace: default
  name: dns-example
spec:
  containers:
    - name: test
      image: nginx
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
      - xxxxx
    searches:
      - xxxxx
    options:
      - name: single-request-reopen

xiaoxubeii avatar May 02 '18 08:05 xiaoxubeii

@Quentin-M @dcowden

-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE

I'm not aware of this rule, and based on grepping it seems that Docker inserts it.

As there are quite a few iptables rules, I need to understand your packet flow. Could you answer the following questions:

  1. Does the problem occur when you try to request kube-dns from a pod via ClusterIP of the kube-dns Service?
  2. Could you run sudo iptables-save -c before requesting kube-dns and the same cmd after (ideally, the request should fail due to the timeout)?
  3. sudo tcpdump -i weave -w foo.pcap on the host running a client pod while doing the steps above.

brb avatar May 02 '18 13:05 brb

@brb I'm sorry to need more help, but I'm still unable to see --random-fully on my rules.

I ran iptables from within the weave container and confirmed that I do have iptables v1.6.2:


/home/weave # iptables 
iptables v1.6.2: no command specified
Try `iptables -h' or 'iptables --help' for more information.
/home/weave # iptables --list-rules
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-N KUBE-FIREWALL
-N KUBE-FORWARD
-N KUBE-SERVICES
-N WEAVE-IPSEC-IN
-N WEAVE-NPC
-N WEAVE-NPC-DEFAULT
-N WEAVE-NPC-INGRESS
-A INPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A INPUT -j KUBE-FIREWALL
-A INPUT -j WEAVE-IPSEC-IN
-A FORWARD -o weave -m comment --comment "NOTE: this must go before \'-j KUBE-FORWARD\'" -j WEAVE-NPC
-A FORWARD -o weave -m state --state NEW -j NFLOG --nflog-group 86
-A FORWARD -o weave -j DROP
-A FORWARD -i weave ! -o weave -j ACCEPT
-A FORWARD -o weave -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -m comment --comment "kubernetes forward rules" -j KUBE-FORWARD
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -j KUBE-FIREWALL
-A OUTPUT ! -p esp -m policy --dir out --pol none -m mark --mark 0x20000/0x20000 -j DROP
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
-A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -j ACCEPT
-A KUBE-FORWARD -s 100.96.0.0/11 -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A KUBE-FORWARD -d 100.96.0.0/11 -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A KUBE-SERVICES -d 100.65.65.105/32 -p tcp -m comment --comment "default/schaeffler-logstash:http has no endpoints" -m tcp --dport 9600 -j REJECT --reject-with icmp-port-unreachable
-A KUBE-SERVICES -p tcp -m comment --comment "ops/echoheaders:http has no endpoints" -m addrtype --dst-type LOCAL -m tcp --dport 31436 -j REJECT --reject-with icmp-port-unreachable
-A KUBE-SERVICES -d 100.69.172.111/32 -p tcp -m comment --comment "ops/echoheaders:http has no endpoints" -m tcp --dport 80 -j REJECT --reject-with icmp-port-unreachable
-A WEAVE-IPSEC-IN -s 172.25.51.145/32 -d 172.25.51.112/32 -p udp -m udp --dport 6784 -m mark ! --mark 0x20000/0x20000 -j DROP
-A WEAVE-IPSEC-IN -s 172.25.19.80/32 -d 172.25.51.112/32 -p udp -m udp --dport 6784 -m mark ! --mark 0x20000/0x20000 -j DROP
-A WEAVE-IPSEC-IN -s 172.25.83.234/32 -d 172.25.51.112/32 -p udp -m udp --dport 6784 -m mark ! --mark 0x20000/0x20000 -j DROP
-A WEAVE-IPSEC-IN -s 172.25.19.81/32 -d 172.25.51.112/32 -p udp -m udp --dport 6784 -m mark ! --mark 0x20000/0x20000 -j DROP
-A WEAVE-IPSEC-IN -s 172.25.19.213/32 -d 172.25.51.112/32 -p udp -m udp --dport 6784 -m mark ! --mark 0x20000/0x20000 -j DROP
-A WEAVE-IPSEC-IN -s 172.25.83.117/32 -d 172.25.51.112/32 -p udp -m udp --dport 6784 -m mark ! --mark 0x20000/0x20000 -j DROP
-A WEAVE-IPSEC-IN -s 172.25.83.243/32 -d 172.25.51.112/32 -p udp -m udp --dport 6784 -m mark ! --mark 0x20000/0x20000 -j DROP
-A WEAVE-NPC -m state --state RELATED,ESTABLISHED -j ACCEPT
-A WEAVE-NPC -d 224.0.0.0/4 -j ACCEPT
-A WEAVE-NPC -m state --state NEW -j WEAVE-NPC-DEFAULT
-A WEAVE-NPC -m state --state NEW -j WEAVE-NPC-INGRESS
-A WEAVE-NPC -m set ! --match-set weave-local-pods dst -j ACCEPT
-A WEAVE-NPC-DEFAULT -m set --match-set weave-E.1.0W^NGSp]0_t5WwH/]gX@L dst -m comment --comment "DefaultAllow isolation for namespace: default" -j ACCEPT
-A WEAVE-NPC-DEFAULT -m set --match-set weave-0EHD/vdN#O4]V?o4Tx7kS;APH dst -m comment --comment "DefaultAllow isolation for namespace: kube-public" -j ACCEPT
-A WEAVE-NPC-DEFAULT -m set --match-set weave-?b%zl9GIe0AET1(QI^7NWe*fO dst -m comment --comment "DefaultAllow isolation for namespace: kube-system" -j ACCEPT
-A WEAVE-NPC-DEFAULT -m set --match-set weave-f(09:Q6gzJb~LE_pU4n:@416L dst -m comment --comment "DefaultAllow isolation for namespace: ops" -j ACCEPT
-A WEAVE-NPC-DEFAULT -m set --match-set weave-jXXXW48#WnolRYPFUalO(fLpK dst -m comment --comment "DefaultAllow isolation for namespace: troubleshooting" -j ACCEPT
[root@ip-172-25-51-112 ~]# docker ps | grep weave
51b7fd4a2fb6        weaveworks/weave-npc@sha256:1d85c63e8b4cd433363d5527fdae263069d118308521490a9ea2d4b00b484a5e                                "/usr/bin/weave-npc"     8 hours ago         Up 8 hours                              k8s_weave-npc_weave-net-zj924_kube-system_3f526bf1-4dd6-11e8-93c1-06cef8be63fa_0
10fdaf3a91cd        brb0/weave-kube@sha256:84010a75a045b66cf79915b0c0bc44dce59692a30dbd6e80b00149301e5e9a4c                                     "/home/weave/launc..."   8 hours ago         Up 8 hours                              k8s_weave_weave-net-zj924_kube-system_3f526bf1-4dd6-11e8-93c1-06cef8be63fa_0
adef4ba0b8ea        gcr.io/google_containers/pause-amd64:3.0                                                                                    "/pause"                 8 hours ago         Up 8 hours                              k8s_POD_weave-net-zj924_kube-system_3f526bf1-4dd6-11e8-93c1-06cef8be63fa_0

My best guess is that the rules may not have been rebuilt when I updated the daemonset. I made sure the new image is running on all nodes, but all I did was update the ds and watch it terminate and create all the new containers. Maybe it didn't rebuild the iptables rules on the underlying nodes?

dcowden avatar May 02 '18 15:05 dcowden