
Scope doesn't show all Kubernetes nodes and namespaces

ashoks27 opened this issue on Jan 16, 2020 · 8 comments

What did you expect to happen?

Scope should show the hosts for all nodes in the Kubernetes cluster and list all namespaces.

What happened?

It shows only the master node's details and its system pods.

How to reproduce it?

$ kubectl apply -f "https://cloud.weave.works/k8s/scope.yaml?k8s-version=$(kubectl version | base64 | tr -d '\n')"

$ kubectl port-forward -n weave "$(kubectl get -n weave pod --selector=weave-scope-component=app -o jsonpath='{.items..metadata.name}')" 4040
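
To confirm that the probe DaemonSet actually covers every node, it helps to compare the agent pods against the node list (a quick check, assuming the default weave namespace used by the manifest above; one weave-scope-agent pod should be Running on each node):

$ kubectl get pods -n weave -o wide
$ kubectl get nodes -o wide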

Anything else we need to know?

Versions:

$ scope version
scope is not installed on this kubernetes
$ docker version
1.13.1
$ uname -a
Linux XXXXX 3.10.0-1062.4.3.el7.x86_64 #1 SMP Wed Nov 13 23:58:53 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

$ kubectl version
1.16

Logs:

$ docker logs weavescope

or, if using Kubernetes:

$ kubectl logs <weave-scope-pod> -n <namespace> 

time="2020-01-16T08:59:02Z" level=info msg="publishing to: weave-scope-app.weave.svc.cluster.local:80" INFO: 2020/01/16 08:59:02.698144 Basic authentication disabled INFO: 2020/01/16 08:59:52.705396 command line args: --mode=probe --probe-only=true --probe.docker=true --probe.docker.bridge=docker0 --probe.kubernetes.role=host --probe.publish.interval=4.5s --probe.spy.interval=2s weave-scope-app.weave.svc.cluster.local:80 INFO: 2020/01/16 08:59:52.705473 probe starting, version 1.12.0, ID 2073a6f658f8315e WARN: 2020/01/16 08:59:52.710795 Cannot resolve 'scope.weave.local.': dial tcp 172.17.0.1:53: connect: connection refused WARN: 2020/01/16 08:59:52.783117 Error setting up the eBPF tracker, falling back to proc scanning: kernel not supported: got kernel 3.10.0-1062.4.3.el7.x86_64 but need kernel >=4.4 WARN: 2020/01/16 08:59:52.807210 Error collecting weave status, backing off 10s: Get http://127.0.0.1:6784/report: dial tcp 127.0.0.1:6784: connect: connection refused. If you are not running Weave Net, you may wish to suppress this warning by launching scope with the --weave=false option. WARN: 2020/01/16 09:00:02.808038 Error collecting weave status, backing off 20s: Get http://127.0.0.1:6784/report: dial tcp 127.0.0.1:6784: connect: connection refused. If you are not running Weave Net, you may wish to suppress this warning by launching scope with the --weave=false option. WARN: 2020/01/16 09:00:22.808885 Error collecting weave status, backing off 40s: Get http://127.0.0.1:6784/report: dial tcp 127.0.0.1:6784: connect: connection refused. If you are not running Weave Net, you may wish to suppress this warning by launching scope with the --weave=false option. ERRO: 2020/01/16 09:00:22.844881 Error checking version: Get https://checkpoint-api.weave.works/v1/check/scope-probe?arch=amd64&flag_kernel-version=3.10.0-1062.4.3.el7.x86_64&flag_kubernetes_enabled=true&flag_os=linux&os=linux&signature=BwL9A0iTwtjEKlXj32GZv3PyESFpb5XCcvBf6wvPR1s%3D&version=1.12.0: dial tcp: i/o timeout WARN: 2020/01/16 09:00:42.707989 Cannot resolve 'weave-scope-app.weave.svc.cluster.local': lookup weave-scope-app.weave.svc.cluster.local on 10.96.0.10:53: read udp 10.105.208.63:40260->10.96.0.10:53: i/o timeout ERRO: 2020/01/16 09:00:52.926855 Error checking version: Get https://checkpoint-api.weave.works/v1/check/scope-probe?arch=amd64&flag_kernel-version=3.10.0-1062.4.3.el7.x86_64&flag_kubernetes_enabled=true&flag_os=linux&os=linux&signature=BwL9A0iTwtjEKlXj32GZv3PyESFpb5XCcvBf6wvPR1s%3D&version=1.12.0: dial tcp: i/o timeout ERRO: 2020/01/16 14:23:01.198075 Error checking version: Get https://checkpoint-api.weave.works/v1/check/scope-probe?arch=amd64&flag_kernel-version=3.10.0-1062.4.3.el7.x86_64&flag_kubernetes_enabled=true&flag_os=linux&os=linux&signature=BwL9A0iTwtjEKlXj32GZv3PyESFpb5XCcvBf6wvPR1s%3D&version=1.12.0: dial tcp: i/o timeout

ashoks27 · Jan 16, 2020

I have the same problem; it was OK again after I restarted the hosts.

baifuwa · Feb 3, 2020

Thanks for opening the issue @ashoks27!

Could you please give us a bit more info about your Kubernetes cluster?

  • Which cloud provider are you using?
  • How many nodes does the cluster consist of?
  • Is there a Weave Scope probe running on each host? (please paste the output of the kubectl get pods -n weave and kubectl get nodes commands)
  • What are the logs for the Weave Scope app pod?

fbarl · Feb 18, 2020

I have the same problem. I have a functional k8s cluster built with kubeadm, running flannel and Kubernetes 1.17.2 on VMware ESXi 6.5 hosts. The VMs (masters and workers) have 2 CPUs and 8 GB RAM and sit at less than 20% CPU usage. I use pod security policies and have granted the pods in the weave namespace permission to "runasroot"; there is no quota in the weave namespace. It is a multi-master (3) cluster with 3 workers, and I access weave-scope via an ingress. I have not seen this before on older versions of Kubernetes (1.16.3 on SLES 12) with weave-scope 1.12.

When I access weave-scope I only see one node (weave-scope-agent-rjqpm below), and that is the only agent that can resolve the app's service name; the cluster agent and the weave-scope app can also resolve it, but the remaining 5 agents cannot. I deployed a dnsutils pod in the same namespace and it can resolve the name (following https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/). It is strange that some pods can resolve the name but others can't.

I run SLES 15 SP1, kernel 4.12.14-197.34-default, Docker version 19.03.5. The cluster:

NAME                                       READY   STATUS    RESTARTS   AGE
dnsutils                                    1/1     Running   1          67m
weave-scope-agent-4vzzj                     1/1     Running   1          91m
weave-scope-agent-bjjlw                     1/1     Running   1          91m
weave-scope-agent-gjh6w                     1/1     Running   1          91m
weave-scope-agent-jt6nn                     1/1     Running   1          91m
weave-scope-agent-p7nmv                     1/1     Running   1          91m
weave-scope-agent-rjqpm                     1/1     Running   0          91m
weave-scope-app-7f44d5786c-82pfq            1/1     Running   1          91m
weave-scope-cluster-agent-b4f45797c-8bgj6   1/1     Running   1          91m

An example log:

time="2020-03-05T15:16:33Z" level=info msg="publishing to: weave-scope-app.weave.svc.cluster.local:80"
<probe> INFO: 2020/03/05 15:16:33.669514 Basic authentication disabled
<probe> INFO: 2020/03/05 15:17:23.680591 command line args: --mode=probe --probe-only=true --probe.docker=true --probe.docker.bridge=docker0 --probe.kubernetes.role=host --probe.no-controls=true --probe.publish.interval=4.5s --probe.spy.interval=2s --weave=false weave-scope-app.weave.svc.cluster.local:80
<probe> INFO: 2020/03/05 15:17:23.680648 probe starting, version 1.12.0, ID 1e0006c7acb22bd0
<probe> ERRO: 2020/03/05 15:17:53.760928 Error checking version: Get https://checkpoint-api.weave.works/v1/check/scope-probe?arch=amd64&flag_kernel-version=4.12.14-197.34-default&flag_kubernetes_enabled=true&flag_os=linux&os=linux&signature=wNyNvwpqNYzhya33vqah9m1fGY7y3Y8jupssh5LDItU%3D&version=1.12.0: dial tcp: i/o timeout
<probe> WARN: 2020/03/05 15:18:13.687944 Cannot resolve 'weave-scope-app.weave.svc.cluster.local': lookup weave-scope-app.weave.svc.cluster.local on 10.96.0.10:53: read udp 172.20.95.115:59660->10.96.0.10:53: i/o timeout
<probe> ERRO: 2020/03/05 15:18:23.825219 Error checking version: Get https://checkpoint-api.weave.works/v1/check/scope-probe?arch=amd64&flag_kernel-version=4.12.14-197.34-default&flag_kubernetes_enabled=true&flag_os=linux&os=linux&signature=wNyNvwpqNYzhya33vqah9m1fGY7y3Y8jupssh5LDItU%3D&version=1.12.0: dial tcp: i/o timeout

DNS resolution tests:

root@bvin01-k801m-01:/ # kubectl -n weave exec -ti dnsutils -- nslookup weave-scope-app.weave.svc.cluster.local
Server:    10.96.0.10
Address:   10.96.0.10#53

Name:      weave-scope-app.weave.svc.cluster.local
Address:   10.110.216.172

root@bvin01-k801m-01:/ # kubectl -n weave exec -ti weave-scope-cluster-agent-b4f45797c-8bgj6 -- nslookup weave-scope-app.weave.svc.cluster.local
nslookup: can't resolve '(null)': Name does not resolve

Name:      weave-scope-app.weave.svc.cluster.local
Address 1: 10.110.216.172 weave-scope-app.weave.svc.cluster.local

root@bvin01-k801m-01:/ # kubectl -n weave exec -ti weave-scope-app-7f44d5786c-82pfq -- nslookup weave-scope-app.weave.svc.cluster.local
nslookup: can't resolve '(null)': Name does not resolve

Name:      weave-scope-app.weave.svc.cluster.local
Address 1: 10.110.216.172 weave-scope-app.weave.svc.cluster.local

root@bvin01-k801m-01:/ # kubectl -n weave exec -ti weave-scope-agent-rjqpm -- nslookup weave-scope-app.weave.svc.cluster.local
nslookup: can't resolve '(null)': Name does not resolve

Name:      weave-scope-app.weave.svc.cluster.local
Address 1: 10.110.216.172 weave-scope-app.weave.svc.cluster.local

root@bvin01-k801m-01:/ # kubectl -n weave exec -ti weave-scope-agent-p7nmv -- nslookup weave-scope-app.weave.svc.cluster.local
nslookup: can't resolve '(null)': Name does not resolve

nslookup: can't resolve 'weave-scope-app.weave.svc.cluster.local': Try again
command terminated with exit code 1

root@bvin01-k801m-01:/ # kubectl -n weave exec -ti weave-scope-agent-jt6nn -- nslookup kubernetes.default
nslookup: can't resolve '(null)': Name does not resolve

nslookup: can't resolve 'kubernetes.default': Try again
command terminated with exit code 1

root@bvin01-k801m-01:/ # kubectl -n weave exec -ti weave-scope-agent-jt6nn -- nslookup weave-scope-app.weave.svc.cluster.local
nslookup: can't resolve '(null)': Name does not resolve

nslookup: can't resolve 'weave-scope-app.weave.svc.cluster.local': Try again
command terminated with exit code 1

root@bvin01-k801m-01:/ # kubectl -n weave exec -ti weave-scope-agent-gjh6w -- nslookup weave-scope-app.weave.svc.cluster.local
nslookup: can't resolve '(null)': Name does not resolve

nslookup: can't resolve 'weave-scope-app.weave.svc.cluster.local': Try again
command terminated with exit code 1

root@bvin01-k801m-01:/ # kubectl -n weave exec -ti weave-scope-agent-bjjlw -- nslookup weave-scope-app.weave.svc.cluster.local
nslookup: can't resolve '(null)': Name does not resolve

nslookup: can't resolve 'weave-scope-app.weave.svc.cluster.local': Try again
command terminated with exit code 1

root@bvin01-k801m-01:/ # kubectl -n weave exec -ti weave-scope-agent-4vzzj -- nslookup weave-scope-app.weave.svc.cluster.local
nslookup: can't resolve '(null)': Name does not resolve

nslookup: can't resolve 'weave-scope-app.weave.svc.cluster.local': Try again
command terminated with exit code 1
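
For reference, the same lookup can be scripted across every agent pod in one go (a sketch; it assumes the agent pods carry the weave-scope-component=agent label from the stock manifest):

# run the nslookup for the app service from each weave-scope agent pod
for p in $(kubectl -n weave get pods -l weave-scope-component=agent -o name); do
  echo "== $p"
  kubectl -n weave exec "$p" -- nslookup weave-scope-app.weave.svc.cluster.local
done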

I even turned the dnsutil pod into a deployment and tested from all 3 workers, and DNS resolution worked, so the network and DNS seem to be working in the cluster.

NAME                                        READY   STATUS    RESTARTS   AGE    IP              NODE              NOMINATED NODE   READINESS GATES
dnsutil-dp-75d66fc4bc-dffwm                 1/1     Running   0          13s    10.244.3.17     bvin01-k801w-02   <none>           <none>
weave-scope-agent-4vzzj                     1/1     Running   1          104m   172.20.95.115   bvin01-k801m-03   <none>           <none>
weave-scope-agent-bjjlw                     1/1     Running   1          104m   172.20.95.125   bvin01-k801w-02   <none>           <none>
weave-scope-agent-gjh6w                     1/1     Running   1          104m   172.20.95.126   bvin01-k801w-03   <none>           <none>
weave-scope-agent-jt6nn                     1/1     Running   1          104m   172.20.95.124   bvin01-k801w-01   <none>           <none>
weave-scope-agent-p7nmv                     1/1     Running   1          104m   172.20.95.94    bvin01-k801m-02   <none>           <none>
weave-scope-agent-rjqpm                     1/1     Running   0          104m   172.20.95.91    bvin01-k801m-01   <none>           <none>
weave-scope-app-7f44d5786c-82pfq            1/1     Running   1          104m   10.244.2.15     bvin01-k801w-01   <none>           <none>
weave-scope-cluster-agent-b4f45797c-8bgj6   1/1     Running   1          104m   10.244.2.22     bvin01-k801w-01   <none>           <none>
[infra] root@bvin01-k801m-01:/export/home/res/weave # kubectl -n weave exec -ti dnsutil-dp-75d66fc4bc-dffwm -- nslookup weave-scope-app.weave.svc.cluster.local
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   weave-scope-app.weave.svc.cluster.local
Address: 10.110.216.172

root@bvin01-k801m-01:/ # kubectl -n weave delete pod dnsutil-dp-75d66fc4bc-dffwm
pod "dnsutil-dp-75d66fc4bc-dffwm" deleted
root@bvin01-k801m-01:/ # kubectl -n weave get pods -o wide
NAME                                        READY   STATUS    RESTARTS   AGE    IP              NODE              NOMINATED NODE   READINESS GATES
dnsutil-dp-75d66fc4bc-xhh9x                 1/1     Running   0          2m3s   10.244.2.23     bvin01-k801w-01   <none>           <none>
weave-scope-agent-4vzzj                     1/1     Running   1          107m   172.20.95.115   bvin01-k801m-03   <none>           <none>
weave-scope-agent-bjjlw                     1/1     Running   1          107m   172.20.95.125   bvin01-k801w-02   <none>           <none>
weave-scope-agent-gjh6w                     1/1     Running   1          107m   172.20.95.126   bvin01-k801w-03   <none>           <none>
weave-scope-agent-jt6nn                     1/1     Running   1          107m   172.20.95.124   bvin01-k801w-01   <none>           <none>
weave-scope-agent-p7nmv                     1/1     Running   1          107m   172.20.95.94    bvin01-k801m-02   <none>           <none>
weave-scope-agent-rjqpm                     1/1     Running   0          107m   172.20.95.91    bvin01-k801m-01   <none>           <none>
weave-scope-app-7f44d5786c-82pfq            1/1     Running   1          107m   10.244.2.15     bvin01-k801w-01   <none>           <none>
weave-scope-cluster-agent-b4f45797c-8bgj6   1/1     Running   1          107m   10.244.2.22     bvin01-k801w-01   <none>           <none>
root@bvin01-k801m-01:/ # kubectl -n weave exec -ti dnsutil-dp-75d66fc4bc-xhh9x -- nslookup weave-scope-app.weave.svc.cluster.local
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   weave-scope-app.weave.svc.cluster.local
Address: 10.110.216.172

root@bvin01-k801m-01:/ # kubectl -n weave delete pod dnsutil-dp-75d66fc4bc-xhh9x
pod "dnsutil-dp-75d66fc4bc-xhh9x" deleted
root@bvin01-k801m-01:/ # kubectl -n weave get pods -o wide
NAME                                        READY   STATUS    RESTARTS   AGE     IP              NODE              NOMINATED NODE   READINESS GATES
dnsutil-dp-75d66fc4bc-2b9xk                 1/1     Running   0          6m43s   10.244.4.17     bvin01-k801w-03   <none>           <none>
weave-scope-agent-4vzzj                     1/1     Running   1          119m    172.20.95.115   bvin01-k801m-03   <none>           <none>
weave-scope-agent-bjjlw                     1/1     Running   2          119m    172.20.95.125   bvin01-k801w-02   <none>           <none>
weave-scope-agent-gjh6w                     1/1     Running   1          119m    172.20.95.126   bvin01-k801w-03   <none>           <none>
weave-scope-agent-jt6nn                     1/1     Running   2          119m    172.20.95.124   bvin01-k801w-01   <none>           <none>
weave-scope-agent-p7nmv                     1/1     Running   1          119m    172.20.95.94    bvin01-k801m-02   <none>           <none>
weave-scope-agent-rjqpm                     1/1     Running   0          119m    172.20.95.91    bvin01-k801m-01   <none>           <none>
weave-scope-app-7f44d5786c-9kn2v            1/1     Running   0          117s    10.244.4.20     bvin01-k801w-03   <none>           <none>
weave-scope-cluster-agent-b4f45797c-l8vl5   1/1     Running   0          117s    10.244.4.21     bvin01-k801w-03   <none>           <none>
root@bvin01-k801m-01:/ # kubectl -n weave exec -ti dnsutil-dp-75d66fc4bc-2b9xk -- nslookup weave-scope-app.weave.svc.cluster.local
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   weave-scope-app.weave.svc.cluster.local
Address: 10.110.216.172

husa570 · Mar 5, 2020

@husa570 please open a new issue; managing multiple threads of conversation in a GitHub issue is impossible.

Suggest you look at the dnsPolicy on your pods - it should be as we have in the example config: https://github.com/weaveworks/scope/blob/7c838affaaa0ca12f8510c2d28f1e1853fa85d2e/examples/k8s/ds.yaml#L49
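
For reference, the relevant part of that example DaemonSet pod spec looks roughly like this (a sketch of the settings discussed in this thread, not a verbatim copy of the linked file):

# the agent runs on the host network, so ClusterFirstWithHostNet is what
# lets a hostNetwork pod still resolve names through the cluster DNS
spec:
  template:
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet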

bboreham · Mar 24, 2020

I have dnsPolicy: ClusterFirstWithHostNet (since I use the default) in the DaemonSet; I even tried ClusterFirst, but with no change. I won't open a new issue at the moment, since we dropped weave-scope after spending time on the troubleshooting above. Opening an issue means we would have to keep working on the problem, but since it seems like a basic problem that maybe our combination of software versions triggers, it will either be solved later by someone else… or it is just us who hit this with weave-scope in the fantastic world of Kubernetes. Even after deleting weave-scope and its namespace and redeploying in the cluster, the error stays the same, so it is consistent. We run several other applications in other namespaces in this cluster without any problems. So from my point of view this issue is a non-issue and weave-scope works as designed; it is just us who can't use it.

husa570 · Mar 25, 2020

Same issue on a bare-metal k8s cluster set up with kubeadm... Tried installing it with the YAML as above, as well as with Helm; no difference: the DNS errors show up and indeed nslookup fails from within the agents.

Everything else works fine on this 10-node cluster, including DNS from any other pod. What's even weirder in my case is that the nodes come and go... I never have more than 2-3 showing up at the same time; they randomly pop up, then go away.

psarossy · Jul 21, 2020

I changed hostNetwork: to false (hostNetwork: false) in the DaemonSet for the weave-scope-agent, and then it seems to work.
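
For anyone who wants to try the same change, a patch along these lines should do it (a sketch; it assumes the DaemonSet is named weave-scope-agent in the weave namespace, matching the pod names above):

# take the agent pods off the host network; the DaemonSet rolls its pods automatically
kubectl -n weave patch daemonset weave-scope-agent \
  -p '{"spec":{"template":{"spec":{"hostNetwork":false}}}}'

One likely side effect, consistent with the next comment: off the host network the probe picks up the pod's hostname rather than the node's, so nodes may show up under the agent pod name.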

husa570 · Jul 22, 2020

@husa570 I set: hostNetwork: false

Now I get 6 of the 9 nodes to show up, but they are listed under the agent name (weave-scope-agent-XXXXX) instead of the host name, and they are still coming and going randomly.

The nodes that don't show up say the following in the logs:

ERRO: 2020/07/22 16:30:39.719429 Error fetching app details: Get http://10.0.0.200:80/api: dial tcp 10.0.0.200:80: connect: connection refused

10.0.0.200 is the IP of my Traefik Ingress, so I have no idea why the agents are trying to fetch anything from there...

psarossy · Jul 22, 2020