weavescope installation on k3s not working

Open milindchawre opened this issue 4 years ago • 4 comments

BUG REPORT

What you expected to happen?

Expecting weavescope to work on k3s.

What happened?

  • I installed weavescope on my k3s cluster using this yaml, which is inspired by this.
  • Weavescope is configured and I can see the UI, but containers are not visible and none of the k8s and docker probes were working.
  • Found error logs in weave-scope-cluster-agent pod.
root@kube-master-ce86:/# kubectl logs pod/weave-scope-cluster-agent-669d84579b-qm6dx -n weave
time="2020-06-29T17:22:35Z" level=info msg="publishing to: weave-scope-app.weave.svc.cluster.local.:80"
<probe> INFO: 2020/06/29 17:22:35.936466 Basic authentication disabled
<probe> INFO: 2020/06/29 17:22:35.950455 command line args: --mode=probe --probe-only=true --probe.cri.endpoint=unix///var/run/k3s/containerd/containerd.sock --probe.kubernetes.role=cluster weave-scope-app.weave.svc.cluster.local.:80
<probe> INFO: 2020/06/29 17:22:35.950581 probe starting, version 1.13.1, ID 37b7bd86d15ce42f
<probe> ERRO: 2020/06/29 17:22:35.950669 Error getting docker bridge ip: route ip+net: no such network interface
<probe> INFO: 2020/06/29 17:22:35.950974 kubernetes: targeting api server https://192.168.128.1:443
<probe> WARN: 2020/06/29 17:22:35.997726 Error collecting weave status, backing off 10s: Get http://127.0.0.1:6784/report: dial tcp 127.0.0.1:6784: connect: connection refused. If you are not running Weave Net, you may wish to suppress this warning by launching scope with the `--weave=false` option.
<probe> INFO: 2020/06/29 17:22:36.070069 volumesnapshots are not supported by this Kubernetes version
<probe> INFO: 2020/06/29 17:22:36.074895 Control connection to weave-scope-app.weave.svc.cluster.local. starting
<probe> INFO: 2020/06/29 17:22:36.101873 Publish loop for weave-scope-app.weave.svc.cluster.local. starting
<probe> WARN: 2020/06/29 17:22:36.105046 Error Kubernetes reflector (namespaces), backing off 20s: github.com/weaveworks/scope/probe/kubernetes/client.go:279: Failed to list *v1.Namespace: namespaces is forbidden: User "system:serviceaccount:weave:weave-scope" cannot list resource "namespaces" in API group "" at the cluster scope
<probe> INFO: 2020/06/29 17:22:36.113734 volumesnapshotdatas are not supported by this Kubernetes version
<probe> WARN: 2020/06/29 17:22:45.998536 Error collecting weave status, backing off 20s: Get http://127.0.0.1:6784/report: dial tcp 127.0.0.1:6784: connect: connection refused. If you are not running Weave Net, you may wish to suppress this warning by launching scope with the `--weave=false` option.
<probe> WARN: 2020/06/29 17:22:56.116590 Error Kubernetes reflector (namespaces), backing off 40s: github.com/weaveworks/scope/probe/kubernetes/client.go:279: Failed to list *v1.Namespace: namespaces is forbidden: User "system:serviceaccount:weave:weave-scope" cannot list resource "namespaces" in API group "" at the cluster scope
<probe> WARN: 2020/06/29 17:23:05.999339 Error collecting weave status, backing off 40s: Get http://127.0.0.1:6784/report: dial tcp 127.0.0.1:6784: connect: connection refused. If you are not running Weave Net, you may wish to suppress this warning by launching scope with the `--weave=false` option.
<probe> WARN: 2020/06/29 17:23:36.119469 Error Kubernetes reflector (namespaces), backing off 1m20s: github.com/weaveworks/scope/probe/kubernetes/client.go:279: Failed to list *v1.Namespace: namespaces is forbidden: User "system:serviceaccount:weave:weave-scope" cannot list resource "namespaces" in API group "" at the cluster scope
<probe> WARN: 2020/06/29 17:24:56.223149 Error Kubernetes reflector (namespaces), backing off 2m40s: github.com/weaveworks/scope/probe/kubernetes/client.go:279: Failed to list *v1.Namespace: namespaces is forbidden: User "system:serviceaccount:weave:weave-scope" cannot list resource "namespaces" in API group "" at the cluster scope
root@kube-master-ce86:/#
  • Found error logs in weave-scope-agent pod.
root@kube-master-ce86:/# kubectl logs pod/weave-scope-agent-vnt6s -n weave
warning: I'm not PID 1, I'm PID 13578
/sbin/runsvdir started
time="2020-06-29T17:08:55Z" level=info msg="publishing to: weave-scope-app.weave.svc.cluster.local.:80"
<probe> INFO: 2020/06/29 17:08:55.098506 Basic authentication disabled
<probe> INFO: 2020/06/29 17:09:35.106295 command line args: --mode=probe --probe-only=true --probe.cri.endpoint=unix///var/run/k3s/containerd/containerd.sock --probe.docker=true --probe.docker.bridge=cni0 --probe.kubernetes.role=host weave-scope-app.weave.svc.cluster.local.:80
<probe> INFO: 2020/06/29 17:09:35.106335 probe starting, version 1.13.1, ID 3716a93e720f6c4
<probe> WARN: 2020/06/29 17:09:35.131408 Cannot resolve 'scope.weave.local.': dial tcp 192.168.1.1:53: connect: connection refused
<probe> ERRO: 2020/06/29 17:09:35.395996 docker registry: Get http://unix.sock/containers/json?all=1: dial unix /var/run/docker.sock: connect: no such file or directory
<probe> WARN: 2020/06/29 17:09:35.397007 Error collecting weave status, backing off 10s: Get http://127.0.0.1:6784/report: dial tcp 127.0.0.1:6784: connect: connection refused. If you are not running Weave Net, you may wish to suppress this warning by launching scope with the `--weave=false` option.
<probe> WARN: 2020/06/29 17:09:45.131809 Cannot resolve 'weave-scope-app.weave.svc.cluster.local.': lookup weave-scope-app.weave.svc.cluster.local. on 192.168.128.10:53: read udp 172.31.5.35:57926->192.168.128.10:53: i/o timeout
<probe> ERRO: 2020/06/29 17:09:45.396409 docker registry: Get http://unix.sock/containers/json?all=1: dial unix /var/run/docker.sock: connect: no such file or directory
<probe> WARN: 2020/06/29 17:09:45.400417 Error collecting weave status, backing off 20s: Get http://127.0.0.1:6784/report: dial tcp 127.0.0.1:6784: connect: connection refused. If you are not running Weave Net, you may wish to suppress this warning by launching scope with the `--weave=false` option.
<probe> ERRO: 2020/06/29 17:09:55.402544 docker registry: Get http://unix.sock/containers/json?all=1: dial unix /var/run/docker.sock: connect: no such file or directory
<probe> ERRO: 2020/06/29 17:10:05.230600 Error checking version: Get https://checkpoint-api.weave.works/v1/check/scope-probe?arch=amd64&flag_kernel-version=4.4.0-174-generic&flag_kubernetes_enabled=true&flag_os=linux&os=linux&signature=H6IMWSeq53Xrq8rLebprHvappXmWMOkHN%2FPkW5STFU0%3D&version=1.13.1: dial tcp: i/o timeout
<probe> WARN: 2020/06/29 17:10:05.407358 Error collecting weave status, backing off 40s: Get http://127.0.0.1:6784/report: dial tcp 127.0.0.1:6784: connect: connection refused. If you are not running Weave Net, you may wish to suppress this warning by launching scope with the `--weave=false` option.
<probe> ERRO: 2020/06/29 17:10:05.407631 docker registry: Get http://unix.sock/containers/json?all=1: dial unix /var/run/docker.sock: connect: no such file or directory
<probe> ERRO: 2020/06/29 17:10:15.407976 docker registry: Get http://unix.sock/containers/json?all=1: dial unix /var/run/docker.sock: connect: no such file or directory
<probe> ERRO: 2020/06/29 17:10:25.408421 docker registry: Get http://unix.sock/containers/json?all=1: dial unix /var/run/docker.sock: connect: no such file or directory
<probe> ERRO: 2020/06/29 17:10:35.313639 Error checking version: Get https://checkpoint-api.weave.works/v1/check/scope-probe?arch=amd64&flag_kernel-version=4.4.0-174-generic&flag_kubernetes_enabled=true&flag_os=linux&os=linux&signature=H6IMWSeq53Xrq8rLebprHvappXmWMOkHN%2FPkW5STFU0%3D&version=1.13.1: dial tcp: i/o timeout
<probe> ERRO: 2020/06/29 17:10:35.409510 docker registry: Get http://unix.sock/containers/json?all=1: dial unix /var/run/docker.sock: connect: no such file or directory
<probe> ERRO: 2020/06/29 17:10:45.409941 docker registry: Get http://unix.sock/containers/json?all=1: dial unix /var/run/docker.sock: connect: no such file or directory
<probe> ERRO: 2020/06/29 17:10:55.410554 docker registry: Get http://unix.sock/containers/json?all=1: dial unix /var/run/docker.sock: connect: no such file or directory

How to reproduce it?

Versions:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.1", GitCommit:"7879fc12a63337efff607952a323df90cdc7a335", GitTreeState:"clean", BuildDate:"2020-04-08T17:38:50Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3-k3s.2", GitCommit:"e7e6a3c4e9a7d80b87793612730d10a863a25980", GitTreeState:"clean", BuildDate:"2019-11-18T18:31:23Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}

milindchawre avatar Jun 29 '20 17:06 milindchawre

There were a couple of things missing. Thanks @bboreham for pointing out the different options to override the CRI endpoint. I needed to make these changes to get container probing to work. Now I can see containers in the weavescope UI.

            - '--probe.docker=false'
            - '--weave=false'
            - '--probe.cri=true'
            - '--probe.cri.endpoint=unix:///var/run/k3s/containerd/containerd.sock'

👆 These options were crucial: disabling the docker probe, enabling the CRI probe, and finally pointing to your CRI endpoint. So if docker.sock is not present on your system, find the socket file location for your container runtime (in my case it's containerd.sock), then override the CRI endpoint to point to that socket file using the options above.

milindchawre avatar Jul 01 '20 18:07 milindchawre

Container probing is working fine, but weavescope is still giving some RBAC errors.

2020-07-02T02:03:03.446759501+09:00 <probe> WARN: 2020/07/01 17:03:03.446413 Error Kubernetes reflector (namespaces), backing off 1m20s: github.com/weaveworks/scope/probe/kubernetes/client.go:279: Failed to list *v1.Namespace: namespaces is forbidden: User "system:serviceaccount:weave:weave-scope" cannot list resource "namespaces" in API group "" at the cluster scope
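
The warning says the weave-scope service account in the weave namespace is not allowed to list namespaces at cluster scope. A minimal sketch of a grant that would address that one message is below. The name weave-scope-namespaces is hypothetical, and the probe may need further resources (pods, services, and so on) that this fragment does not cover, so treat it as an illustration rather than the project's official manifest.

```yaml
# Sketch only: grants list/watch on namespaces to the service account named in the log.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: weave-scope-namespaces   # hypothetical name
rules:
  - apiGroups: [""]              # core API group, shown as "" in the error
    resources: ["namespaces"]
    verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: weave-scope-namespaces   # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: weave-scope-namespaces
subjects:
  - kind: ServiceAccount
    name: weave-scope            # from the log: system:serviceaccount:weave:weave-scope
    namespace: weave
```

If the manifest in use already defines a ClusterRole for Scope, it may be simpler to check that its ClusterRoleBinding points at this service account and namespace.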

milindchawre avatar Jul 01 '20 18:07 milindchawre

I am getting this error after following the steps above.

Error generating CRI report: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial unix /var/run/k3s/containerd/containerd.sock: connect: no such file or directory"

In the terminal, this works:

sudo crictl -r "unix:///var/run/k3s/containerd/containerd.sock" version

Version: 0.1.0
RuntimeName: containerd
RuntimeVersion: v1.5.8-k3s1
RuntimeApiVersion: v1alpha2

shreyansh96 avatar Feb 28 '22 09:02 shreyansh96

Can you try modifying the DaemonSet to mount that path inside the container?

bboreham avatar Mar 08 '22 08:03 bboreham
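
One way to read the suggestion above: the socket exists on the host (crictl can reach it) but not inside the agent container, so the probe gets "no such file or directory". A hedged sketch of a DaemonSet fragment that mounts the host socket at the same path is below. The container name scope-agent and the volume name containerd-sock are assumptions, and this is a fragment, not a complete manifest.

```yaml
# Fragment of the weave-scope-agent DaemonSet (sketch, not a full manifest).
spec:
  template:
    spec:
      containers:
        - name: scope-agent            # assumed container name
          args:
            - '--probe.docker=false'
            - '--weave=false'
            - '--probe.cri=true'
            - '--probe.cri.endpoint=unix:///var/run/k3s/containerd/containerd.sock'
          volumeMounts:
            - name: containerd-sock    # hypothetical volume name
              mountPath: /var/run/k3s/containerd/containerd.sock
      volumes:
        - name: containerd-sock
          hostPath:
            # Host path of the k3s containerd socket, as used with crictl above.
            path: /var/run/k3s/containerd/containerd.sock
```

The mount path inside the container is kept identical to the host path so the --probe.cri.endpoint value from the earlier comment can stay unchanged.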