i/o timeout communicating from kubewatch to master
I'm seeing i/o errors from the Kubewatch pod when it tries to talk to the master node. This is a Kubernetes 1.11.5 installation running locally on CentOS 7. I have tried installing Kubewatch with Helm as well as with kubectl. The pod is running, but I see the same i/o timeout log messages as the previous user reported, and no messages are being sent to Slack.
[root@kube-acitest-3 ~]# kubectl get pod kubewatch -n monitoring -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    opflex.cisco.com/computed-endpoint-group: '{"policy-space":"kubeacitest","name":"kubernetes|kube-default"}'
    opflex.cisco.com/computed-security-group: '[]'
  creationTimestamp: 2018-12-16T19:49:41Z
  name: kubewatch
  namespace: monitoring
  resourceVersion: "929213"
  selfLink: /api/v1/namespaces/monitoring/pods/kubewatch
  uid: ba92e4d3-016b-11e9-a743-005056863a6e
spec:
  containers:
  - image: bitnami/kubewatch:latest
    imagePullPolicy: Always
    name: kubewatch
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /root
      name: config-volume
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kubewatch-token-wgwv4
      readOnly: true
  - args:
    - proxy
    - -p
    - "8080"
    image: bitnami/kubectl:latest
    imagePullPolicy: Always
    name: proxy
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kubewatch-token-wgwv4
      readOnly: true
  dnsPolicy: ClusterFirst
  nodeName: kube-acitest-4
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: kubewatch
  serviceAccountName: kubewatch
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - configMap:
      defaultMode: 420
      name: kubewatch
    name: config-volume
  - name: kubewatch-token-wgwv4
    secret:
      defaultMode: 420
      secretName: kubewatch-token-wgwv4
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2018-12-16T19:49:41Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2018-12-16T19:49:46Z
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: null
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: 2018-12-16T19:49:41Z
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://4d9da551ce6f89205e77989399b57726ae2eeac4762e17109de18e9feb9bc281
    image: bitnami/kubewatch:0.0.4
    imageID: docker-pullable://bitnami/kubewatch@sha256:11b7ae4e0a4ac88aaf95411d9778295ba863cf86773c606c0cacfc853960ea7b
    lastState: {}
    name: kubewatch
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: 2018-12-16T19:49:43Z
  - containerID: docker://497ea649b1a5017040316bb6453e13aef46ef325bc430c43cfde3ad6f7f9ff02
    image: bitnami/kubectl:latest
    imageID: docker-pullable://bitnami/kubectl@sha256:a54bee5a861442e591e08a8a37b28b0f152955785c07ce4e400cb57795ffa30f
    lastState: {}
    name: proxy
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: 2018-12-16T19:49:45Z
  hostIP: 10.10.51.216
  phase: Running
  podIP: 172.20.0.97
  qosClass: BestEffort
  startTime: 2018-12-16T19:49:41Z
[root@kube-acitest-3 ~]# kubectl get serviceaccount kubewatch -n monitoring
NAME        SECRETS   AGE
kubewatch   1         5h
[root@kube-acitest-3 ~]# kubectl get clusterrole kubewatch -n monitoring
NAME        AGE
kubewatch   47m
[root@kube-acitest-3 ~]# kubectl get clusterrolebinding kubewatch -n monitoring
NAME        AGE
kubewatch   47m
[root@kube-acitest-3 ~]# kubectl logs -f kubewatch kubewatch -n monitoring | more
==> Writing config file...
time="2018-12-16T19:49:44Z" level=info msg="Starting kubewatch controller" pkg=kubewatch-pod
time="2018-12-16T19:49:44Z" level=info msg="Starting kubewatch controller" pkg=kubewatch-service
time="2018-12-16T19:49:44Z" level=info msg="Starting kubewatch controller" pkg=kubewatch-deployment
time="2018-12-16T19:49:44Z" level=info msg="Starting kubewatch controller" pkg=kubewatch-namespace
ERROR: logging before flag.Parse: E1216 19:50:14.042285       1 reflector.go:205] github.com/bitnami-labs/kubewatch/pkg/controller/controller.go:377: Failed to list *v1.Pod: Get https://10.96.0.1:443/api/v1/pods?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
ERROR: logging before flag.Parse: E1216 19:50:14.042418       1 reflector.go:205] github.com/bitnami-labs/kubewatch/pkg/controller/controller.go:377: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
ERROR: logging before flag.Parse: E1216 19:50:14.042516       1 reflector.go:205] github.com/bitnami-labs/kubewatch/pkg/controller/controller.go:377: Failed to list *v1beta1.Deployment: Get https://10.96.0.1:443/apis/apps/v1beta1/deployments?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
ERROR: logging before flag.Parse: E1216 19:50:14.042591       1 reflector.go:205] github.com/bitnami-labs/kubewatch/pkg/controller/controller.go:377: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
ERROR: logging before flag.Parse: E1216 19:50:45.042997       1 reflector.go:205] github.com/bitnami-labs/kubewatch/pkg/controller/controller.go:377: Failed to list *v1.Pod: Get https://10.96.0.1:443/api/v1/pods?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
ERROR: logging before flag.Parse: E1216 19:50:45.046896       1 reflector.go:205] github.com/bitnami-labs/kubewatch/pkg/controller/controller.go:377: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
ERROR: logging before flag.Parse: E1216 19:50:45.048370       1 reflector.go:205] github.com/bitnami-labs/kubewatch/pkg/controller/controller.go:377: Failed to list *v1beta1.Deployment: Get https://10.96.0.1:443/apis/apps/v1beta1/deployments?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
ERROR: logging before flag.Parse: E1216 19:50:45.050598       1 reflector.go:205] github.com/bitnami-labs/kubewatch/pkg/controller/controller.go:377: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
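To check whether this is specific to kubewatch or a general connectivity problem from the pod network to the kubernetes service ClusterIP (10.96.0.1:443 in the logs above), a small standalone client-go probe run from a pod on the same cluster exercises the same pod -> ClusterIP -> apiserver path. This is only an illustrative sketch, not part of kubewatch; it assumes it runs inside a pod with a service account token mounted.

package main

import (
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Use the same in-cluster endpoint and service account credentials
	// that kubewatch uses.
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	// ServerVersion is a cheap call that still exercises the full
	// pod -> ClusterIP -> apiserver path.
	v, err := clientset.Discovery().ServerVersion()
	if err != nil {
		fmt.Println("cannot reach the API server:", err)
		return
	}
	fmt.Println("API server reachable, version:", v.GitVersion)
}

If this also times out, the problem lies in the pod network or kube-proxy path to the ClusterIP rather than in kubewatch itself.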
I think this is because the Kubernetes API server intentionally disconnects it after its default timeout settings; see:
- https://github.com/godaddy/kubernetes-client/issues/294#issuecomment-406006717
- https://github.com/kubernetes-client/python/issues/124#issuecomment-341615167
So kubewatch needs some retry mechanism to reconnect on timeout; however, there is still a chance of losing some events between reconnections, if that matters. A rough sketch of the idea is below.
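To illustrate the idea only (this is not kubewatch's actual controller code, just a minimal sketch), the initial List call could be wrapped in a backoff loop so a dropped connection leads to a retry instead of a permanent error. The List signature below is the pre-context one used by client-go around the Kubernetes 1.11 timeframe.

package main

import (
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Retry the List with exponential backoff instead of treating a
	// dropped connection as fatal. Events that occur while disconnected
	// can still be missed between attempts.
	backoff := time.Second
	for {
		pods, err := clientset.CoreV1().Pods("").List(metav1.ListOptions{Limit: 500})
		if err != nil {
			fmt.Printf("listing pods failed (%v), retrying in %s\n", err, backoff)
			time.Sleep(backoff)
			if backoff < 30*time.Second {
				backoff *= 2
			}
			continue
		}
		fmt.Printf("listed %d pods\n", len(pods.Items))
		break
	}
}

As noted above, events that happen while the client is disconnected can still be lost between retries unless the controller re-lists and reconciles the full state.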
Maybe it will work if you change the dnsPolicy setting, e.g.

kubectl edit deployment.apps/kubewatch -n monitoring

and change dnsPolicy: ClusterFirst to dnsPolicy: Default.
Hello, any news? I have the same error, for example:
ERROR: logging before flag.Parse: E0109 14:30:07.080518 1 reflector.go:205] github.com/bitnami-labs/kubewatch/pkg/controller/controller.go:377: Failed to list *v1.Pod: Get https://172.20.0.1:443/api/v1/pods?limit=500&resourceVersion=0: dial tcp 172.20.0.1:443: i/o timeout