kube-netc
kube-netc restarts several times
From a test system:
ubuntu@ip-10-10-129-78:~$ kubectl get pods -A | grep kube-netc
default kube-netc-5hrkd 1/1 Running 25 3d7h
ubuntu@ip-10-10-129-78:~$
ubuntu@ip-10-10-129-78:~$ kubectl describe pod kube-netc-5hrkd
Name: kube-netc-5hrkd
Namespace: default
Priority: 0
Node: ip-10-10-129-78.us-west-1.compute.internal/10.10.128.238
Start Time: Thu, 11 Jun 2020 16:09:21 +0000
Labels: controller-revision-hash=9f5c6789c
name=kube-netc
pod-template-generation=1
Annotations: <none>
Status: Running
IP: 10.244.0.138
IPs: <none>
Controlled By: DaemonSet/kube-netc
Containers:
kube-netc:
Container ID: docker://2650292745d9226bc5fb185c03cdae315d3a72cfe5551a41d981bc1c696fee33
Image: nirmata/kube-netc
Image ID: docker-pullable://nirmata/kube-netc@sha256:6b754b4a759d7ca992a3b464d96613f1db3851d679fbbd63b63c23ced8714e21
Port: 2112/TCP
Host Port: 0/TCP
State: Running
Started: Sun, 14 Jun 2020 21:07:34 +0000
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Sun, 14 Jun 2020 19:07:26 +0000
Finished: Sun, 14 Jun 2020 21:07:32 +0000
Ready: True
Restart Count: 25
Environment: <none>
Mounts:
/sys/fs/bpf from bpf (rw)
/sys/fs/cgroup from cgroup (rw)
/sys/kernel/debug from debug (rw)
/tmp from tmp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-6dwlr (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
bpf:
Type: HostPath (bare host directory volume)
Path: /sys/fs/bpf
HostPathType:
cgroup:
Type: HostPath (bare host directory volume)
Path: /sys/fs/cgroup
HostPathType:
debug:
Type: HostPath (bare host directory volume)
Path: /sys/kernel/debug
HostPathType:
tmp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
default-token-6dwlr:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-6dwlr
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule
Events: <none>
I haven't been able to observe any of these restarts myself; my kube-netc pod hasn't restarted. Could you post the pod log?
Here is the pod data from a test system that shows multiple restarts:
ubuntu@ip-10-10-129-78:~$ kubectl describe pod 10 kube-netc-5hrkd
Name: kube-netc-5hrkd
Namespace: default
Priority: 0
Node: ip-10-10-129-78.us-west-1.compute.internal/10.10.128.238
Start Time: Thu, 11 Jun 2020 16:09:21 +0000
Labels: controller-revision-hash=9f5c6789c
name=kube-netc
pod-template-generation=1
Annotations: <none>
Status: Running
IP: 10.244.0.138
IPs: <none>
Controlled By: DaemonSet/kube-netc
Containers:
kube-netc:
Container ID: docker://5f118d6cf3dc2d4a9837aabddbd7f8445b4ff8572408d23059f2f4ba68565926
Image: nirmata/kube-netc
Image ID: docker-pullable://nirmata/kube-netc@sha256:6b754b4a759d7ca992a3b464d96613f1db3851d679fbbd63b63c23ced8714e21
Port: 2112/TCP
Host Port: 0/TCP
State: Running
Started: Fri, 19 Jun 2020 11:29:04 +0000
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Fri, 19 Jun 2020 07:57:23 +0000
Finished: Fri, 19 Jun 2020 11:29:02 +0000
Ready: True
Restart Count: 81
Environment: <none>
Mounts:
/sys/fs/bpf from bpf (rw)
/sys/fs/cgroup from cgroup (rw)
/sys/kernel/debug from debug (rw)
/tmp from tmp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-6dwlr (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
bpf:
Type: HostPath (bare host directory volume)
Path: /sys/fs/bpf
HostPathType:
cgroup:
Type: HostPath (bare host directory volume)
Path: /sys/fs/cgroup
HostPathType:
debug:
Type: HostPath (bare host directory volume)
Path: /sys/kernel/debug
HostPathType:
tmp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
default-token-6dwlr:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-6dwlr
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule
Events: <none>
Error from server (NotFound): pods "10" not found
Here are some more details from a test system:
[UPDATE] Pod nirmata-cni-installer-rppg9
Labels: map[controller-revision-hash:3974475895 k8s-app:nirmata-cni-installer pod-template-generation:1 tier:node]
IP: 10.10.128.238
Pod Update: 10.10.128.238 -> nirmata-cni-installer-rppg9
[UPDATE] Pod metrics-server-8457dfc7f5-lt7sd
Labels: map[k8s-app:metrics-server pod-template-hash:4013897391]
IP: 10.244.0.163
Pod Update: 10.244.0.163 -> metrics-server-8457dfc7f5-lt7sd
[UPDATE] Pod aws-node-ppsft
Labels: map[controller-revision-hash:4159804558 k8s-app:aws-node pod-template-generation:1]
IP: 10.10.128.238
fatal error: concurrent map read and map write
goroutine 12 [running]:
runtime.throw(0x186b967, 0x21)
/usr/local/go/src/runtime/panic.go:1116 +0x72 fp=0xc000072c38 sp=0xc000072c08 pc=0x434c92
runtime.mapaccess2_faststr(0x16258e0, 0xc00012a3c0, 0xc00fcbfd30, 0xc, 0xc000136658, 0xc00f760201)
/usr/local/go/src/runtime/map_faststr.go:116 +0x47c fp=0xc000072ca8 sp=0xc000072c38 pc=0x41413c
github.com/nirmata/kube-netc/pkg/collector.StartCollector(0x26730e0, 0xc0000b1300)
/build/pkg/collector/collector.go:29 +0x427 fp=0xc000072fd0 sp=0xc000072ca8 pc=0x14ba1d7
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc000072fd8 sp=0xc000072fd0 pc=0x464c71
created by main.main
/build/main.go:28 +0xfa
goroutine 1 [IO wait, 60 minutes]:
internal/poll.runtime_pollWait(0x7f65bad4efc0, 0x72, 0x0)
/usr/local/go/src/runtime/netpoll.go:203 +0x55
internal/poll.(*pollDesc).wait(0xc000122098, 0x72, 0x0, 0x0, 0x183b6ec)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Accept(0xc000122080, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/usr/local/go/src/internal/poll/fd_unix.go:384 +0x1d4
net.(*netFD).accept(0xc000122080, 0xc000107d20, 0x26f3ee0, 0x7f65e1a56108)
/usr/local/go/src/net/fd_unix.go:238 +0x42
net.(*TCPListener).accept(0xc0000bef20, 0xc000107d60, 0x40dde8, 0x30)
/usr/local/go/src/net/tcpsock_posix.go:139 +0x32
net.(*TCPListener).Accept(0xc0000bef20, 0x1744e40, 0xc00012b4d0, 0x16158e0, 0x2667000)
/usr/local/go/src/net/tcpsock.go:261 +0x64
net/http.(*Server).Serve(0xc00012c0e0, 0x1add8a0, 0xc0000bef20, 0x0, 0x0)
/usr/local/go/src/net/http/server.go:2901 +0x25d
net/http.(*Server).ListenAndServe(0xc00012c0e0, 0xc00012c0e0, 0xc000107f58)
/usr/local/go/src/net/http/server.go:2830 +0xb7
net/http.ListenAndServe(...)
/usr/local/go/src/net/http/server.go:3086
main.main()
/build/main.go:32 +0x1c7
goroutine 19 [chan receive]:
k8s.io/klog.(*loggingT).flushDaemon(0x26f2d80)
/go/pkg/mod/k8s.io/[email protected]/klog.go:1010 +0x8b
created by k8s.io/klog.init.0
/go/pkg/mod/k8s.io/[email protected]/klog.go:411 +0xd6
github.com/patrickmn/go-cache.(*janitor).Run(0xc0002575e0, 0xc0002a7c00)
/go/pkg/mod/github.com/patrickmn/[email protected]+incompatible/cache.go:1079 +0xd5
created by github.com/patrickmn/go-cache.runJanitor
/go/pkg/mod/github.com/patrickmn/[email protected]+incompatible/cache.go:1099 +0xa9
goroutine 21 [sync.Cond.Wait, 60 minutes]:
runtime.goparkunlock(...)
/usr/local/go/src/runtime/proc.go:310
sync.runtime_notifyListWait(0xc00032e250, 0x0)
/usr/local/go/src/runtime/sema.go:513 +0xf8
sync.(*Cond).Wait(0xc00032e240)
/usr/local/go/src/sync/cond.go:56 +0x9d
github.com/cihub/seelog.(*asyncLoopLogger).processItem(0xc000120900, 0x0)
/go/pkg/mod/github.com/cihub/[email protected]/behavior_asynclooplogger.go:50 +0x8a
github.com/cihub/seelog.(*asyncLoopLogger).processQueue(0xc000120900)
/go/pkg/mod/github.com/cihub/[email protected]/behavior_asynclooplogger.go:63 +0x42
created by github.com/cihub/seelog.NewAsyncLoopLogger
/go/pkg/mod/github.com/cihub/[email protected]/behavior_asynclooplogger.go:40 +0x9c
goroutine 22 [sync.Cond.Wait, 60 minutes]:
runtime.goparkunlock(...)
/usr/local/go/src/runtime/proc.go:310
sync.runtime_notifyListWait(0xc00032e3d0, 0x0)
/usr/local/go/src/runtime/sema.go:513 +0xf8
sync.(*Cond).Wait(0xc00032e3c0)
/usr/local/go/src/sync/cond.go:56 +0x9d
github.com/cihub/seelog.(*asyncLoopLogger).processItem(0xc000120a20, 0x0)
/go/pkg/mod/github.com/cihub/[email protected]/behavior_asynclooplogger.go:50 +0x8a
github.com/cihub/seelog.(*asyncLoopLogger).processQueue(0xc000120a20)
/go/pkg/mod/github.com/cihub/[email protected]/behavior_asynclooplogger.go:63 +0x42
goroutine 10 [chan send]:
github.com/nirmata/kube-netc/pkg/tracker.(*Tracker).run(0x26730e0, 0x0, 0x464c76)
/build/pkg/tracker/tracker.go:236 +0xad5
github.com/nirmata/kube-netc/pkg/tracker.(*Tracker).StartTracker(0x26730e0)
/build/pkg/tracker/tracker.go:139 +0x35
created by main.main
/build/main.go:23 +0x45
goroutine 11 [sync.Cond.Wait]:
runtime.goparkunlock(...)
/usr/local/go/src/runtime/proc.go:310
sync.runtime_notifyListWait(0xc0007b0028, 0x1db)
/usr/local/go/src/runtime/sema.go:513 +0xf8
sync.(*Cond).Wait(0xc0007b0018)
/usr/local/go/src/sync/cond.go:56 +0x9d
k8s.io/client-go/tools/cache.(*DeltaFIFO).Pop(0xc0007b0000, 0xc00034f1d0, 0x0, 0x0, 0x0, 0x0)
/go/pkg/mod/k8s.io/[email protected]/tools/cache/delta_fifo.go:479 +0xa0
k8s.io/client-go/tools/cache.(*controller).processLoop(0xc00007aa00)
/go/pkg/mod/k8s.io/[email protected]/tools/cache/controller.go:173 +0x40
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc00001ddd0)
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0004e5dd0, 0x1aa2dc0, 0xc000027b60, 0xc000044001, 0xc000492000)
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156 +0xa3
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00001ddd0, 0x3b9aca00, 0x0, 0xc000490801, 0xc000492000)
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.Until(...)
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:90
k8s.io/client-go/tools/cache.(*controller).Run(0xc00007aa00, 0xc000492000)
/go/pkg/mod/k8s.io/[email protected]/tools/cache/controller.go:145 +0x2c4
k8s.io/client-go/tools/cache.(*sharedIndexInformer).Run(0xc000121b00, 0xc000492000)
/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:377 +0x401
github.com/nirmata/kube-netc/pkg/cluster.(*ClusterInfo).Run(0xc0000b1300)
/build/pkg/cluster/cluster.go:157 +0x29e
created by main.main
/build/main.go:26 +0xcc
goroutine 132 [select, 1 minutes]:
github.com/DataDog/datadog-agent/pkg/process/util.(*LogLimit).resetLoop(0xc0004f0340)
/go/pkg/mod/github.com/!data!dog/[email protected]/pkg/process/util/log_limit.go:54 +0xbc
created by github.com/DataDog/datadog-agent/pkg/process/util.NewLogLimit
/go/pkg/mod/github.com/!data!dog/[email protected]/pkg/process/util/log_limit.go:31 +0xb5
goroutine 131 [IO wait, 1 minutes, locked to thread]:
internal/poll.runtime_pollWait(0x7f65bad4ed20, 0x72, 0xc0013cbd98)
/usr/local/go/src/runtime/netpoll.go:203 +0x55
internal/poll.(*pollDesc).wait(0xc0000cccd8, 0x72, 0x418801, 0x0, 0x0)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).RawRead(0xc0000cccc0, 0xc001c5fb00, 0x0, 0x0)
/usr/local/go/src/internal/poll/fd_unix.go:534 +0xf6
os.(*rawConn).Read(0xc00055c0a0, 0xc001c5fb00, 0x7f65ba203c38, 0xc00045ec98)
/usr/local/go/src/os/rawconn.go:31 +0x65
github.com/DataDog/datadog-agent/pkg/network/netlink.(*Socket).recvmsg(0xc0000e4b40, 0xc000b12000, 0x8000, 0x8000, 0x0, 0xc000aa0000, 0x461620, 0xc000603380)
/go/pkg/mod/github.com/!data!dog/[email protected]/pkg/network/netlink/socket.go:223 +0xe0
github.com/DataDog/datadog-agent/pkg/network/netlink.(*Socket).ReceiveInto(0xc0000e4b40, 0xc000aa0000, 0x1000, 0x1000, 0xff0, 0x0, 0x0, 0x0, 0x0)
/go/pkg/mod/github.com/!data!dog/[email protected]/pkg/network/netlink/socket.go:117 +0x67
github.com/DataDog/datadog-agent/pkg/network/netlink.(*Consumer).receive(0xc000124070, 0xc00033a540)
/go/pkg/mod/github.com/!data!dog/[email protected]/pkg/network/netlink/consumer.go:270 +0x231
github.com/DataDog/datadog-agent/pkg/network/netlink.(*Consumer).Events.func1()
/go/pkg/mod/github.com/!data!dog/[email protected]/pkg/network/netlink/consumer.go:131 +0x86
github.com/DataDog/datadog-agent/pkg/network/netlink.(*Consumer).initWorker.func1.1()
/go/pkg/mod/github.com/!data!dog/[email protected]/pkg/network/netlink/consumer.go:196 +0x2d
github.com/DataDog/datadog-agent/pkg/process/util.WithRootNS(0x183855b, 0x5, 0xc0007b87a8, 0x0, 0x0)
/go/pkg/mod/github.com/!data!dog/[email protected]/pkg/process/util/netns.go:30 +0x1cc
github.com/DataDog/datadog-agent/pkg/network/netlink.(*Consumer).initWorker.func1(0x183855b, 0x5, 0xc000124070)
/go/pkg/mod/github.com/!data!dog/[email protected]/pkg/network/netlink/consumer.go:190 +0x5d
created by github.com/DataDog/datadog-agent/pkg/network/netlink.(*Consumer).initWorker
/go/pkg/mod/github.com/!data!dog/[email protected]/pkg/network/netlink/consumer.go:189 +0x53
goroutine 136 [select]:
k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch.func2(0xc000484340, 0xc000492000, 0xc0000a6cc0, 0xc00033b3e0)
/go/pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:328 +0x14a
created by k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch
/go/pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:322 +0x276
I added a mutex to the ObjectIPMap in the cluster package to make access to it thread-safe, which should hopefully solve the restart problem: 8b9b402
I'm not going to close the issue just yet until we know this fixed it, but hopefully this is what does it.
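The change amounts to the following pattern; this is a sketch with illustrative names, not the exact diff in 8b9b402:

```go
package cluster

import "sync"

// ClusterInfo guards the shared IP-to-object map with an RWMutex so the
// informer goroutine (writer) and the collector goroutine (reader) can't
// race. Field and method names here are illustrative.
type ClusterInfo struct {
	mu          sync.RWMutex
	ObjectIPMap map[string]string
}

// Writer side, e.g. an informer update handler.
func (c *ClusterInfo) setObject(ip, name string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.ObjectIPMap[ip] = name
}

// Reader side, e.g. the collector resolving an IP to an object name.
func (c *ClusterInfo) objectForIP(ip string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	name, ok := c.ObjectIPMap[ip]
	return name, ok
}
```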
I am still seeing this:
ubuntu@ip-10-10-129-44:~$ kubectl get pods -A | grep netc
kube-system kube-netc-nm5j7 1/1 Running 62 6d20h
Nothing unusual in the logs:
{"level":"warn","ts":"2020-07-23T07:22:13.735Z","msg":"extreme transfer rate","package":"tracker","source":"10.10.129.239:2181","direction":"sent","bps":70695954795}
{"level":"warn","ts":"2020-07-23T07:22:14.731Z","msg":"extreme transfer rate","package":"tracker","source":"10.10.131.251:44006","direction":"sent","bps":53136722965}
{"level":"warn","ts":"2020-07-23T07:22:14.734Z","msg":"extreme transfer rate","package":"tracker","source":"10.10.129.239:2181","direction":"sent","bps":82850415419}
{"level":"warn","ts":"2020-07-23T07:22:15.729Z","msg":"extreme transfer rate","package":"tracker","source":"10.10.129.239:2181","direction":"sent","bps":237007221801}
{"level":"warn","ts":"2020-07-23T07:22:15.735Z","msg":"extreme transfer rate","package":"tracker","source":"10.10.131.251:44006","direction":"recv","bps":66751592189}
{"level":"warn","ts":"2020-07-23T07:22:16.730Z","msg":"extreme transfer rate","package":"tracker","source":"10.10.129.239:2181","direction":"sent","bps":244325343580}
Occasionally there are these messages in the logs:
E0723 07:22:13.699329 6 reflector.go:178] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:125: Failed to list *v1.Node:
E0723 07:22:30.491520 6 reflector.go:178] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:125: Failed to list *v1.Service: services is forbidden: User "system:serviceaccount:kube-system:default" cannot list resource "services" in API group "" at the cluster scope
E0723 07:22:37.956938 6 reflector.go:178] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:125: Failed to list *v1.Service: services is forbidden: User "system:serviceaccount:kube-system:default" cannot list resource "services" in API group "" at the cluster scope
Also found this panic:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x14dc502]
goroutine 66 [running]:
github.com/nirmata/kube-netc/pkg/collector.generateLabels(0xc000f02c60, 0xc, 0x20fb, 0xc000f02c50, 0xc, 0x0, 0x0, 0x0, 0x0, 0x1, ...)
/build/pkg/collector/collector.go:54 +0x572
github.com/nirmata/kube-netc/pkg/collector.StartCollector(0x26c6960, 0xc00000ee40, 0xc00050e0b0)
/build/pkg/collector/collector.go:76 +0x19c
created by main.main
/build/main.go:67 +0x2c1
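This second crash is a different bug: a nil pointer dereference inside generateLabels (collector.go:54) rather than a map race, most likely from dereferencing a lookup result that can be nil when a connection's IP has no matching cluster object yet. A plausible guard, sketched with hypothetical names since the actual kube-netc code may be shaped differently:

```go
package collector

// objectInfo and the lookup signature are hypothetical; the real
// generateLabels in kube-netc may differ. The point is to tolerate
// lookups that return nothing instead of dereferencing unconditionally.
type objectInfo struct {
	Name      string
	Namespace string
}

func generateLabels(ip string, lookup func(string) *objectInfo) map[string]string {
	labels := map[string]string{"ip": ip}

	info := lookup(ip)
	if info == nil {
		// No cluster object known for this IP yet: return the bare
		// labels instead of panicking on info.Name below.
		return labels
	}

	labels["name"] = info.Name
	labels["namespace"] = info.Namespace
	return labels
}
```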