tcptracer-bpf icon indicating copy to clipboard operation
tcptracer-bpf copied to clipboard

offsetguess got stuck

Open bboreham opened this issue 7 years ago • 2 comments

I have a program stuck with tryCurrentOffset() trying to send to the stop chan but acceptV4() is not receiving on the chan because it is waiting to accept:

goroutine 7166061 [IO wait, 1322 minutes]:
internal/poll.runtime_pollWait(0x7f14e842ab28, 0x72, 0x0)
        /usr/local/go/src/runtime/netpoll.go:173 +0x57
internal/poll.(*pollDesc).wait(0xc4211eff18, 0x72, 0xc42199e000, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0x9b
internal/poll.(*pollDesc).waitRead(0xc4211eff18, 0xffffffffffffff00, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Accept(0xc4211eff00, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_unix.go:372 +0x1a8
net.(*netFD).accept(0xc4211eff00, 0xc42127bd98, 0xc421316138, 0xc420d3beb8)
        /usr/local/go/src/net/fd_unix.go:238 +0x42
net.(*TCPListener).accept(0xc42000e4c8, 0x40501b, 0xc420d3be60, 0xc422160fb8)
        /usr/local/go/src/net/tcpsock_posix.go:136 +0x2e
net.(*TCPListener).Accept(0xc42000e4c8, 0xc422160fb8, 0x1, 0x0, 0x0)
        /usr/local/go/src/net/tcpsock.go:259 +0x49
github.com/weaveworks/scope/vendor/github.com/weaveworks/tcptracer-bpf/pkg/tracer.acceptV4(0x1cd47e0, 0xc42000e4c8, 0xc420d3be60)
        /go/src/github.com/weaveworks/scope/vendor/github.com/weaveworks/tcptracer-bpf/pkg/tracer/offsetguess.go:114 +0x62
created by github.com/weaveworks/scope/vendor/github.com/weaveworks/tcptracer-bpf/pkg/tracer.startServer
        /go/src/github.com/weaveworks/scope/vendor/github.com/weaveworks/tcptracer-bpf/pkg/tracer/offsetguess.go:105 +0x1dc

goroutine 7166028 [chan send, 1322 minutes, locked to thread]:
github.com/weaveworks/scope/vendor/github.com/weaveworks/tcptracer-bpf/pkg/tracer.tryCurrentOffset(0xc42322a1c0, 0xc42251bef0, 0xc421f22c60, 0xc422b16814, 0xc420d3be60, 0x0, 0x0)
        /go/src/github.com/weaveworks/scope/vendor/github.com/weaveworks/tcptracer-bpf/pkg/tracer/offsetguess.go:190 +0x168
github.com/weaveworks/scope/vendor/github.com/weaveworks/tcptracer-bpf/pkg/tracer.guess(0xc42322a1c0, 0x0, 0x0)
        /go/src/github.com/weaveworks/scope/vendor/github.com/weaveworks/tcptracer-bpf/pkg/tracer/offsetguess.go:393 +0x3bc
github.com/weaveworks/scope/vendor/github.com/weaveworks/tcptracer-bpf/pkg/tracer.initialize(0xc42322a1c0, 0x197fb6e, 0xe, 0xc420d3bc20, 0xc420d3bd40, 0xc420d3be00, 0x1662d60, 0xffffffffffffffff)
        /go/src/github.com/weaveworks/scope/vendor/github.com/weaveworks/tcptracer-bpf/pkg/tracer/tracer.go:153 +0x40
github.com/weaveworks/scope/vendor/github.com/weaveworks/tcptracer-bpf/pkg/tracer.initializeIPv4(0xc42322a1c0, 0xc420d3bc20, 0xc420d3bd40, 0x0, 0xc422b16b40, 0x10000)
        /go/src/github.com/weaveworks/scope/vendor/github.com/weaveworks/tcptracer-bpf/pkg/tracer/tracer.go:167 +0x54
github.com/weaveworks/scope/vendor/github.com/weaveworks/tcptracer-bpf/pkg/tracer.NewTracer(0x1cd6660, 0xc4200a6e60, 0xc4200a6e60, 0x0, 0x0)
        /go/src/github.com/weaveworks/scope/vendor/github.com/weaveworks/tcptracer-bpf/pkg/tracer/tracer.go:63 +0x438
github.com/weaveworks/scope/probe/endpoint.(*EbpfTracker).restart(0xc4200a6e60, 0x0, 0x0)
        /go/src/github.com/weaveworks/scope/probe/endpoint/ebpf.go:378 +0x104
github.com/weaveworks/scope/probe/endpoint.(*connectionTracker).ReportConnections(0xc420730060, 0xc421781800)
        /go/src/github.com/weaveworks/scope/probe/endpoint/connection_tracker.go:108 +0x2ec
github.com/weaveworks/scope/probe/endpoint.(*Reporter).Report(0xc420730000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/weaveworks/scope/probe/endpoint/reporter.go:98 +0x11f
github.com/weaveworks/scope/probe.(*Probe).report.func1(0xc4200b3ae0, 0xc4222faf60, 0x1cbcae0, 0xc420730000)
        /go/src/github.com/weaveworks/scope/probe/probe.go:161 +0x122
created by github.com/weaveworks/scope/probe.(*Probe).report
        /go/src/github.com/weaveworks/scope/probe/probe.go:158 +0xf3

I can't see how it can get into this state. Maybe if something else can accept the socket connection?

bboreham avatar Sep 18 '18 11:09 bboreham

More detail: the log lines before it got stuck are:

<probe> ERRO: 2018/09/17 12:45:26.367111 tcp tracer received event with timestamp 617121384602265 even though the last timestamp was 617121384603240. Stopping the eBPF tracker.
<probe> WARN: 2018/09/17 12:45:27.379457 ebpf tracker died, restarting it

so there's no evidence of an error in tryCurrentOffset()

bboreham avatar Sep 18 '18 13:09 bboreham

Saw another occurrence yesterday - identical symptoms. I guess we should have a deadline on the Accept().

2020-06-11T17:32:34.940290904Z goroutine 4715516 [IO wait, 20153 minutes]:
2020-06-11T17:32:34.940297119Z internal/poll.runtime_pollWait(0x7f6bec527500, 0x72, 0x0)
2020-06-11T17:32:34.940301684Z  /usr/local/go/src/runtime/netpoll.go:184 +0x55
2020-06-11T17:32:34.940305628Z internal/poll.(*pollDesc).wait(0xc002d84018, 0x72, 0x0, 0x0, 0x21a8ee0)
2020-06-11T17:32:34.940309431Z  /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
2020-06-11T17:32:34.94031958Z internal/poll.(*pollDesc).waitRead(...)
2020-06-11T17:32:34.940324037Z  /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
2020-06-11T17:32:34.940328392Z internal/poll.(*FD).Accept(0xc002d84000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
2020-06-11T17:32:34.940332733Z  /usr/local/go/src/internal/poll/fd_unix.go:384 +0x1f8
2020-06-11T17:32:34.940342872Z net.(*netFD).accept(0xc002d84000, 0xc0022b7e00, 0xc0022b7e88, 0xc0000c1738)
2020-06-11T17:32:34.94034785Z   /usr/local/go/src/net/fd_unix.go:238 +0x42
2020-06-11T17:32:34.940352605Z net.(*TCPListener).accept(0xc000964380, 0x4079eb, 0xc0037a71a0, 0xc002c36fb8)
2020-06-11T17:32:34.940356757Z  /usr/local/go/src/net/tcpsock_posix.go:139 +0x32
2020-06-11T17:32:34.940360655Z net.(*TCPListener).Accept(0xc000964380, 0xc002c36fb8, 0x1, 0x0, 0x0)
2020-06-11T17:32:34.940364855Z  /usr/local/go/src/net/tcpsock.go:261 +0x47
2020-06-11T17:32:34.940368848Z github.com/weaveworks/tcptracer-bpf/pkg/tracer.acceptV4(0x28b87e0, 0xc000964380, 0xc0037a71a0)
2020-06-11T17:32:34.940373483Z  /go/src/github.com/weaveworks/scope/vendor/github.com/weaveworks/tcptracer-bpf/pkg/tracer/offsetguess.go:114 +0x60
2020-06-11T17:32:34.940378314Z created by github.com/weaveworks/tcptracer-bpf/pkg/tracer.startServer
2020-06-11T17:32:34.940394891Z  /go/src/github.com/weaveworks/scope/vendor/github.com/weaveworks/tcptracer-bpf/pkg/tracer/offsetguess.go:105 +0x1ee

2020-06-11T17:32:34.982561784Z goroutine 4715424 [chan send, 20153 minutes, locked to thread]:
2020-06-11T17:32:34.982571331Z github.com/weaveworks/tcptracer-bpf/pkg/tracer.tryCurrentOffset(0xc00197ce70, 0xc000b5fd10, 0xc0021062d0, 0xc002591e64, 0xc0037a71a0, 0x0, 0x0)
2020-06-11T17:32:34.982576698Z  /go/src/github.com/weaveworks/scope/vendor/github.com/weaveworks/tcptracer-bpf/pkg/tracer/offsetguess.go:190 +0x15b
2020-06-11T17:32:34.982581971Z github.com/weaveworks/tcptracer-bpf/pkg/tracer.guess(0xc00197ce70, 0x0, 0x0)
2020-06-11T17:32:34.982591938Z  /go/src/github.com/weaveworks/scope/vendor/github.com/weaveworks/tcptracer-bpf/pkg/tracer/offsetguess.go:393 +0x425
2020-06-11T17:32:34.982597255Z github.com/weaveworks/tcptracer-bpf/pkg/tracer.initialize(0xc00197ce70, 0x21c1c6f, 0xe, 0xc0037a7020, 0xc0037a70e0, 0x0, 0x0, 0x8635aa)
2020-06-11T17:32:34.982613023Z  /go/src/github.com/weaveworks/scope/vendor/github.com/weaveworks/tcptracer-bpf/pkg/tracer/tracer.go:153 +0x40
2020-06-11T17:32:34.982618424Z github.com/weaveworks/tcptracer-bpf/pkg/tracer.initializeIPv4(...)
2020-06-11T17:32:34.982623007Z  /go/src/github.com/weaveworks/scope/vendor/github.com/weaveworks/tcptracer-bpf/pkg/tracer/tracer.go:167
2020-06-11T17:32:34.982628219Z github.com/weaveworks/tcptracer-bpf/pkg/tracer.NewTracer(0x28c8360, 0xc0009e64b0, 0xc0025922d0, 0x865fa5, 0xc0001c6040)
2020-06-11T17:32:34.982648607Z  /go/src/github.com/weaveworks/scope/vendor/github.com/weaveworks/tcptracer-bpf/pkg/tracer/tracer.go:63 +0x48e
2020-06-11T17:32:34.982659567Z github.com/weaveworks/scope/probe/endpoint.(*EbpfTracker).restart(0xc0009e64b0, 0x0, 0x0)
2020-06-11T17:32:34.982665814Z  /go/src/github.com/weaveworks/scope/probe/endpoint/ebpf.go:410 +0xf2
2020-06-11T17:32:34.982670531Z github.com/weaveworks/scope/probe/endpoint.(*connectionTracker).ReportConnections(0xc0001dc180, 0xc0016af500)
2020-06-11T17:32:34.98267553Z   /go/src/github.com/weaveworks/scope/probe/endpoint/connection_tracker.go:87 +0x32b
2020-06-11T17:32:34.98268025Z github.com/weaveworks/scope/probe/endpoint.(*Reporter).Report(0xc0001dc120, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
2020-06-11T17:32:34.982686015Z  /go/src/github.com/weaveworks/scope/probe/endpoint/reporter_linux.go:42 +0xc0
2020-06-11T17:32:34.982690233Z github.com/weaveworks/scope/probe.(*Probe).report.func1(0xc0002828f0, 0xc002e25e00, 0x288aee0, 0xc0001dc120)
2020-06-11T17:32:34.982700072Z  /go/src/github.com/weaveworks/scope/probe/probe.go:172 +0x10d
2020-06-11T17:32:34.982705564Z created by github.com/weaveworks/scope/probe.(*Probe).report
2020-06-11T17:32:34.982715701Z  /go/src/github.com/weaveworks/scope/probe/probe.go:169 +0x10c

bboreham avatar Jun 12 '20 09:06 bboreham