gobench
[panic]: another panic in grpc#2371
As the backtrace shows, the statement case <-ccb.resolverUpdateCh: panics when the statement cc.resolverWrapper = nil happens before it. Does this mean that the statement ccb.resolverUpdateCh <- struct{}{} will also panic if cc.resolverWrapper = nil happens before it?
In addition, the func chanrecv() manifests as a write, and setting nil also manifests as a write. Does this mean that the two statements (case <-ccb.resolverUpdateCh: and cc.resolverWrapper = nil) can cause a data race at the underlying code level? Likewise, the func chansend() manifests as a read, while setting nil manifests as a write. Does this mean that the two statements (ccb.resolverUpdateCh <- struct{}{} and cc.resolverWrapper = nil) can also cause a data race at the underlying code level?
Judging from the source code alone, the scenario you describe can happen. However, Go's memory model constrains the ordering of send/recv, so not every interleaving actually occurs. If you could provide real backtraces for those new bugs, I think it would make more sense.
part 1:
14 func (ccb *ccBalancerWrapper) handleResolvedAddrs() {
15     select {
16     case <-ccb.resolverUpdateCh:
17     default:
18     }
19     ccb.resolverUpdateCh <- struct{}{}
20 }
part 2:
61 func (cc *ClientConn) Close() {
62     cc.mu.Lock()
63     cc.resolverWrapper = nil
64     cc.balancerWrapper = nil
65     cc.mu.Unlock()
66 }
Backtrace:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x5b43b4]
goroutine 7 [running]:
command-line-arguments.(*ccBalancerWrapper).handleResolvedAddrs(0x0)
/root/gobench/goker/nonblocking/grpc/2371/grpc2371_test.go:16 +0x34
command-line-arguments.(*ClientConn).handleServiceConfig(0xc00001c300)
/root/gobench/goker/nonblocking/grpc/2371/grpc2371_test.go:57 +0x5f
command-line-arguments.(*ccResolverWrapper).watcher(0xc000010038)
/root/gobench/goker/nonblocking/grpc/2371/grpc2371_test.go:39 +0x4c
created by command-line-arguments.(*ccResolverWrapper).start
/root/gobench/goker/nonblocking/grpc/2371/grpc2371_test.go:35 +0x4d
According to the backtrace, the panic is triggered because the statement cc.resolverWrapper = nil (line 63) happens before the statement case <-ccb.resolverUpdateCh: (line 16).
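The backtrace shows handleResolvedAddrs being entered with a nil receiver (0x0) and faulting on the first field access. A minimal sketch of that mechanism, assuming a stripped-down ccBalancerWrapper and a hypothetical helper callOnNil that are not part of the original test file:

```go
package main

import "fmt"

// ccBalancerWrapper is a stripped-down stand-in for the type in the test case.
type ccBalancerWrapper struct {
	resolverUpdateCh chan struct{}
}

// handleResolvedAddrs mirrors part 1: drain a pending update, then post a new one.
func (ccb *ccBalancerWrapper) handleResolvedAddrs() {
	select {
	case <-ccb.resolverUpdateCh: // reads a field through ccb; panics if ccb is nil
	default:
	}
	ccb.resolverUpdateCh <- struct{}{}
}

// callOnNil is a hypothetical helper: it invokes the method through a nil
// pointer and reports whether the call panicked.
func callOnNil() (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	var ccb *ccBalancerWrapper // stands for a wrapper pointer already set to nil
	ccb.handleResolvedAddrs()
	return false
}

func main() {
	fmt.Println("panicked:", callOnNil())
}
```

Calling a method on a nil pointer is legal in Go; the fault only occurs at the first dereference inside the method, which is why the backtrace points at line 16 rather than at the call site.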
Similarly, since the func handleResolvedAddrs() and the func Close() are called in parallel, the statement cc.resolverWrapper = nil (line 63) may happen after the statement case <-ccb.resolverUpdateCh: (line 16) but before the statement ccb.resolverUpdateCh <- struct{}{} (line 19), which can also cause a panic. This seems to have nothing to do with the order of send/recv.
Upon reviewing the code again, I agree with your observation that if the select (line 15) chooses the default case and another goroutine assigns nil to resolverWrapper at the same time, it could potentially trigger a panic at line 19. However, I would like to point out that the probability of panic occurring on line 19 is relatively low compared to line 16 in practice. It's possible that the complexity of the test case makes it difficult for the goroutine to be scheduled in this manner.
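For illustration only, one possible way to avoid calling into a wrapper that Close() has already cleared is to read the pointer under the same mutex and check it before use. This is a sketch under assumptions, not the fix adopted upstream: the handleServiceConfig variant below, its bool return value, and the channel buffer size of 1 are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

type ccBalancerWrapper struct {
	resolverUpdateCh chan struct{}
}

func (ccb *ccBalancerWrapper) handleResolvedAddrs() {
	select {
	case <-ccb.resolverUpdateCh:
	default:
	}
	ccb.resolverUpdateCh <- struct{}{}
}

type ClientConn struct {
	mu              sync.Mutex
	balancerWrapper *ccBalancerWrapper
}

// Close mirrors part 2, reduced to the balancer wrapper field.
func (cc *ClientConn) Close() {
	cc.mu.Lock()
	cc.balancerWrapper = nil
	cc.mu.Unlock()
}

// handleServiceConfig is a hypothetical guarded variant: it copies the
// pointer while holding cc.mu and skips the call if Close already ran.
// It reports whether the update was delivered.
func (cc *ClientConn) handleServiceConfig() bool {
	cc.mu.Lock()
	bw := cc.balancerWrapper
	cc.mu.Unlock()
	if bw == nil {
		return false
	}
	bw.handleResolvedAddrs()
	return true
}

func main() {
	// Buffer size 1 is an assumption so that the send does not block here.
	cc := &ClientConn{balancerWrapper: &ccBalancerWrapper{resolverUpdateCh: make(chan struct{}, 1)}}
	fmt.Println(cc.handleServiceConfig())
	cc.Close()
	fmt.Println(cc.handleServiceConfig())
}
```

The design choice here is to make both the write in Close() and the read of the wrapper pointer go through cc.mu, so the two accesses are ordered and the nil case is handled explicitly.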