pulsar-client-go icon indicating copy to clipboard operation
pulsar-client-go copied to clipboard

[BUG] Panics in SendRequestNoWait are triggered  when reconnecting consumer to broker

Open countryroadscn opened this issue 3 years ago • 1 comments

Expected behavior

Panics should not occur when the client reconnects

Actual behavior

Panic is triggered in pulsar and cannot be captured by the client code.

The backtrace of core dump: 0 0x0000000000479261 in runtime.raise at /data/goroot/go/src/runtime/sys_linux_amd64.s:168 1 0x000000000045adc5 in runtime.dieFromSignal at /data/goroot/go/src/runtime/signal_unix.go:852 2 0x000000000045b796 in runtime.sigfwdgo at /data/goroot/go/src/runtime/signal_unix.go:1066 3 0x0000000000459b27 in runtime.sigtrampgo at /data/goroot/go/src/runtime/signal_unix.go:430 4 0x000000000047a0ae in runtime.sigtrampgo at :1 5 0x000000000047955d in runtime.sigtramp at /data/goroot/go/src/runtime/sys_linux_amd64.s:361 6 0x00007f9041b19390 in ??? at ?:-1 7 0x0000000000444e49 in runtime.crash at /data/goroot/go/src/runtime/signal_unix.go:944 8 0x0000000000444e49 in runtime.fatalpanic at /data/goroot/go/src/runtime/panic.go:1092 9 0x0000000000444617 in runtime.gopanic at /data/goroot/go/src/runtime/panic.go:941 10 0x0000000000490994 in sync.(*WaitGroup).Add at /data/goroot/go/src/sync/waitgroup.go:94 11 0x0000000000490a05 in sync.(*WaitGroup).Done at /data/goroot/go/src/sync/waitgroup.go:105 12 0x0000000000fec866 in github.com/apache/pulsar-client-go/pulsar/internal.(*connection).SendRequestNoWait at /data/goroot/gopath/pkg/mod/github.com/apache/[email protected]/pulsar/internal/connection.go:606

截屏2022-10-11 15 36 42 截屏2022-10-11 15 36 59 截屏2022-10-11 15 37 08

Steps to reproduce

This issue may not reproduce every time, but it must be related to sync.WaitGroup.

System configuration

Pulsar version: 2.9 pulsar-client-go version: v0.6.0

countryroadscn avatar Oct 11 '22 08:10 countryroadscn

Thank you for reporting this. Although it's 0.6.0, the same logic exists in the current master.

The problem is basically WaitGroup does not allow Wait() and Add() to run concurrently. (Add() itself can run concurrently). This is clearly explained in the Go WaitGroup source code at https://github.com/golang/go/blob/master/src/sync/waitgroup.go#L30

// Add adds delta, which may be negative, to the WaitGroup counter.
// If the counter becomes zero, all goroutines blocked on Wait are released.
// If the counter goes negative, Add panics.
//
// Note that calls with a positive delta that occur when the counter is zero
// must happen before a Wait. Calls with a negative delta, or calls with a
// positive delta that start when the counter is greater than zero, may happen
// at any time.
// Typically this means the calls to Add should execute before the statement
// creating the goroutine or other event to be waited for.

The WaitGroup, incomingRequestsWG, in the connection.go waits all the sent requests before close the main run goroutine. The design is for run() wait for any requests to be completed if there is any. So both Wait() and Add() are in this concurrent manner. The code should preemptively Add(1) before Wait() is called in the closeCh triggered.

zzzming avatar Oct 12 '22 01:10 zzzming

Thank you for reporting this. Although it's 0.6.0, the same logic exists in the current master.

The problem is basically WaitGroup does not allow Wait() and Add() to run concurrently. (Add() itself can run concurrently). This is clearly explained in the Go WaitGroup source code at https://github.com/golang/go/blob/master/src/sync/waitgroup.go#L30

// Add adds delta, which may be negative, to the WaitGroup counter.
// If the counter becomes zero, all goroutines blocked on Wait are released.
// If the counter goes negative, Add panics.
//
// Note that calls with a positive delta that occur when the counter is zero
// must happen before a Wait. Calls with a negative delta, or calls with a
// positive delta that start when the counter is greater than zero, may happen
// at any time.
// Typically this means the calls to Add should execute before the statement
// creating the goroutine or other event to be waited for.

The WaitGroup, incomingRequestsWG, in the connection.go waits all the sent requests before close the main run goroutine. The design is for run() wait for any requests to be completed if there is any. So both Wait() and Add() are in this concurrent manner. The code should preemptively Add(1) before Wait() is called in the closeCh triggered.

What time can this pr be merged?

countryroadscn avatar Nov 21 '22 02:11 countryroadscn