jaeger
Flaky test TestGRPCResolverRoundRobin (again)
This is a repeat of #2085 and #2674.
TestGRPCResolverRoundRobin fails 100% of the time on my Mac (macOS Monterey 12.2.1) when run with `time go test -tags=memory_storage_integration ./pkg/discovery/...`.
The test is flaky in two ways.
I was able to fix both problems, but I don't understand my fixes, so I am not yet confident enough to open a PR based on them.
Rarely, the test fails with a panic on `addrs[p.Addr.String()] = struct{}{}`. This is likely because `p.Addr` is set through the `grpc.Peer()` call option, which populates the peer field only after the RPC completes, based on the call's Context. We are using a non-blocking client and reusing the same Context for every connection, so something is racing in there. Wrapping the statement in `if p.Addr != nil { ... }` removes the occasional panic, but I am unsure whether it is safe to use `grpc.Peer()` at all if the work might be done on another goroutine.
I noticed that `makeSureConnectionsUp()` has the comment `// 3000 * 10ms = 30s`, but the test fails in under 2s, so the loop is evidently not waiting 10ms between attempts. Adding `time.Sleep(10 * time.Millisecond)` before `continue`, so that the comment holds, fixes the test. (The problem also goes away if I raise 3000 to 2000.)