
Flaky test TestGRPCResolverRoundRobin (again)

Open esnible opened this issue 2 years ago • 0 comments

This is a repeat of #2085 and #2674.

TestGRPCResolverRoundRobin fails 100% of the time on my Mac (macOS Monterey 12.2.1) when run with: time go test -tags=memory_storage_integration ./pkg/discovery/...

The test is flaky in two ways.

I was able to cure both problems, but I don't understand my fixes, and thus am not confident enough to create a PR around them (yet).

Rarely, the test panics on addrs[p.Addr.String()] = struct{}{}, presumably because p.Addr is still nil at that point. p.Addr is set via grpc.Peer(), which populates the peer field only after the RPC completes, based on the Context. We are using a non-blocking client and reusing the same Context for each connection, so something is racing in there. Wrapping the statement in if p.Addr != nil { ... } removes the occasional panic, but I am unsure whether it is safe to use grpc.Peer() at all if work might be done on another goroutine.
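The nil guard described above can be sketched in isolation. This is a minimal sketch, not the Jaeger test code: fakePeer is a hypothetical stand-in for the gRPC peer struct (which also has an Addr field of type net.Addr), and staticAddr and recordAddr are names invented for this example. It only shows how to avoid the panic; it does not settle whether reading the peer is race-free.

```go
package main

import (
	"fmt"
	"net"
)

// staticAddr is a hypothetical net.Addr implementation used only for this sketch.
type staticAddr string

func (a staticAddr) Network() string { return "tcp" }
func (a staticAddr) String() string  { return string(a) }

// fakePeer is a hypothetical stand-in for the gRPC peer struct.
// Addr stays nil until something populates it.
type fakePeer struct {
	Addr net.Addr
}

// recordAddr adds the peer address to the set and reports whether it did so.
// Calling p.Addr.String() on a nil interface would panic, so a nil Addr is skipped.
func recordAddr(addrs map[string]struct{}, p *fakePeer) bool {
	if p == nil || p.Addr == nil {
		return false
	}
	addrs[p.Addr.String()] = struct{}{}
	return true
}

func main() {
	addrs := make(map[string]struct{})
	fmt.Println(recordAddr(addrs, &fakePeer{Addr: staticAddr("127.0.0.1:14250")}))
	fmt.Println(recordAddr(addrs, &fakePeer{})) // Addr not populated yet
	fmt.Println(len(addrs))
}
```

A guard like this hides the symptom; if the peer is written by another goroutine, the read would still need synchronization.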

I noticed makeSureConnectionsUp() has the comment // 3000 * 10ms = 30s, but the test fails in under 2s, so the loop is evidently not waiting 10ms on every iteration. Adding time.Sleep(10 * time.Millisecond) before continue, which makes the comment true, fixes the test. (The problem also goes away if I raise 3000 to 2000.)
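To illustrate why the missing sleep matters, here is a minimal sketch of a bounded polling loop. It is not the body of makeSureConnectionsUp(); waitUntil and its parameters are hypothetical names chosen for this example. The point is that the time budget (attempts * interval) only holds if every failed attempt actually sleeps.

```go
package main

import (
	"fmt"
	"time"
)

// waitUntil polls cond up to attempts times and sleeps for interval after
// each failed poll, so the total budget is roughly attempts * interval.
func waitUntil(attempts int, interval time.Duration, cond func() bool) bool {
	for i := 0; i < attempts; i++ {
		if cond() {
			return true
		}
		// Without this sleep, a path that retries immediately burns through
		// all attempts in far less time than the intended budget.
		time.Sleep(interval)
	}
	return false
}

func main() {
	// Simulate connections that become ready after about 50ms.
	ready := time.Now().Add(50 * time.Millisecond)
	ok := waitUntil(3000, 10*time.Millisecond, func() bool {
		return time.Now().After(ready)
	})
	fmt.Println("connections up:", ok)
}
```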

esnible, Feb 24 '22 05:02