otelconnect-go icon indicating copy to clipboard operation
otelconnect-go copied to clipboard

Race Condition in otelconnect v0.7.2 with Streaming Endpoints

Open gcemaj opened this issue 2 months ago • 3 comments

Race Condition in otelconnect v0.7.2 with Streaming Endpoints

Summary

The otelconnect library (v0.7.2) has a data race when handling streaming RPC endpoints. The race occurs between attribute creation/sorting and attribute copying operations in concurrent goroutines.

Environment

  • connectrpc.com/otelconnect v0.7.2
  • go.opentelemetry.io/otel v1.37.0
  • Go 1.24.0 (also reproducible on 1.23.x)

Race Condition Details

The race detector reports the following:

Goroutine 1 (Creating/Sorting attributes):

  • slices.SortStableFunc in go.opentelemetry.io/otel/attribute.NewSet() (attribute/set.go:212)
  • Called from otelconnect.(*Interceptor).WrapStreamingHandler (interceptor.go:332)

Goroutine 2 (Copying attributes):

  • runtime.slicecopy in go.opentelemetry.io/otel/metric.WithAttributes() (instrument.go:374)
  • Called from otelconnect.(*streamingState).receive() (streaming.go:98)
  • Called from otelconnect.(*Interceptor).WrapStreamingHandler.func1.1 (interceptor.go:319)

Stack Trace Snippet

  WARNING: DATA RACE
  Write at 0x00c000cfb0c0 by goroutine 1776:
    slices.insertionSortCmpFunc[...]()
    slices.SortStableFunc[...]()
    go.opentelemetry.io/otel/attribute.NewSet()
        attribute/set.go:212 +0x57
    connectrpc.com/otelconnect.(*Interceptor).WrapStreamingHandler.func1()
        interceptor.go:332 +0x1863

  Previous read at 0x00c000cfb0c0 by goroutine 1783:
    runtime.slicecopy()
    go.opentelemetry.io/otel/metric.WithAttributes()
        instrument.go:374 +0x5a
    connectrpc.com/otelconnect.(*streamingState).receive()
        streaming.go:98 +0x72e

Root Cause

The interceptor appears to be sharing attribute slices between the main streaming handler goroutine and the receive goroutine without proper synchronization. When NewSet() sorts the attributes in one goroutine while WithAttributes() is copying them in another, a data race occurs.

Reproduction

The race is consistently reproducible when:

  1. Using bidirectional streaming endpoints (BidiStream)
  2. Having concurrent message sending/receiving
  3. Running with Go's race detector (go test -race)

Workaround

We've temporarily replaced otelconnect with a custom interceptor that ensures each goroutine gets its own copy of attributes:

  // Create a fresh set of attributes for each call
  // This avoids the race condition where attributes are shared between goroutines
  return attribute.NewSet(
      attribute.String("rpc.method", procedure),
      attribute.String("rpc.system", "connect"),
      attribute.String("rpc.service", serviceName),
  )

gcemaj avatar Sep 19 '25 16:09 gcemaj

Hi @gcemaj thanks for the report. Would you be able to expand on your use case here with a testcase example?

From looking at the race output I can see the race on the Receive happening concurrently with end of stream event.

https://github.com/connectrpc/otelconnect-go/blob/3379c6d78731640c5424f4616b0968c887898794/interceptor.go#L332

https://github.com/connectrpc/otelconnect-go/blob/3379c6d78731640c5424f4616b0968c887898794/streaming.go#L98

I don't believe this should occur as any Receive or Send after the handler has been returned could also result in a race in the underlying HTTP connection. The StreamingHandlerConn implementations are not safe for concurrent use.

emcfarlane avatar Sep 22 '25 16:09 emcfarlane

I see in the documentation that StreamingHandlerConn implementations do not need to be safe for concurrent use. Which made me think it could be safe for concurrent use. Is the expectation that Send and Receive are called from the same go routine?

gcemaj avatar Sep 26 '25 15:09 gcemaj

@gcemaj, I think we need to considerably revise those Go docs. The thread-safety of StreamingHandlerConn is a little more complicated than those docs imply.

It is in fact safe to call Send and Receive from different goroutines. It is not safe, however, for multiple goroutines to all call Send or for multiple goroutines to all call Receive. This is actually a necessary property for bidirectional endpoints (to be able to send and receive concurrently) since HTTP/2 flow control could otherwise cause deadlock if both sides are trying to Send at the same simultaneously.

The connect-go module's impl of StreamingHandlerConn is safe to use in this way. I think perhaps that remark, that implementations do not need to be thread-safe, was meant for application implementations of this interface, since the framework will never invoke your impl from multiple goroutines. Your app can end up implementing this interface if you implement streaming interceptors, where creating custom impls that wrap/decorate the framework's StreamingHandlerConn imp is common.

While the Go docs for StreamingClientConn are better (more nuanced and precise), both could use some TLC to improve comprehensibility.

jhump avatar Oct 07 '25 18:10 jhump