google-cloud-go icon indicating copy to clipboard operation
google-cloud-go copied to clipboard

spanner: panic in recordGFELatencyMetricsOT if EnableOpenTelemetryMetrics called after NewClient

Open henripqt opened this issue 1 year ago • 1 comments

Client

Spanner

Environment

e.g. Alpine Docker on GKE

Go Environment

go1.22.0

Code

When enabling open telemetry after creating the client the configuration set on the client is invalid resulting in potential panic.

e.g.

package main

func main() {
   // ... 
   	client, err := gcspanner.NewClient(
		ctx,
		db,
		opts...,
	)

	if err != nil {
		panic(err)
	}

	initOnce.Do(func() {
		if os.Getenv("SPANNER_EMULATOR_HOST") == "" {
			spanner.EnableOpenTelemetryMetrics()
		}
	})
	
	// ...
}

In the above example the new client created would have a default openTelemetryConfig with all instruments being nil possibly leading to a panic during a call to recordGFELatencyMetricsOT (see bellow with added comments)


func recordGFELatencyMetricsOT(ctx context.Context, md metadata.MD, keyMethod string, otConfig *openTelemetryConfig) error {
	if !IsOpenTelemetryMetricsEnabled() || md == nil && otConfig == nil { // the global flag is enable thus we don't return nil
		return nil
	}
	attr := otConfig.attributeMap
	if len(md.Get("server-timing")) == 0 && otConfig.gfeHeaderMissingCount != nil { // at this point otConfig.gfeHeaderMissingCount is nil
		otConfig.gfeHeaderMissingCount.Add(ctx, 1, metric.WithAttributes(attr...))
		return nil
	}
	serverTiming := md.Get("server-timing")[0] // This will panic !
	gfeLatency, err := strconv.Atoi(strings.TrimPrefix(serverTiming, "gfet4t7; dur="))
	if !strings.HasPrefix(serverTiming, "gfet4t7; dur=") || err != nil {
		return err
	}
	attr = append(attr, attributeKeyMethod.String(keyMethod))
	if otConfig.gfeLatency != nil {
		otConfig.gfeLatency.Record(ctx, int64(gfeLatency), metric.WithAttributes(attr...))
	}
	return nil
}

Expected behavior

  • The EnableOpenTelemetryMetrics function should mention that it needs to be registered prior to enabling the client
  • recordGFELatencyMetricsOT should have appropriate checks to handle reading from metadata while avoiding panics

Actual behavior

  • EnableOpenTelemetryMetrics doens't mention that it should be called prior to instantiating a new client
  • recordGFELatencyMetricsOT panics in some specific conditions

Additional context

e.g. Started after upgrading to v1.57.0 and using the EnableOpenTelemetryMetrics function as shown above

henripqt avatar Mar 06 '24 16:03 henripqt

Thanks @henripqt Its known and expected that sometimes response metadata header is missing, and below lines will fail if that's the case

if len(md.Get("server-timing")) == 0 && otConfig.gfeHeaderMissingCount != nil { // at this point otConfig.gfeHeaderMissingCount is nil
		otConfig.gfeHeaderMissingCount.Add(ctx, 1, metric.WithAttributes(attr...))
		return nil
	}

I have raised a PR which will initialize all the configs related to open telemetry by default irrespective of whether the Opentelemetry is enabled or not that will take care of this scenario

Thanks

rahul2393 avatar Mar 28 '24 06:03 rahul2393