opentelemetry-go

Panic in otlptrace triggered by integration test

Open titpetric opened this issue 1 year ago • 3 comments

I'm getting the following panic in CI, but not locally:

tyk-1             | panic: runtime error: hash of unhashable type [2]string
tyk-1             | 
tyk-1             | goroutine 54 [running]:
tyk-1             | go.opentelemetry.io/otel/exporters/otlp/otlptrace/internal/tracetransform.Spans({0xc00020af08, 0x4, 0xc000e6a080?})
tyk-1             | 	go.opentelemetry.io/otel/exporters/otlp/otlptrace@v1.26.0/internal/tracetransform/span.go:41 +0x2d9
tyk-1             | go.opentelemetry.io/otel/exporters/otlp/otlptrace.(*Exporter).ExportSpans(0xc000304370, {0x404dc18, 0xc0002dc0e0}, {0xc00020af08?, 0xc00008eef2?, 0xc0002936c0?})
tyk-1             | 	go.opentelemetry.io/otel/exporters/otlp/otlptrace@v1.26.0/exporter.go:31 +0x34
tyk-1             | go.opentelemetry.io/otel/sdk/trace.(*batchSpanProcessor).exportSpans(0xc00031c140, {0x404dba8, 0xc00017c6e0})
tyk-1             | 	go.opentelemetry.io/otel/sdk@v1.26.0/trace/batch_span_processor.go:277 +0x238
tyk-1             | go.opentelemetry.io/otel/sdk/trace.(*batchSpanProcessor).processQueue(0xc00031c140)
tyk-1             | 	go.opentelemetry.io/otel/sdk@v1.26.0/trace/batch_span_processor.go:305 +0x36e
tyk-1             | go.opentelemetry.io/otel/sdk/trace.NewBatchSpanProcessor.func1()
tyk-1             | 	go.opentelemetry.io/otel/sdk@v1.26.0/trace/batch_span_processor.go:117 +0x54
tyk-1             | created by go.opentelemetry.io/otel/sdk/trace.NewBatchSpanProcessor in goroutine 1
tyk-1             | 	go.opentelemetry.io/otel/sdk@v1.26.0/trace/batch_span_processor.go:115 +0x2e5
tyk-1 exited with code 2

Looking at the code, this should not be possible. We build the project with Go 1.22.3, and running the same test locally doesn't trigger the panic. I'm still investigating, but if anyone has any ideas, I'm open to advice. This uses otel collector 0.100.0 and the 1.26.0 release of otel and otel/trace (as seen in the panic output).
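For context on why the panic looks impossible, here is a minimal sketch, not the otlptrace code: the `key` type and the `tryInsert` helper are hypothetical names made up for illustration. In ordinary Go, a struct of comparable fields and a `[2]string` array are both valid map keys, and the runtime normally only reports "hash of unhashable type" when an interface-typed key holds something like a slice.

```go
package main

import "fmt"

// key is a hypothetical two-field struct used as a map key.
// It is NOT the type used inside otlptrace; it only stands in for
// "a struct with 2 comparable fields".
type key struct {
	resource string
	scope    string
}

// tryInsert stores k in a map keyed by interface values and converts
// any runtime panic raised while hashing the key into an error.
func tryInsert(k any) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	m := map[any]int{}
	m[k]++
	return nil
}

func main() {
	// Comparable struct: hashes fine.
	fmt.Println(tryInsert(key{resource: "res", scope: "scope"}))
	// Array of strings: also comparable, so it hashes fine.
	fmt.Println(tryInsert([2]string{"a", "b"}))
	// Slice: not comparable, so the runtime panics while hashing it.
	fmt.Println(tryInsert([]string{"a", "b"}))
}
```

With a correctly built toolchain, only the slice case should report an error, which is why a panic naming `[2]string` points away from the application code and toward the build or the runtime.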

titpetric avatar May 15 '24 10:05 titpetric

@titpetric Could you describe how one might reproduce the issue?

Cirilla-zmh avatar May 15 '24 15:05 Cirilla-zmh

Currently we have two PRs that try to replicate this, but we have not been able to replicate it locally (we are still working on using the actual image from the CI test, which is held up by access control). It's failing in GH Actions right now, and it seems to be caused by the Go upgrade (the same test suite passes on 1.21.x). I wish I had more info, other than that the trace from the panic seems fully invalid: the code in question uses a struct with 2 fields, and I have no idea where [2]string may be coming from.

https://github.com/TykTechnologies/tyk/actions/runs/9098478917/job/25009198049?pr=6269

I'll post any updates.

titpetric avatar May 15 '24 15:05 titpetric

Replicated the panic locally with the ECR image; continuing the investigation.

The second PR without an otel update produces a similar panic, but with v1.18.0.

panic: runtime error: hash of unhashable type [2]string

goroutine 48 [running]:
go.opentelemetry.io/otel/exporters/otlp/otlptrace/internal/tracetransform.Spans({0xc0004ea508, 0x4, 0xc000f06540?})
	go.opentelemetry.io/otel/exporters/otlp/otlptrace@v1.18.0/internal/tracetransform/span.go:52 +0x2d9
go.opentelemetry.io/otel/exporters/otlp/otlptrace.(*Exporter).ExportSpans(0xc0002ea4b0, {0x4022b18, 0xc00030c2a0}, {0xc0004ea508?, 0xc000f15ef2?, 0xc00029e1c0?})
	go.opentelemetry.io/otel/exporters/otlp/otlptrace@v1.18.0/exporter.go:44 +0x34
go.opentelemetry.io/otel/sdk/trace.(*batchSpanProcessor).exportSpans(0xc0005be0a0, {0x4022aa8, 0xc0002eab40})
	go.opentelemetry.io/otel/sdk@v1.18.0/trace/batch_span_processor.go:288 +0x238
go.opentelemetry.io/otel/sdk/trace.(*batchSpanProcessor).processQueue(0xc0005be0a0)
	go.opentelemetry.io/otel/sdk@v1.18.0/trace/batch_span_processor.go:316 +0x38d
go.opentelemetry.io/otel/sdk/trace.NewBatchSpanProcessor.func1()
	go.opentelemetry.io/otel/sdk@v1.18.0/trace/batch_span_processor.go:128 +0x54
created by go.opentelemetry.io/otel/sdk/trace.NewBatchSpanProcessor in goroutine 1
	go.opentelemetry.io/otel/sdk@v1.18.0/trace/batch_span_processor.go:126 +0x2e5

Things point to a build issue specific to Go 1.22.3 in the CI environment. A build from source doesn't hit the issue even with Go 1.22.3, so it's likely related to our CI cross-build environment, which is different (built with goreleaser, -X cflags, tags, trimpath, ...).
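One way to narrow down a difference between the CI build and a local build is to have the binary report how it was built. The following is a rough sketch using the standard library's `runtime/debug.ReadBuildInfo`; the `buildSummary` helper is a hypothetical name, and which settings appear (for example `-trimpath`, `-tags`, `-ldflags`, `GOOS`, `GOARCH`) depends on the toolchain and the build mode.

```go
package main

import (
	"fmt"
	"runtime/debug"
	"sort"
)

// buildSummary returns "key=value" lines describing how the running
// binary was built, starting with the Go toolchain version.
// It is a hypothetical helper for comparing a CI build with a local one.
func buildSummary() []string {
	info, ok := debug.ReadBuildInfo()
	if !ok {
		// Build info is only embedded in binaries built with module support.
		return []string{"build info unavailable"}
	}
	lines := []string{"go=" + info.GoVersion}
	for _, s := range info.Settings {
		lines = append(lines, s.Key+"="+s.Value)
	}
	// Sort the settings (not the leading version line) for stable diffs.
	sort.Strings(lines[1:])
	return lines
}

func main() {
	for _, l := range buildSummary() {
		fmt.Println(l)
	}
}
```

Running this in both binaries and diffing the output should show whether flags such as trimpath or build tags differ. The same information can be read from an existing binary without code changes via `go version -m <path-to-binary>`.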

titpetric avatar May 15 '24 17:05 titpetric

It looks like the Go issue was fixed. Is this issue still valid? https://github.com/golang/go/issues/65957

dmathieu avatar Nov 27 '24 09:11 dmathieu