kuma
Otel Exporter panics after a few minutes, complaining about invalid metrics
What happened?
Not sure whether the fault is on our side or in the Datadog exporter's OTel mapping.
panic: runtime error: index out of range [0] with length 0

goroutine 450 [running]:
github.com/DataDog/opentelemetry-mapping-go/pkg/quantile.(*Agent).InsertInterpolate(0xc001deaf58, 0x414b774000000000, 0x3fe0000000000000, 0x0)
	github.com/DataDog/opentelemetry-mapping-go/pkg/quantile@<version>/agent.go:94 +0x4b4
github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics.(*Translator).getSketchBuckets(0xc002aefb90, {0x911ee78, 0xc002e9d7a0}, {0x7dc81df15470, 0xc001d2e540}, 0xc0020af5c0, {0xc003420c60?, 0xc00206a240?}, {0x0, 0x0, ...}, ...)
	github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics@<version>/metrics_translator.go:351 +0xaf5
github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics.(*Translator).mapHistogramMetrics(0xc002aefb90, {0x911ee78, 0xc002e9d7a0}, {0x90fc310, 0xc001d2e540}, 0x5b3a2273746e696f?, {0xc002149580?, 0xc00206a240?}, 0x0)
	github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics@<version>/metrics_translator.go:515 +0x7c7
github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics.(*Translator).mapToDDFormat(0xc002aefb90, {0x911ee78, 0xc002e9d7a0}, {0xc0024b2640?, 0xc00206a240?}, {0x90fc310?, 0xc001d2e540?}, {0xc001bc6580, 0x1, 0x4}, ...)
	github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics@<version>/metrics_translator.go:847 +0xabe
github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics.(*Translator).MapMetrics(0xc002aefb90, {0x911ee78, 0xc002e9d7a0}, {0xc0031ae000?, 0xc00206a240?}, {0x90fc310?, 0xc001d2e540?})
	github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics@<version>/metrics_translator.go:797 +0xd27
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter.(*metricsExporter).PushMetricsData(0xc002afea20, {0x911ee78, 0xc002e9d7a0}, {0xc0031ae000?, 0xc00206a240?})
	github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter@<version>/metrics_exporter.go:212 +0x21d
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter.(*metricsExporter).PushMetricsDataScrubbed(0xc002afea20, {0x911ee78?, 0xc002e9d7a0?}, {0xc0031ae000?, 0xc00206a240?})
	github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter@<version>/metrics_exporter.go:185 +0x2c
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsRequest).Export(0x0?, {0x911ee78?, 0xc002e9d7a0?})
	go.opentelemetry.io/collector/exporter@<version>/exporterhelper/metrics.go:59 +0x31
go.opentelemetry.io/collector/exporter/exporterhelper.(*timeoutSender).send(0xc001bdd980?, {0x911ee78?, 0xc002e9d7a0?}, {0x90d5d50?, 0xc0034429f0?})
	go.opentelemetry.io/collector/exporter@<version>/exporterhelper/timeout_sender.go:43 +0x48
go.opentelemetry.io/collector/exporter/exporterhelper.(*baseRequestSender).send(0xc00280e8c0?, {0x911ee78?, 0xc002e9d7a0?}, {0x90d5d50?, 0xc0034429f0?})
	go.opentelemetry.io/collector/exporter@<version>/exporterhelper/common.go:35 +0x30
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send(0xc002d8c690, {0x911f350?, 0xc002879af0?}, {0x90d5d50?, 0xc0034429f0?})
	go.opentelemetry.io/collector/exporter@<version>/exporterhelper/metrics.go:171 +0x7e
go.opentelemetry.io/collector/exporter/exporterhelper.newQueueSender.func1({0x911f350?, 0xc002879af0?}, {0x90d5d50?, 0xc0034429f0?})
	go.opentelemetry.io/collector/exporter@<version>/exporterhelper/queue_sender.go:95 +0x84
go.opentelemetry.io/collector/exporter/internal/queue.(*boundedMemoryQueue[...]).Consume(0x912a020, 0xc002d8c6f0)
	go.opentelemetry.io/collector/exporter@<version>/internal/queue/bounded_memory_queue.go:57 +0xc7
go.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start.func1()
	go.opentelemetry.io/collector/exporter@<version>/internal/queue/consumers.go:43 +0x79
created by go.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start in goroutine 1
	go.opentelemetry.io/collector/exporter@<version>/internal/queue/consumers.go:39 +0x7d
Repro / setup:
kubectl --context $CTX_CLUSTER3 create namespace observability
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
# otel collector config via helm
cat > otel-config-datadog.yaml <<EOF
mode: deployment
config:
  exporters:
    datadog:
      api:
        site: datadoghq.eu
        key: <key>
  service:
    pipelines:
      logs:
        exporters:
          - datadog
      traces:
        exporters:
          - datadog
      metrics:
        exporters:
          - datadog
EOF
helm upgrade --install \
--kube-context ${CTX_CLUSTER3} \
-n observability \
--set mode=deployment \
-f otel-config-datadog.yaml \
opentelemetry-collector open-telemetry/opentelemetry-collector
# enable Metrics
kumactl apply -f - <<EOF
type: MeshMetric
name: metrics-default
mesh: default
spec:
  targetRef:
    kind: Mesh
  # applications:
  #   - name: "backend"
  default:
    backends:
      - type: OpenTelemetry
        openTelemetry:
          endpoint: "opentelemetry-collector.observability.svc:4317"
EOF
Original author @bcollard
We could add the debug exporter, for example:
metrics:
  exporters:
    - datadog
    - debug
This will log all collected metrics, so we can find the metrics on which the Datadog exporter fails and open an issue against the OpenTelemetry Collector. @bcollard, could you look into it?
Attached: otel-exporter-2.log, otel-exporter-1.log
The OTel collector keeps crashing with the debug exporter enabled for metrics.
I see that I forgot the rest of the debug exporter config. @bcollard, can you run this again with the following config:
mode: deployment
config:
  exporters:
    debug:
      verbosity: detailed
    datadog:
      api:
        site: datadoghq.eu
        key: <key>
  service:
    pipelines:
      logs:
        exporters:
          - datadog
      traces:
        exporters:
          - datadog
      metrics:
        exporters:
          - datadog
          - debug
This should properly log the collected metrics so we can debug further.
Attached here: otel-cluster1.log, otel-cluster2.log
"kuma" appears a lot in the otel-cluster1.log file, but not in the other.
The logs look fine, but we could also verify whether this is only a Datadog exporter issue by pushing metrics to another SaaS product, such as Grafana, and checking whether the issue persists. There is an example of how to set this up in the demo-scene repo. @bcollard, could you try this without the Datadog exporter?
Removing closed state labels due to the issue being reopened.
This issue was inactive for 90 days. It will be reviewed in the next triage meeting and might be closed. If you think this issue is still relevant, please comment on it or attend the next triage meeting.