
Otel Exporter panics after a few minutes, complaining about invalid metrics

Open · slonka opened this issue 1 year ago • 9 comments

What happened?

Not sure if the fault is on our side or in the Datadog exporter/mapping for OTel.

panic: runtime error: index out of range [0] with length 0

goroutine 450 [running]:
github.com/DataDog/opentelemetry-mapping-go/pkg/quantile.(*Agent).InsertInterpolate(0xc001deaf58, 0x414b774000000000, 0x3fe0000000000000, 0x0)
	github.com/DataDog/opentelemetry-mapping-go/pkg/[email protected]/agent.go:94 +0x4b4
github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics.(*Translator).getSketchBuckets(0xc002aefb90, {0x911ee78, 0xc002e9d7a0}, {0x7dc81df15470, 0xc001d2e540}, 0xc0020af5c0, {0xc003420c60?, 0xc00206a240?}, {0x0, 0x0, ...}, ...)
	github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/[email protected]/metrics_translator.go:351 +0xaf5
github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics.(*Translator).mapHistogramMetrics(0xc002aefb90, {0x911ee78, 0xc002e9d7a0}, {0x90fc310, 0xc001d2e540}, 0x5b3a2273746e696f?, {0xc002149580?, 0xc00206a240?}, 0x0)
	github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/[email protected]/metrics_translator.go:515 +0x7c7
github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics.(*Translator).mapToDDFormat(0xc002aefb90, {0x911ee78, 0xc002e9d7a0}, {0xc0024b2640?, 0xc00206a240?}, {0x90fc310?, 0xc001d2e540?}, {0xc001bc6580, 0x1, 0x4}, ...)
	github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/[email protected]/metrics_translator.go:847 +0xabe
github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/metrics.(*Translator).MapMetrics(0xc002aefb90, {0x911ee78, 0xc002e9d7a0}, {0xc0031ae000?, 0xc00206a240?}, {0x90fc310?, 0xc001d2e540?})
	github.com/DataDog/opentelemetry-mapping-go/pkg/otlp/[email protected]/metrics_translator.go:797 +0xd27
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter.(*metricsExporter).PushMetricsData(0xc002afea20, {0x911ee78, 0xc002e9d7a0}, {0xc0031ae000?, 0xc00206a240?})
	github.com/open-telemetry/opentelemetry-collector-contrib/exporter/[email protected]/metrics_exporter.go:212 +0x21d
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/datadogexporter.(*metricsExporter).PushMetricsDataScrubbed(0xc002afea20, {0x911ee78?, 0xc002e9d7a0?}, {0xc0031ae000?, 0xc00206a240?})
	github.com/open-telemetry/opentelemetry-collector-contrib/exporter/[email protected]/metrics_exporter.go:185 +0x2c
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsRequest).Export(0x0?, {0x911ee78?, 0xc002e9d7a0?})
	go.opentelemetry.io/collector/[email protected]/exporterhelper/metrics.go:59 +0x31
go.opentelemetry.io/collector/exporter/exporterhelper.(*timeoutSender).send(0xc001bdd980?, {0x911ee78?, 0xc002e9d7a0?}, {0x90d5d50?, 0xc0034429f0?})
	go.opentelemetry.io/collector/[email protected]/exporterhelper/timeout_sender.go:43 +0x48
go.opentelemetry.io/collector/exporter/exporterhelper.(*baseRequestSender).send(0xc00280e8c0?, {0x911ee78?, 0xc002e9d7a0?}, {0x90d5d50?, 0xc0034429f0?})
	go.opentelemetry.io/collector/[email protected]/exporterhelper/common.go:35 +0x30
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send(0xc002d8c690, {0x911f350?, 0xc002879af0?}, {0x90d5d50?, 0xc0034429f0?})
	go.opentelemetry.io/collector/[email protected]/exporterhelper/metrics.go:171 +0x7e
go.opentelemetry.io/collector/exporter/exporterhelper.newQueueSender.func1({0x911f350?, 0xc002879af0?}, {0x90d5d50?, 0xc0034429f0?})
	go.opentelemetry.io/collector/[email protected]/exporterhelper/queue_sender.go:95 +0x84
go.opentelemetry.io/collector/exporter/internal/queue.(*boundedMemoryQueue[...]).Consume(0x912a020, 0xc002d8c6f0)
	go.opentelemetry.io/collector/[email protected]/internal/queue/bounded_memory_queue.go:57 +0xc7
go.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start.func1()
	go.opentelemetry.io/collector/[email protected]/internal/queue/consumers.go:43 +0x79
created by go.opentelemetry.io/collector/exporter/internal/queue.(*Consumers[...]).Start in goroutine 1
	go.opentelemetry.io/collector/[email protected]/internal/queue/consumers.go:39 +0x7d

Repro / setup:

kubectl --context $CTX_CLUSTER3 create namespace observability

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update

# otel collector config via helm
cat > otel-config-datadog.yaml <<EOF
mode: deployment
config:
  exporters:
    datadog:
      api:
        site: datadoghq.eu
        key: <key>
  service:
    pipelines:
      logs:
        exporters:
          - datadog
      traces:
        exporters:
          - datadog
      metrics:
        exporters:
          - datadog
EOF

helm upgrade --install \
  --kube-context ${CTX_CLUSTER3} \
  -n observability \
  --set mode=deployment \
  -f otel-config-datadog.yaml \
  opentelemetry-collector open-telemetry/opentelemetry-collector

# enable Metrics
kumactl apply -f - <<EOF
type: MeshMetric
name: metrics-default
mesh: default
spec:
  targetRef:
    kind: Mesh
  # applications:
  #  - name: "backend"
  default:
    backends:
    - type: OpenTelemetry
      openTelemetry: 
        endpoint: "opentelemetry-collector.observability.svc:4317"
EOF
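
After a few minutes the collector pod panics with the trace above. A minimal way to capture it (a sketch, assuming the Deployment ends up named opentelemetry-collector, as implied by the endpoint used in the MeshMetric policy):

kubectl --context ${CTX_CLUSTER3} -n observability logs deploy/opentelemetry-collector -f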

slonka · Feb 21 '24 09:02

Original author @bcollard

slonka · Feb 21 '24 09:02

We can add the debug exporter to the metrics pipeline, for example:

metrics:
  exporters:
    - datadog
    - debug

This will log all collected metrics, so we can find the metrics the Datadog exporter fails on and create an issue in the OpenTelemetry Collector repo. @bcollard could you look at it?

Automaat · Feb 26 '24 14:02

Attached: otel-exporter-2.log, otel-exporter-1.log

The OTel collector keeps crashing with the debug exporter enabled for metrics.
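
Since the pod is crash-looping, the pre-crash output can still be pulled with kubectl (a sketch, assuming the same Deployment name as in the setup above):

kubectl --context ${CTX_CLUSTER3} -n observability logs deploy/opentelemetry-collector --previous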

bcollard · Feb 28 '24 09:02

I see that I forgot the rest of the debug exporter config. @bcollard, can you run this again with this config:

mode: deployment
config:
  exporters:
    debug:
      verbosity: detailed
    datadog:
      api:
        site: datadoghq.eu
        key: <key>
  service:
    pipelines:
      logs:
        exporters:
          - datadog
      traces:
        exporters:
          - datadog
      metrics:
        exporters:
          - datadog
          - debug

This should properly log the collected metrics so we can debug further.
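
To roll the updated values out, the helm upgrade from the repro can simply be re-run against the new file (a sketch, assuming it is saved as otel-config-datadog.yaml):

helm upgrade --install \
  --kube-context ${CTX_CLUSTER3} \
  -n observability \
  --set mode=deployment \
  -f otel-config-datadog.yaml \
  opentelemetry-collector open-telemetry/opentelemetry-collector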

Automaat · Mar 04 '24 13:03

Here are the attached logs: otel-cluster1.log, otel-cluster2.log

"kuma" appears a lot in the otel-cluster1.log file, not in the other.

bcollard · Mar 04 '24 16:03

Logs look fine, but we could also verify whether this is a Datadog-exporter-only issue by pushing metrics to some other SaaS product like Grafana and checking whether the issue persists. There is an example of how to set this up in the demo-scene repo. Could you try this without the Datadog exporter, @bcollard?
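
A rough sketch of such a setup, assuming a Grafana Cloud stack reachable over OTLP/HTTP (the endpoint URL and credentials below are placeholders taken from the Grafana Cloud UI; the otlphttp exporter ships with the collector):

cat > otel-config-grafana.yaml <<EOF
mode: deployment
config:
  exporters:
    debug:
      verbosity: detailed
    otlphttp:
      # placeholder Grafana Cloud OTLP gateway endpoint
      endpoint: https://otlp-gateway-<region>.grafana.net/otlp
      headers:
        # placeholder: base64 of <instance-id>:<api-token>
        Authorization: Basic <credentials>
  service:
    pipelines:
      metrics:
        exporters:
          - otlphttp
          - debug
EOF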

Automaat · Mar 12 '24 14:03

Removing closed state labels due to the issue being reopened.

github-actions[bot] · Mar 12 '24 14:03

This issue was inactive for 90 days. It will be reviewed in the next triage meeting and might be closed. If you think this issue is still relevant, please comment on it or attend the next triage meeting.

github-actions[bot] · Jul 03 '24 07:07