
Loss of traces while terminating an OTel collector

Open Nafisha-14 opened this issue 2 months ago • 3 comments

Component(s)

receiver/otlp

What happened?

Describe the bug: while terminating the collector, we observed a loss of traces.

The OTLP receiver accepted 7604 spans, but the exporter sent only 7504; roughly 100-200 spans were lost during otel-collector termination.

Steps to reproduce

  1. Deploy the otel collector (receiver: otlp/grpc, exporter: opensearch)
  2. Send spans via OTLP gRPC (a generator sketch follows this list)
  3. Gracefully terminate the running OpenTelemetry Collector process
  4. Observe otelcol_receiver_accepted_spans, otelcol_receiver_refused_spans, otelcol_exporter_sent_spans, and otelcol_exporter_send_failed_spans (a scrape sketch follows the configuration below)
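
For reference, a minimal sketch of a load generator equivalent to step 2 (not the exact simulator used), assuming the OpenTelemetry Go SDK; the endpoint matches the otlp/grpc-insecure receiver in the configuration below, and the span count and tracer name are illustrative placeholders:

```go
package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	ctx := context.Background()

	// OTLP gRPC exporter pointed at the collector's otlp/grpc-insecure receiver (port 4319).
	exp, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint("localhost:4319"),
		otlptracegrpc.WithInsecure(),
	)
	if err != nil {
		log.Fatalf("creating exporter: %v", err)
	}

	tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exp))
	defer func() {
		// Shutdown flushes any spans still buffered in the SDK before the generator exits,
		// so every generated span actually reaches the collector.
		if err := tp.Shutdown(ctx); err != nil {
			log.Printf("tracer provider shutdown: %v", err)
		}
	}()

	// Generate ~12000 spans, matching the run described under Log output.
	tracer := tp.Tracer("span-simulator")
	for i := 0; i < 12000; i++ {
		_, span := tracer.Start(ctx, "simulated-span")
		span.End()
	}
}
```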

What did you expect to see? No loss of spans during termination: every span accepted by the collector should be exported.

What did you see instead? There is a mismatch between spans received and spans exported during collector termination. Some spans are lost and never ingested into the backend (OpenSearch).

Collector version

0.128.0

Environment information

OS: SLES 15-SP6
Compiler: go 1.24.6

OpenTelemetry Collector configuration

exporters:
  opensearch:
    http:
      endpoint: http://opensearch:9200
      tls:
        insecure: true
    retry_on_failure:
      enabled: true
      max_elapsed_time: 0
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 1000
      sizer: requests
      storage: file_storage/opensearch
      block_on_overflow: false

extensions:
  file_storage/opensearch:
    directory: /opt/collector/queue/opensearch
    create_directory: true

  health_check:
    endpoint: 0.0.0.0:13133

processors:
  batch: {}
  batch/opensearch:
    send_batch_size: 8192
    send_batch_max_size: 8192
    timeout: 200ms
  batch/otlp:
    send_batch_size: 8192
    send_batch_max_size: 8192
    timeout: 200ms
  memory_limiter:
    check_interval: 5s
    limit_percentage: 85
    spike_limit_percentage: 10

connectors:
  forward/traces:
  forward/logs:

receivers:
  otlp/grpc-insecure:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4319

service:
  extensions:
    - health_check
    - file_storage/opensearch

  pipelines:
    traces:
      exporters:
        - forward/traces
      processors:
        - memory_limiter
        - batch
      receivers:
        - otlp/grpc-insecure

    traces/opensearch:
      exporters:
        - opensearch
      processors:
        - memory_limiter
        - batch/opensearch
      receivers:
        - forward/traces
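
For step 4, a small helper along these lines can read the counters quoted under Log output, assuming the collector's internal telemetry is exposed on the default Prometheus endpoint localhost:8888/metrics (the configuration above does not override it):

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"net/http"
	"strings"
)

// Scrapes the collector's internal telemetry endpoint and prints only the
// span counters referenced in this report.
func main() {
	resp, err := http.Get("http://localhost:8888/metrics")
	if err != nil {
		log.Fatalf("scraping collector metrics: %v", err)
	}
	defer resp.Body.Close()

	wanted := []string{
		"otelcol_receiver_accepted_spans",
		"otelcol_receiver_refused_spans",
		"otelcol_exporter_sent_spans",
		"otelcol_exporter_send_failed_spans",
	}

	sc := bufio.NewScanner(resp.Body)
	for sc.Scan() {
		line := sc.Text()
		for _, name := range wanted {
			if strings.HasPrefix(line, name) {
				fmt.Println(line)
			}
		}
	}
	if err := sc.Err(); err != nil {
		log.Fatalf("reading metrics response: %v", err)
	}
}
```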

Log output


Sent spans via simulator: ~12000

Receiver accepted: 7604
otelcol_receiver_accepted_spans{receiver="otlp/grpc-insecure",transport="grpc"} 7604

Receiver refused: 0
otelcol_receiver_refused_spans{receiver="otlp/grpc-insecure"} 0

Exporter sent: 7504
otelcol_exporter_sent_spans{exporter="opensearch"} 7504

Exporter failed: 0
otelcol_exporter_send_failed_spans{exporter="opensearch"} 0

After the OpenTelemetry Collector was terminated, a new instance was started. Exported span count on the new instance:

otelcol_exporter_sent_spans{exporter="opensearch"} 4298

Total spans expected in OpenSearch: 12002
Actually stored in the backend: 11802 = 7504 (old instance) + 4298 (new instance), i.e. 200 spans missing.

esRest GET /jaeger-span-2025-09-12/_count
{"count":11802,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0}}

Additional context

The number of spans sent by the exporter does not match the number of spans accepted by the receiver, and this occurs with both the OTLP gRPC and OTLP HTTP receivers.


Nafisha-14 · Sep 17 '25