
Exporting failed. No more retries left. Dropping data with OTLP/Elastic Exporter

harshul-yadav-gl opened this issue (status: Open)

I’m experiencing intermittent issues with my OpenTelemetry setup when exporting metrics and traces to Elasticsearch with the OTLP exporter. In this setup, multiple services send telemetry data to separate OTLP agent collectors, which forward it to a central collector. The central collector then exports all telemetry data to Elasticsearch.

Configuration

Here's the relevant part of my otel-collector-config.yaml file:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:
        cors:
          allowed_origins:
            - "http://*"
            - "https://*"

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 6000
  batch:

exporters:
  logging:
    loglevel: debug
  otlp/elastic:
    endpoint: elk-endpoint
    headers:
      Authorization: "Bearer Token"
    tls:
      insecure_skip_verify: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [logging, otlp/elastic]

    logs:
      receivers: [otlp]
      exporters: [logging, otlp/elastic]

    metrics:
      receivers: [otlp]
      exporters: [logging, otlp/elastic]
```
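One detail worth checking in the configuration above: memory_limiter and batch are declared under processors, but none of the three pipelines has a processors entry, so neither processor is actually in use. A minimal sketch of the service section with both wired in follows. It assumes the receivers and exporters stay exactly as above and places memory_limiter first, which is the order the memory_limiter documentation recommends. It is a sketch, not a confirmed fix for the timeouts.

```yaml
# Sketch: same receivers and exporters as above, with the declared
# processors actually referenced (memory_limiter first, then batch).
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [logging, otlp/elastic]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [logging, otlp/elastic]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [logging, otlp/elastic]
```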

Error Details

I occasionally encounter the following errors in the collector logs:

error    exporterhelper/queue_sender.go:125    Exporting failed. No more retries left. Dropping data.    {"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "max elapsed time expired rpc error: code = DeadlineExceeded desc = context deadline exceeded", "dropped_items": 822}

info    exporterhelper/retry_sender.go:129    Exporting failed. Will retry the request after interval.    {"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "1.030954787s"}

Additional error details, as observed in the OpenTelemetry Collector pod logs:

2024-08-27T10:56:57.833Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "6.922757504s"}
2024-08-27T10:56:58.323Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "2.054454828s"}
2024-08-27T10:56:58.324Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "5.25555217s"}
2024-08-27T10:56:58.421Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "13.664290849s"}
2024-08-27T10:56:58.810Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "3.214333298s"}
2024-08-27T10:56:59.384Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "6.239830222s"}
2024-08-27T10:56:59.438Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "3.242129234s"}
2024-08-27T10:57:03.063Z	error	exporterhelper/queue_sender.go:125	Exporting failed. No more retries left. Dropping data.	{"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "max elapsed time expired rpc error: code = DeadlineExceeded desc = context deadline exceeded", "dropped_items": 326}
2024-08-27T10:57:05.985Z	error	exporterhelper/queue_sender.go:125	Exporting failed. No more retries left. Dropping data.	{"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "max elapsed time expired rpc error: code = DeadlineExceeded desc = context deadline exceeded", "dropped_items": 468}
2024-08-27T10:57:06.247Z	error	exporterhelper/queue_sender.go:125	Exporting failed. No more retries left. Dropping data.	{"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "max elapsed time expired rpc error: code = DeadlineExceeded desc = context deadline exceeded", "dropped_items": 259}
2024-08-27T10:57:06.395Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "1.023227611s"}
     -> error: Str(GroundingHubException)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(IOException)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(IOException)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(none)
     -> error: Str(none)
     -> level: Str(error)
     -> error: Str(GroundingHubException)
     -> error: Str(none)
     -> Name: system.network.errors
     -> Description: System network errors
     -> Unit: errors
2024-08-27T10:57:06.436Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "5.481593831s"}
2024-08-27T10:57:07.024Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "5.745735946s"}
2024-08-27T10:57:07.146Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "11.680599097s"}
2024-08-27T10:57:07.181Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "11.066047572s"}
2024-08-27T10:57:07.433Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "1.155981797s"}
2024-08-27T10:57:07.787Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "1.912185827s"}
2024-08-27T10:57:08.026Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "1.395621059s"}
2024-08-27T10:57:08.513Z	error	exporterhelper/queue_sender.go:125	Exporting failed. No more retries left. Dropping data.	{"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "max elapsed time expired rpc error: code = DeadlineExceeded desc = context deadline exceeded", "dropped_items": 512}
2024-08-27T10:57:08.571Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "1.446042735s"}
2024-08-27T10:57:09.655Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "2.825948872s"}
2024-08-27T10:57:16.397Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "860.048665ms"}
     -> Name: system.network.errors
     -> Description: System network errors
     -> Unit: errors
     -> Name: system.network.errors
     -> Description: System network errors
     -> Unit: errors
2024-08-27T10:57:17.460Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "661.875106ms"}
2024-08-27T10:57:17.505Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "1.797723167s"}
2024-08-27T10:57:18.588Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "9.303621398s"}
2024-08-27T10:57:18.702Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "12.145983633s"}
2024-08-27T10:57:18.725Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "1.323995075s"}
2024-08-27T10:57:18.951Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "14.53231812s"}
2024-08-27T10:57:19.070Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "3.066409691s"}
2024-08-27T10:57:19.596Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "7.647088599s"}
2024-08-27T10:57:20.121Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "14.133285413s"}
2024-08-27T10:57:21.318Z	error	exporterhelper/queue_sender.go:125	Exporting failed. No more retries left. Dropping data.	{"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "max elapsed time expired rpc error: code = DeadlineExceeded desc = context deadline exceeded", "dropped_items": 533}
2024-08-27T10:57:25.625Z	error	exporterhelper/queue_sender.go:125	Exporting failed. No more retries left. Dropping data.	{"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "max elapsed time expired rpc error: code = DeadlineExceeded desc = context deadline exceeded", "dropped_items": 736}
2024-08-27T10:57:26.134Z	error	exporterhelper/queue_sender.go:125	Exporting failed. No more retries left. Dropping data.	{"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "max elapsed time expired rpc error: code = DeadlineExceeded desc = context deadline exceeded", "dropped_items": 205}
2024-08-27T10:57:26.401Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "1.267960273s"}
     -> Name: airflow.dag_processing.import_errors
     -> Name: system.network.errors
     -> Description: System network errors
     -> Unit: errors
2024-08-27T10:57:27.251Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "11.777193587s"}
2024-08-27T10:57:27.420Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "862.682838ms"}
2024-08-27T10:57:27.518Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "842.137459ms"}
2024-08-27T10:57:27.645Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "10.701868231s"}
2024-08-27T10:57:27.864Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "8.274129426s"}
2024-08-27T10:57:28.300Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "7.282718258s"}
2024-08-27T10:57:28.509Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "12.737171673s"}
2024-08-27T10:57:28.590Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "2.086697542s"}
2024-08-27T10:57:29.422Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "2.206177122s"}
2024-08-27T10:57:29.484Z	error	exporterhelper/queue_sender.go:125	Exporting failed. No more retries left. Dropping data.	{"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "max elapsed time expired rpc error: code = DeadlineExceeded desc = context deadline exceeded", "dropped_items": 514}
2024-08-27T10:57:31.920Z	error	exporterhelper/queue_sender.go:125	Exporting failed. No more retries left. Dropping data.	{"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "max elapsed time expired rpc error: code = DeadlineExceeded desc = context deadline exceeded", "dropped_items": 311}
2024-08-27T10:57:32.087Z	error	exporterhelper/queue_sender.go:125	Exporting failed. No more retries left. Dropping data.	{"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "max elapsed time expired rpc error: code = DeadlineExceeded desc = context deadline exceeded", "dropped_items": 305}
2024-08-27T10:57:33.718Z	error	exporterhelper/queue_sender.go:125	Exporting failed. No more retries left. Dropping data.	{"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "max elapsed time expired rpc error: code = DeadlineExceeded desc = context deadline exceeded", "dropped_items": 506}
2024-08-27T10:57:36.414Z	info	exporterhelper/retry_sender.go:129	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "943.222607ms"}

Steps to Reproduce

  1. Deploy services with OTLP agent collectors that export to a central collector.
  2. Configure the central collector to export to an Elasticsearch endpoint.
  3. Observe intermittent export failures with the errors mentioned above.
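The agent collectors' configuration is not shared in this issue. For step 1, a minimal agent-side sketch that forwards everything to the central collector over OTLP/gRPC might look like this. It assumes a hypothetical hostname central-collector, the default OTLP/gRPC port 4317, and plaintext gRPC for the in-cluster hop; it is not the reporter's actual agent config.

```yaml
# Agent collector (sketch). "central-collector" is a hypothetical hostname.
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  otlp:
    endpoint: central-collector:4317  # hypothetical host, default OTLP/gRPC port
    tls:
      insecure: true                  # plaintext gRPC; assumes an in-cluster hop

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      exporters: [otlp]
```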

Expected Behavior

Metrics and traces should be consistently exported to Elasticsearch without retries or dropped data due to timeouts.

Actual Behavior

The OTLP exporter intermittently fails to export data, citing "max elapsed time expired" and "context deadline exceeded," leading to dropped data.

Additional Context

This issue seems to point to timeouts between the central collector and Elasticsearch. I’d appreciate any guidance on resolving this or on mitigating the impact on data exports.
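In these logs, "context deadline exceeded" means a single export request to Elasticsearch did not finish within the exporter's per-request timeout, and "max elapsed time expired" means the retries for that batch then used up retry_on_failure::max_elapsed_time, so the batch was dropped. The original configuration does not set timeout on otlp/elastic, so the exporter's default applies (5s, as far as I know). A sketch of raising it, reusing the placeholder endpoint and token from the original config; 30s is an arbitrary illustrative value, and a longer timeout only helps if Elasticsearch is slow rather than unreachable:

```yaml
exporters:
  otlp/elastic:
    endpoint: elk-endpoint
    headers:
      Authorization: "Bearer Token"
    # Deadline for each individual export request. When it elapses, the
    # attempt fails with DeadlineExceeded and is handed to the retry logic.
    timeout: 30s
    tls:
      insecure_skip_verify: true
```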

harshul-yadav-gl, Aug 27 '24 10:08

cc @andrzej-stencel, could this be a misconfiguration on Elastic's side?

mx-psi, Aug 27 '24 12:08

Hi @mx-psi @andrzej-stencel

We have updated the collector configuration:

```yaml
collector.yaml: |
  receivers:
    otlp:
      protocols:
        grpc:
          max_concurrent_streams: 500
        http:
          cors:
            allowed_origins:
              - "http://*"
              - "https://*"
  processors:
    memory_limiter:
      check_interval: 1s
      limit_mib: 6000
    batch:
      timeout: 5s # Adjust based on your latency requirements
      send_batch_size: 2000 # Increase batch size for efficiency
  exporters:
    logging:
      loglevel: debug
    otlp/elastic:
      endpoint: "APM SERVER ENDPOINT"
      headers:
        Authorization: "Bearer TOKEN"
      timeout: 20s
      tls:
        insecure_skip_verify: true
      sending_queue:
        enabled: true
        num_consumers: 10 # Increase to process more items in parallel
        queue_size: 10000 # Increase queue size to handle bursts
      retry_on_failure:
        enabled: true
        initial_interval: 1s
        max_interval: 10s
        max_elapsed_time: 300s
  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [batch]
        exporters: [logging, otlp/elastic]
      metrics:
        receivers: [otlp]
        processors: [batch]
        exporters: [logging, otlp/elastic]
      logs:
        receivers: [otlp]
        processors: [batch]
        exporters: [logging, otlp/elastic]
    telemetry:
      logs:
        level: "info" # Adjust logging level to reduce overhead if necessary
```

And we configured the correct end point of elk with the collectors these errors are suddenly

kateseshan, Aug 28 '24 09:08

Sorry for the delay in response @harshul-yadav-gl. Does this problem still occur?

What do you mean by this: "And we configured the correct end point of elk with the collectors these errors are suddenly"? This sounds to me like an unfinished sentence. Can you describe what happened after you adjusted your configuration?

andrzej-stencel, Oct 06 '24 18:10

Since v0.92.0 (January 2024), after retry_on_failure::max_elapsed_time is reached, the data is dropped (previously it would be returned to the queue). To prevent data loss in this scenario, set retry_on_failure::max_elapsed_time to 0 to retry indefinitely.
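Applied to the otlp/elastic exporter from this thread, that suggestion would look roughly like the fragment below (other keys omitted). It is a sketch with one caveat: with an unbounded retry window, batches that cannot be delivered stay in the sending queue, so during a long Elasticsearch outage the queue can fill up, after which new incoming data is rejected rather than queued. The queue_size value simply mirrors the updated configuration above.

```yaml
exporters:
  otlp/elastic:
    retry_on_failure:
      enabled: true
      # 0 removes the elapsed-time limit: failed batches are retried
      # indefinitely instead of being dropped.
      max_elapsed_time: 0
    sending_queue:
      enabled: true
      # Holds batches while they are retried; once full, new data is rejected.
      queue_size: 10000
```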

andrzej-stencel, Mar 20 '25 08:03