opentelemetry-collector
"Exporting failed. No more retries left. Dropping data" with OTLP/Elastic exporter
I’m experiencing intermittent issues with my OpenTelemetry setup, specifically with exporting metrics and traces to Elasticsearch using the OTLP exporter. Multiple services send telemetry data to different OTLP agent collectors, which forward the data to a central collector. The central collector then exports all telemetry data to Elasticsearch.
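For context, here is a minimal sketch of what one of the agent collectors in this topology might look like. It receives OTLP from the local service and forwards everything to the central collector over OTLP/gRPC. The exporter name `otlp/central` and the endpoint `central-collector:4317` are hypothetical placeholders (4317 is the default OTLP gRPC port), and plaintext gRPC inside the cluster (`tls.insecure: true`) is an assumption.

```yaml
# Hypothetical agent-side collector: receive OTLP, batch, forward to the central collector.
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:

exporters:
  otlp/central:
    endpoint: central-collector:4317  # assumed address of the central collector
    tls:
      insecure: true                  # assumption: plaintext within the cluster

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/central]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/central]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/central]
```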
Configuration
Here's the relevant part of my otel-collector-config.yaml file:
receivers:
  otlp:
    protocols:
      grpc:
      http:
        cors:
          allowed_origins:
            - "http://*"
            - "https://*"
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 6000
  batch:
exporters:
  logging:
    loglevel: debug
  otlp/elastic:
    endpoint: elk-endpoint
    headers:
      Authorization: "Bearer Token"
    tls:
      insecure_skip_verify: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [logging, otlp/elastic]
    logs:
      receivers: [otlp]
      exporters: [logging, otlp/elastic]
    metrics:
      receivers: [otlp]
      exporters: [logging, otlp/elastic]
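The processors section above defines `memory_limiter` and `batch`, but no pipeline lists them under `processors`. In the Collector, a component is only active when a pipeline references it, so as written neither processor runs. A sketch of the traces pipeline with both wired in might look like this, assuming the usual ordering with `memory_limiter` first; the metrics and logs pipelines would follow the same pattern. The updated configuration later in this thread adds `batch` to the pipelines but still leaves `memory_limiter` unreferenced.

```yaml
service:
  pipelines:
    traces:
      receivers: [otlp]
      # Only processors listed here are applied to this pipeline.
      # memory_limiter goes first so it can refuse data before other
      # processors spend memory on it; batch then groups what is accepted.
      processors: [memory_limiter, batch]
      exporters: [logging, otlp/elastic]
```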
Error Details
I occasionally encounter the following errors in the collector logs:
error exporterhelper/queue_sender.go:125 Exporting failed. No more retries left. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "max elapsed time expired rpc error: code = DeadlineExceeded desc = context deadline exceeded", "dropped_items": 822}
info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "1.030954787s"}
Additional error details observed in the OpenTelemetry Collector pod logs:
2024-08-27T10:56:57.833Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "6.922757504s"}
2024-08-27T10:56:58.323Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "2.054454828s"}
2024-08-27T10:56:58.324Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "5.25555217s"}
2024-08-27T10:56:58.421Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "13.664290849s"}
2024-08-27T10:56:58.810Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "3.214333298s"}
2024-08-27T10:56:59.384Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "6.239830222s"}
2024-08-27T10:56:59.438Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "3.242129234s"}
2024-08-27T10:57:03.063Z error exporterhelper/queue_sender.go:125 Exporting failed. No more retries left. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "max elapsed time expired rpc error: code = DeadlineExceeded desc = context deadline exceeded", "dropped_items": 326}
2024-08-27T10:57:05.985Z error exporterhelper/queue_sender.go:125 Exporting failed. No more retries left. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "max elapsed time expired rpc error: code = DeadlineExceeded desc = context deadline exceeded", "dropped_items": 468}
2024-08-27T10:57:06.247Z error exporterhelper/queue_sender.go:125 Exporting failed. No more retries left. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "max elapsed time expired rpc error: code = DeadlineExceeded desc = context deadline exceeded", "dropped_items": 259}
2024-08-27T10:57:06.395Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "1.023227611s"}
-> error: Str(GroundingHubException)
-> error: Str(none)
-> error: Str(none)
-> error: Str(none)
-> error: Str(none)
-> error: Str(none)
-> error: Str(none)
-> error: Str(none)
-> error: Str(IOException)
-> error: Str(none)
-> error: Str(none)
-> error: Str(none)
-> error: Str(none)
-> error: Str(none)
-> error: Str(none)
-> error: Str(none)
-> error: Str(none)
-> error: Str(none)
-> error: Str(none)
-> error: Str(none)
-> error: Str(none)
-> error: Str(none)
-> error: Str(none)
-> error: Str(none)
-> error: Str(none)
-> error: Str(IOException)
-> error: Str(none)
-> error: Str(none)
-> error: Str(none)
-> error: Str(none)
-> error: Str(none)
-> error: Str(none)
-> error: Str(none)
-> error: Str(none)
-> error: Str(none)
-> error: Str(none)
-> error: Str(none)
-> error: Str(none)
-> level: Str(error)
-> error: Str(GroundingHubException)
-> error: Str(none)
-> Name: system.network.errors
-> Description: System network errors
-> Unit: errors
2024-08-27T10:57:06.436Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "5.481593831s"}
2024-08-27T10:57:07.024Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "5.745735946s"}
2024-08-27T10:57:07.146Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "11.680599097s"}
2024-08-27T10:57:07.181Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "11.066047572s"}
2024-08-27T10:57:07.433Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "1.155981797s"}
2024-08-27T10:57:07.787Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "1.912185827s"}
2024-08-27T10:57:08.026Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "1.395621059s"}
2024-08-27T10:57:08.513Z error exporterhelper/queue_sender.go:125 Exporting failed. No more retries left. Dropping data. {"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "max elapsed time expired rpc error: code = DeadlineExceeded desc = context deadline exceeded", "dropped_items": 512}
2024-08-27T10:57:08.571Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "1.446042735s"}
2024-08-27T10:57:09.655Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "2.825948872s"}
2024-08-27T10:57:16.397Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "860.048665ms"}
-> Name: system.network.errors
-> Description: System network errors
-> Unit: errors
-> Name: system.network.errors
-> Description: System network errors
-> Unit: errors
2024-08-27T10:57:17.460Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "661.875106ms"}
2024-08-27T10:57:17.505Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "1.797723167s"}
2024-08-27T10:57:18.588Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "9.303621398s"}
2024-08-27T10:57:18.702Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "12.145983633s"}
2024-08-27T10:57:18.725Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "1.323995075s"}
2024-08-27T10:57:18.951Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "14.53231812s"}
2024-08-27T10:57:19.070Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "3.066409691s"}
2024-08-27T10:57:19.596Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "7.647088599s"}
2024-08-27T10:57:20.121Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "14.133285413s"}
2024-08-27T10:57:21.318Z error exporterhelper/queue_sender.go:125 Exporting failed. No more retries left. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "max elapsed time expired rpc error: code = DeadlineExceeded desc = context deadline exceeded", "dropped_items": 533}
2024-08-27T10:57:25.625Z error exporterhelper/queue_sender.go:125 Exporting failed. No more retries left. Dropping data. {"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "max elapsed time expired rpc error: code = DeadlineExceeded desc = context deadline exceeded", "dropped_items": 736}
2024-08-27T10:57:26.134Z error exporterhelper/queue_sender.go:125 Exporting failed. No more retries left. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "max elapsed time expired rpc error: code = DeadlineExceeded desc = context deadline exceeded", "dropped_items": 205}
2024-08-27T10:57:26.401Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "1.267960273s"}
-> Name: airflow.dag_processing.import_errors
-> Name: system.network.errors
-> Description: System network errors
-> Unit: errors
2024-08-27T10:57:27.251Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "11.777193587s"}
2024-08-27T10:57:27.420Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "862.682838ms"}
2024-08-27T10:57:27.518Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "842.137459ms"}
2024-08-27T10:57:27.645Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "10.701868231s"}
2024-08-27T10:57:27.864Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "8.274129426s"}
2024-08-27T10:57:28.300Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "7.282718258s"}
2024-08-27T10:57:28.509Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "12.737171673s"}
2024-08-27T10:57:28.590Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "2.086697542s"}
2024-08-27T10:57:29.422Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "2.206177122s"}
2024-08-27T10:57:29.484Z error exporterhelper/queue_sender.go:125 Exporting failed. No more retries left. Dropping data. {"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "max elapsed time expired rpc error: code = DeadlineExceeded desc = context deadline exceeded", "dropped_items": 514}
2024-08-27T10:57:31.920Z error exporterhelper/queue_sender.go:125 Exporting failed. No more retries left. Dropping data. {"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "max elapsed time expired rpc error: code = DeadlineExceeded desc = context deadline exceeded", "dropped_items": 311}
2024-08-27T10:57:32.087Z error exporterhelper/queue_sender.go:125 Exporting failed. No more retries left. Dropping data. {"kind": "exporter", "data_type": "traces", "name": "otlp/elastic", "error": "max elapsed time expired rpc error: code = DeadlineExceeded desc = context deadline exceeded", "dropped_items": 305}
2024-08-27T10:57:33.718Z error exporterhelper/queue_sender.go:125 Exporting failed. No more retries left. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "max elapsed time expired rpc error: code = DeadlineExceeded desc = context deadline exceeded", "dropped_items": 506}
2024-08-27T10:57:36.414Z info exporterhelper/retry_sender.go:129 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "metrics", "name": "otlp/elastic", "error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "943.222607ms"}
Steps to Reproduce
- Deploy services with OTLP agent collectors that export to a central collector.
- Configure the central collector to export to an Elasticsearch endpoint.
- Observe intermittent export failures with the errors mentioned above.
Expected Behavior
Metrics and traces should be consistently exported to Elasticsearch without retries or dropped data due to timeouts.
Actual Behavior
The OTLP exporter intermittently fails to export data, citing "max elapsed time expired" and "context deadline exceeded," leading to dropped data.
Additional Context
This issue seems to point toward timeouts between the central collector and Elasticsearch. I’d appreciate any guidance on resolving this or mitigating the impact on data exports.
cc @andrzej-stencel, could this be a misconfiguration on Elastic's side?
Hi @mx-psi @andrzej-stencel
We have updated the collector configuration:
```yaml
collector.yaml: |
  receivers:
    otlp:
      protocols:
        grpc:
          max_concurrent_streams: 500
        http:
          cors:
            allowed_origins:
              - "http://*"
              - "https://*"
  processors:
    memory_limiter:
      check_interval: 1s
      limit_mib: 6000
    batch:
      timeout: 5s # Adjust based on your latency requirements
      send_batch_size: 2000 # Increase batch size for efficiency
  exporters:
    logging:
      loglevel: debug
    otlp/elastic:
      endpoint: "APM SERVER ENDPOINT"
      headers:
        Authorization: "Bearer TOKEN"
      timeout: 20s
      tls:
        insecure_skip_verify: true
      sending_queue:
        enabled: true
        num_consumers: 10 # Increase to process more items in parallel
        queue_size: 10000 # Increase queue size to handle bursts
      retry_on_failure:
        enabled: true
        initial_interval: 1s
        max_interval: 10s
        max_elapsed_time: 300s
  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [batch]
        exporters: [logging, otlp/elastic]
      metrics:
        receivers: [otlp]
        processors: [batch]
        exporters: [logging, otlp/elastic]
      logs:
        receivers: [otlp]
        processors: [batch]
        exporters: [logging, otlp/elastic]
    telemetry:
      logs:
        level: "info" # Adjust logging level to reduce overhead if necessary
```
And we configured the correct end point of elk with the collectors these errors are suddenly
Sorry for the delay in response @harshul-yadav-gl. Does this problem still occur?
What do you mean by this: "And we configured the correct end point of elk with the collectors these errors are suddenly"? This sounds to me like an unfinished sentence. Can you describe what happened after you adjusted your configuration?
Since v0.92.0 (January 2024), after retry_on_failure::max_elapsed_time is reached, the data is dropped (previously it would be returned to the queue). To prevent data loss in this scenario, set retry_on_failure::max_elapsed_time to 0 to retry indefinitely.
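Applied to the `otlp/elastic` exporter from the updated configuration in this thread, that change might look like the sketch below. Only `max_elapsed_time` differs from the posted values, and the endpoint and token placeholders are kept as they appear in the thread.

```yaml
exporters:
  otlp/elastic:
    endpoint: "APM SERVER ENDPOINT"
    headers:
      Authorization: "Bearer TOKEN"
    timeout: 20s                # per-attempt deadline before DeadlineExceeded
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 10000
    retry_on_failure:
      enabled: true
      initial_interval: 1s
      max_interval: 10s
      # 0 removes the elapsed-time cap: a failing batch keeps being
      # retried instead of being dropped once the cap is reached.
      max_elapsed_time: 0
```

Retrying indefinitely changes the failure mode rather than eliminating it. If the Elastic endpoint stays slow for long enough, the sending queue (`queue_size` above) fills up and the exporter starts rejecting newly arriving data instead of dropping old batches.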