opentelemetry-collector

[retry_sender] Message on retryable error is causing concern

Open atoulme opened this issue 1 month ago • 3 comments

Component(s)

No response

What happened?

When an export fails because of a timeout, users see an info-level log such as:

retry_sender.go:126 Exporting failed. Will retry the request after interval. {"error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "9.640533959s"}

This error is transient and the export will be retried. However, the message worries users because they have a hard time interpreting what it means.

The error can be caused by a variety of factors:

  • The connection timed out
  • The backend we talk to is busy and timed out
  • There is a network bandwidth constraint
  • There is an intermediate actor such as a proxy, firewall, or load balancer that is somehow dropping the connection
  • We are sending a large payload

This issue is meant to discuss how to surface more information about the source of the error and reduce user anxiety when they see "Exporting failed". We need a way to convey that a single occurrence is benign, while repeated occurrences on a bad connection can eventually lead to dropped data.

Collector version

v0.137.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler (if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration


Log output


Additional context

No response


atoulme, Oct 07 '25 15:10