opentelemetry-collector
[retry_sender] Message on retryable error is causing concern
Component(s)
No response
What happened?
Describe the bug
When an export fails because of a timeout, users see an info-level log such as:
retry_sender.go:126 Exporting failed. Will retry the request after interval. {"error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "9.640533959s"}
This error is transient and the send will be retried. However, it worries users, who have a hard time interpreting what it means.
The error can be caused by a variety of factors:
- The connection timed out
- The backend we talk to is busy and timed out
- There is a network bandwidth constraint
- There is an intermediate actor such as a proxy, firewall, or load balancer that is somehow dropping the connection
- We are sending a large payload
This issue is meant to discuss how to surface more information about the source of the error and reduce user anxiety when they see "Exporting failed". We need a way to show that a single occurrence of this error is benign, but that a recurring pattern can eventually lead to dropped data if the connection is bad.
Collector version
v0.137.0
Environment information
Environment
OS: (e.g., "Ubuntu 20.04")
Compiler (if manually compiled): (e.g., "go 14.2")
OpenTelemetry Collector configuration
Log output
Additional context
No response