opentelemetry-collector icon indicating copy to clipboard operation
opentelemetry-collector copied to clipboard

Default behavior of consumer errors in the pipeline

Open bogdandrutu opened this issue 10 months ago • 6 comments

As of today in the collector main receiver OTLP receiver, errors are by default treated as retryable unless otherwise specified by the source by using the consumererror.NewPermanent. This requires components to accordingly mark every error permanent or not.

Based on the OTLP protocol though, only a very small subset of cases must be retryable, see https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#failures-1.

In contrib right now, looking at some of the most used processor attributes, resource, resourcedetector, and transform:

  1. They are all returning errors in some scenarios like failing some filtering conditions, etc.
  2. None of these components are returning permanent errors, even though I am highly confident that these errors are deterministic and a retry will not change the behavior.

Proposal: Since, I've seen this problem in practice, I would like to propose that we change the behavior of the consumer pipeline errors and treat all errors as permanent by default unless specified otherwise.

bogdandrutu avatar Feb 14 '25 20:02 bogdandrutu

I agree. I remember we discussed this and decided to make the retriable errors opt-in as part of the effort to revisit error propagation that needs to be revamped: https://github.com/open-telemetry/opentelemetry-collector/issues/7047, https://github.com/open-telemetry/opentelemetry-collector/pull/11085

dmitryax avatar Feb 14 '25 21:02 dmitryax

Slightly different but related to this issue: Currently, if two exporters in the same pipeline return an error for the same call, the fanoutconsumer will return a multierr.multiError, which will be considered ~retryable no matter the type of the component errors~ permanent if any component error is permanent. Is this the expected behavior?

Considering this, the new Error wrapper (current and hopefully final PR: #13042), and potentially a new "downstream" wrapper to implement the pipeline instrumentation RFC (this part specifically), I think we may want to lay down precisely how all these error wrappers are supposed to interact.

jade-guiton-dd avatar May 20 '25 13:05 jade-guiton-dd

Current uses of consumererror.NewPermanent: https://github.com/search?q=repo%3Aopen-telemetry%2Fopentelemetry-collector+OR+repo%3Aopen-telemetry%2Fopentelemetry-collector-contrib+consumererror.NewPermanent+language%3AGo&type=code&p=1

Proposed approach for doing this:

  1. Make it so that IsPermanent looks something like this, with defaultIsPermanent being false:
func IsPermanent(err error) bool {
	if err == nil {
		return false
	}

	if errors.As(err, &permanent{}) {
		return true
	}

      var consumerErr Error
	if errors.As(err, &consumerErr) {
		return consumerErr.IsRetryable()
	}

	return defaultIsPermanent
}
  1. Make sure all components/codepaths that use NewPermanent also use NewRetryableError for errors that are not permanent.
  2. Introduce a consumererror.DefaultToPermanentError feature gate that handles the value of defaultIsPermanent.
  3. Once consumererror.DefaultToPermanentError is in beta stage, deprecate NewPermanent and IsPermanent.
  4. Once consumererror.DefaultToPermanentError is in stable stage, remove NewPermanent and replace usages of IsPermanent with checking the IsRetryable method.

Thoughts about this approach?

mx-psi avatar Sep 22 '25 12:09 mx-psi

It looks like there is already a NewRetryableError function which uses the new consumererror.Error struct. Could we use that instead? The main task would be to update IsPermanent to take it into account.

jade-guiton-dd avatar Sep 22 '25 13:09 jade-guiton-dd

Ah! Yes, I hadn't realized that. Okay, updated my comment to use NewRetryableError and IsRetryable instead

mx-psi avatar Sep 22 '25 13:09 mx-psi

@mx-psi the approach above sounds good https://github.com/open-telemetry/opentelemetry-collector/issues/12393#issuecomment-3318609673

jmacd avatar Sep 22 '25 16:09 jmacd