opentelemetry-collector icon indicating copy to clipboard operation
opentelemetry-collector copied to clipboard

[reciever/otlp] Return proper http response code based on retryable errors

Open TylerHelmuth opened this issue 1 year ago • 1 comments
trafficstars

Description: Updates the receiver's http response to return a proper http status based on whether or not the pipeline returned a retryable error. Builds upon the work done in https://github.com/open-telemetry/opentelemetry-collector/pull/8080 and https://github.com/open-telemetry/opentelemetry-collector/pull/9307

Link to tracking Issue:

Closes https://github.com/open-telemetry/opentelemetry-collector/issues/9337 Closes https://github.com/open-telemetry/opentelemetry-collector/issues/8132

Testing:

Updated lots of unit tests

TylerHelmuth avatar Jan 23 '24 20:01 TylerHelmuth

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 91.13%. Comparing base (3da7e16) to head (d4d96ad).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #9357   +/-   ##
=======================================
  Coverage   91.13%   91.13%           
=======================================
  Files         353      353           
  Lines       18728    18740   +12     
=======================================
+ Hits        17067    17079   +12     
  Misses       1333     1333           
  Partials      328      328           

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Jan 23 '24 20:01 codecov[bot]

@open-telemetry/collector-maintainers please review

TylerHelmuth avatar Feb 16 '24 17:02 TylerHelmuth

@open-telemetry/collector-maintainers please review

TylerHelmuth avatar Feb 21 '24 18:02 TylerHelmuth

@bogdandrutu PTAL

mx-psi avatar Feb 29 '24 15:02 mx-psi

Hi, I was just looking at this issue myself locally, and have found this open issue...

I notice this, #9307 and #8080 are somewhat coupled to GRCP status codes.

I also notice we have consumer/consumererror, which seems to address somewhat similar concerns, i.e. retryable and permanent errors.

Can you clarify if consumer/consumererror would be appropriate here, or is it intended for something else? I see consumererror.New is found in a number of places in the contrib codebase.

Have you considered using consumererror.IsPermanent(err) in the receiver to determine the response status code?

Using consumererror would seem to make it a bit more user friendly to actually create the errors in the downstream processor, e.g., something like return td, consumererror.NewTraces(errDataRefused, td) in memorylimiter.

0x006EA1E5 avatar Feb 29 '24 17:02 0x006EA1E5

@0x006EA1E5 as you've deduced, consumer/consumererror is normally used downstream from receivers to tell receivers what kind of errors they got from r.nextConsumer.Consume*. In this instance we're in a receiver and the error we're propagating back to the caller is subject to the OTLP specification (which is closely tied to grpc objects even for http) so returning consumererror would not be appropriate. The receiver does use consumererror when initially handling the error returned from downstream components.

TylerHelmuth avatar Mar 12 '24 20:03 TylerHelmuth

@bogdandrutu @open-telemetry/collector-approvers please take a look. errors.GetHTTPStatusCodeFromStatus(err) does resemble some other internal functions for handling errors in other code paths, but I think a separate refactor after this is added will be best to keep this PR small and targeted.

TylerHelmuth avatar Mar 12 '24 20:03 TylerHelmuth