NServiceBus OpenTelemetry: In tail-sampling scenarios, failures that are fixed in subsequent retries may be harmful

Open lailabougria opened this issue 2 years ago • 2 comments

Describe the feature.

Is your feature related to a problem? Please describe.

Every failure that occurs causes the span to be tagged as failed. Suppose the customer uses a tail-sampling strategy that keeps all failed traces. In that case, they may want to filter out traces whose failures were resolved by subsequent retries, and sample only those that consistently failed to the point that the message was moved to the error queue.
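To illustrate the problem, here is a minimal sketch in plain Python. It does not use the OpenTelemetry SDK or Collector API; `Span`, `keep_trace`, and the span names are hypothetical, and it assumes a tail-sampling policy that keeps any trace containing at least one span with an error status.

```python
from dataclasses import dataclass


@dataclass
class Span:
    name: str
    status: str  # "ok" or "error" (simplified, hypothetical model)


def keep_trace(spans):
    """Hypothetical tail-sampling policy: keep the trace if any span failed."""
    return any(span.status == "error" for span in spans)


# The first processing attempt throws, the retry succeeds.
recovered_trace = [
    Span("process message (attempt 1)", "error"),
    Span("process message (attempt 2)", "ok"),
]

# The policy keeps this trace even though the message was eventually processed.
print(keep_trace(recovered_trace))
```

Under this assumed policy, a recovered message is indistinguishable from one that ended up in the error queue, which is the noise described above.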

Describe the requested feature

Make the failure-tracking behavior configurable, e.g. only mark messages that are moved to the error queue as failed, and instead add specific tags (e.g. exception message, number of retries, etc.) to failures when there are retries left according to the recoverability policy. This way, users can still identify traces that included retries, but only failed messages (those that went to the error queue) will show up as failed traces.
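A rough sketch of what the requested behavior could look like, again in plain Python rather than the NServiceBus or OpenTelemetry API. The function name `describe_failed_attempt` and the `example.retry.*` attribute names are assumptions made up for illustration; the only point is that a failed attempt with retries left is tagged but not marked as an error.

```python
def describe_failed_attempt(exception_message, retries_so_far, moved_to_error_queue):
    """Return (status, attributes) for one failed processing attempt.

    Hypothetical helper: only an attempt that sends the message to the
    error queue is reported with status "error".
    """
    if moved_to_error_queue:
        # Retries are exhausted: this is a real failure.
        return "error", {"exception.message": exception_message}

    # Retries are left: leave the status unset and record the failure as tags,
    # so traces with retries can still be found by querying these attributes.
    attributes = {
        "example.retry.exception_message": exception_message,  # illustrative name
        "example.retry.count": retries_so_far,  # illustrative name
    }
    return "unset", attributes


status, attributes = describe_failed_attempt("Timeout", 1, moved_to_error_queue=False)
print(status, attributes)
```

With tags like these, a tail sampler that keeps only traces with an error status would drop recovered messages, while the retry information stays on the span for anyone who samples the trace for other reasons.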

Describe alternatives you've considered

Additional Context

No response

May 02 '23 13:05 lailabougria

Would using span links rather than child spans solve this?

Apr 22 '24 10:04 andreasohlund

It would. For any delayed retries we should use span links instead of child spans. Users can control how long the tail sampler waits for a single trace to complete, and that value is usually around 5 seconds.

Apr 22 '24 12:04 lailabougria

Discussed with the OpenTelemetry community and confirmed that this is a false assumption. Not needed for now.

Jun 17 '24 08:06 SzymonPobiega
