dd-trace-rb: Suppressing errors when handling retries

Problem:

We are using RestClient to make a request to an external API, which is raising RestClient::ServerBrokeConnection. We plan on addressing this with Ruby's retry, since the majority of requests succeed; however, I do not see

Given that Datadog's RestClient patch associates the error with the span and finishes the span before my app code can retry, how can I suppress these errors when I retry the request?
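For reference, the retry described above might look like the following minimal sketch. The helper name `with_retries` and its parameters are hypothetical; in the app, the block would call `RestClient::Request.execute` and `errors` would be `[RestClient::ServerBrokeConnection]`. This alone does not change what Datadog records: each failed attempt still produces its own `rest_client.request` span marked as an error, because the integration finishes that span before the exception reaches the `rescue`.

```ruby
# Hypothetical helper: runs the block, retrying when one of `errors` is raised,
# for at most `max_attempts` attempts in total, then re-raises the last error.
def with_retries(max_attempts:, errors:)
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue *errors
    # Jump back to `begin` while attempts remain; otherwise re-raise.
    retry if attempt < max_attempts
    raise
  end
end

# Example usage (needs the rest-client gem; `vendor_url` is a placeholder):
#
#   with_retries(max_attempts: 3, errors: [RestClient::ServerBrokeConnection]) do
#     RestClient::Request.execute(method: :get, url: vendor_url)
#   end
```

Rescuing only the listed exception classes keeps unrelated failures from being retried.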

Describe the goal of the feature

Guidance and/or functionality for handling retries within an application when the retried call crosses the Datadog patching boundary, ideally applicable to other integrations as well.

Additionally, it'd be nice to have some insight into these transient errors, or retries.

Describe alternatives you've considered

Switching from RestClient to Faraday and using its retry middleware, in the hope that retries are then handled within Datadog's error tracking of the span.

Aug 08 '24 14:08 liaden

I created this as a new issue instead of commenting on https://github.com/DataDog/dd-trace-rb/issues/3820 because the perspectives differ (patching Redis, which does its own retries, vs. an app handling retries outside of Datadog and the library).

Aug 08 '24 14:08 liaden

Hey @liaden! 👋

> We are using RestClient to make a request to an external API which is raising RestClient::ServerBrokeConnection which we plan on addressing using Ruby's retry since the majority of requests are succeeding; however, I do not see

Because you are creating a custom wrapper to perform the retries, I recommend creating a span that represents that wrapper. This way you'll have a custom span representing the specific operation you created.

> Given that datadog's RestClient patch is going to associate the error with the span and finish the span before my app code can retry, I'm curious how I can suppress these errors when I am retrying the error.

If you create a span to represent your wrapper, all the RestClient requests, including the retried ones, will be encapsulated in a single parent span, so Datadog spans finishing before your code runs won't be an issue. The errors will also not propagate past the wrapper, since it rescues them, so there's no concern with error reporting.
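A minimal sketch of that suggestion, assuming the datadog gem's `Datadog::Tracing.trace(name) { |span| ... }` API. The tracer is passed in as an argument so the helper can be exercised without an agent; the helper name, the span name and the `retry.attempts` tag are hypothetical. Each `rest_client.request` span created inside the block becomes a child of the one wrapper span, the rescued errors stay on those child spans, and only an error that survives the last attempt propagates out of the wrapper.

```ruby
# Hypothetical helper: wraps all attempts in ONE parent span and retries the
# block on the listed errors. `tracer` is assumed to respond to
# `trace(name) { |span| ... }`, e.g. Datadog::Tracing from the datadog gem.
def traced_with_retries(tracer, span_name, max_attempts:, errors:)
  tracer.trace(span_name) do |span|
    attempt = 0
    begin
      attempt += 1
      # Overwritten on every attempt, so the final value is the attempt count.
      span.set_tag("retry.attempts", attempt)
      yield attempt
    rescue *errors
      retry if attempt < max_attempts
      # Retries exhausted: let the error reach the parent span and beyond.
      raise
    end
  end
end

# Example usage (needs the datadog and rest-client gems; the span name and
# `vendor_url` are placeholders):
#
#   traced_with_retries(Datadog::Tracing, "external_vendor.request",
#                       max_attempts: 3,
#                       errors: [RestClient::ServerBrokeConnection]) do
#     RestClient::Request.execute(method: :get, url: vendor_url)
#   end
```

Tagging the attempt count on the wrapper span is one way to get the insight into retries that the issue asks for.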

Please let me know if I'm misunderstanding your set up.

Aug 08 '24 22:08 marcotc

@marcotc I don't think I fully understand what you are proposing; maybe I am missing or misunderstanding some of Datadog's capabilities?

If I wrap my code's RestClient::Request.execute call in my own span, that span will be associated with my sidekiq.job operation, not with the rest_client.request operation attached to the api.external-vendor.com service, which is where I was looking at the error.

Is there something special about manually creating a span that wraps only one other span?

Aug 15 '24 19:08 liaden

🤔 Maybe I misunderstand your current scenario.

> which we plan on addressing using ruby's retry since the majority of requests are succeeding

What would your retry code look like? More specifically, how would it interact with the RestClient calls?

My suggestion in the earlier comment is based on the fact that errors in the Datadog Error Tracking product only count if they bubble up all the way to the top span of a trace. Errors in internal spans that get rescued do not count for Error Tracking. Those spans will still be marked as errors, because that's an accurate representation of the Ruby process's control flow, but they will not trigger Datadog Error Tracking.

Aug 21 '24 21:08 marcotc