dd-trace-rb icon indicating copy to clipboard operation
dd-trace-rb copied to clipboard

Custom resource names for HTTP requests

Open buddhistpirate opened this issue 7 years ago • 21 comments

Right now the resource names for the auto-instrumented Net::HTTP code are HTTP Methods: GET, POST etc. This is not a useful way for us to see our metrics broken out. As an example, there's no meaningful way for me to sort the Service view. However, if the Resource name was the domain, I could see what services are the most used, and track them over time much better.

If you don't want to change the default, could there either be an extension point (method to override), or a configuration option?

Thanks.

buddhistpirate avatar Dec 14 '17 20:12 buddhistpirate

I have opened up a PR at least as an example of an approach: https://github.com/DataDog/dd-trace-rb/pull/278

buddhistpirate avatar Dec 14 '17 22:12 buddhistpirate

I would love to see something like this make it into the gem. Knowing what service is responding slow and track historical performance would be very useful.

MMartyn avatar Feb 26 '18 15:02 MMartyn

This would be very useful to us as well. This is noted as blocked in Community, what is it blocked on?

doliveirakn avatar Jun 05 '19 19:06 doliveirakn

+1 for this.

HoneyryderChuck avatar Jun 17 '19 09:06 HoneyryderChuck

We are experimenting with this right now to see what's the best way to handle this. Overall, I think we want to provide an option to customize this resource name entirely (within some kind of callback), but we also want to evaluate if we can provide a better default (I think we can), and what that should be.

We talked about this a little bit in https://github.com/DataDog/dd-trace-rb/pull/387, and I think in one other issue (cannot remember now.)

delner avatar Jun 17 '19 15:06 delner

It looks like many ducks are in a row for being able to provide this feature, apart from supporting quantize as an option on the net/http integration. Previously this was mentioned being blocked on waiting for the quantization support to be standardized, but I'm not aware of any other blockers

  • [x] #384 Added HTTP quantization support
  • [x] Examples of http quantization being used
  • [x] Support for other integrations showing handling of quantization options (rack/rake)
  • [x] Standardization for outbound requests under span type "http" #751

Is there anything else blocking this at this point, other than needing prioritizing?

jeremyolliver avatar Sep 02 '19 04:09 jeremyolliver

So this came up while discussing instrumentation for Ethon, and I'll more or less repeat that here so everyone has a chance to see it:

By default, we only want to set the HTTP method as the resource, e.g. GET, but omit the path/query.

Sounds crazy, and I see the value of including the path. However, the resource is meant to be a GROUP BY like key, which in practice requires values that meaningfully group traces on some user-defined dimension. HTTP paths/query strings often contain unique input (such as numeric IDs, API tokens), which tend to flatten these traces into groups too small to be meaningful. Hence, as a default behavior, we've been only setting the method as the resource for HTTP client integrations.

In order to provide a better default, we'd require a much smarter quantization strategy that could detect and quantize just the variable/sensitive parts of URIs and not anything else. Needless to say, this would be sophisticated mechanism, and not something that we have in place today in the library.

If the omission of path/query as a default behavior is an issue for users (as it clearly and understandably has been raised as one), the alternative we could offer is the option to customize their span and set their own resource using a callback.

See HTTP as an example: https://github.com/DataDog/dd-trace-rb/blob/2515b5d44e45a4f8fb91882dccfdb0cf300bf19a/lib/ddtrace/contrib/http/instrumentation.rb#L70. This particular implementation gives users full control over their spans and resources for Net::HTTP. However a few words of caution:

  1. When using a custom block like this, users must be careful about not malforming spans, raising errors, flattening cardinality, or incurring performance issues; each of these present different risks to the stability and usability of Net::HTTP tracing.
  2. This remains undocumented because it's considered experimental; use at your own risk of breaking changes or becoming unavailable without warning in the future. We withheld this given we didn't want users adopting this Net::HTTP callback and risk breaking their applications until we could give the callback design a more robust treatment with full support across the library. If we do have a chance to prioritize such a callback design and give this a better treatment, we will communicate that change in support accordingly.

We'd rather users have the option to customize their traces for their needs in exchange for assumed responsibility, as opposed to not having the choice at all. If you feel inclined to try it, please do share your feedback so we can incorporate it into any future plans!

delner avatar Sep 03 '19 23:09 delner

We went ahead and tested an implementation using the after_request hook. It does appear to be sending the spans correctly ie no malformed spans. It also seems that in the APM services UI, under net/http, these spans are getting filtered due to not having resources that conform to GET, POST, etc. Looking within a different service, we can see these net/http spans nested within others and they do have their alternative resources, they just are not showing up in the UI for net/http.

I understand this is experimental and not necessarily an explicit feature at this point, so I am wondering how we can proceed to make this useable for our case.

MaxSechz avatar Oct 03 '19 18:10 MaxSechz

Can you explain in more detail about how they are gettting filtered? I've run some tests with this, and it seems to work fine from what I can see. What values are you changing in the after_request block?

The only case I can think of where they might be appear to be filtered is if you change the span's name to something else other than http.request: the service view will by default only show the most common span name, then the full list of resources associated with that span name.

If you can share some comparisons between spans that are filtered and those that are not, that would be helpful.

delner avatar Oct 04 '19 16:10 delner

The implementation is similar to #278 , just setting the resource to the host:port if the service is net/http. If I go to https://app.datadoghq.com/apm/service/net%2Fhttp/, none of the traces show up. On the other hand, if I look at a span representing a controller action, our test net/http spans show up nested inside. The TEST here is me setting the resource, so clearly its not malformed, or being being filtered completely. Screen Shot

MaxSechz avatar Oct 07 '19 14:10 MaxSechz

@MaxSechz I see, that's curious. Interesting that it shows up under the parent trace's service, but not the HTTP service. Clearly the span has been ingested and is available, what's unclear is why the HTTP service decides not to show it.

Since this seems to concern the backend, I'll have to follow up with the team.

To diagnose this better though, it might be a good idea to share your account so we can look at it. To do so you'd want to open a support ticket via the website (to protect any sensitive information about your account) and describe more or less the problem as you did here. Be sure to share any links you have for the traces in question so we can look at the same data you're seeing. You can mention me by name, and the team should expedite/relay it to me so I can investigate further.

delner avatar Oct 08 '19 12:10 delner

I'm kinda confused... is there a way to work around this? Currently this is what we see with our net/http rails app:

The section says Endpoints But all we see is HTTP verbs/methods (not the most useful information to start): Screenshot 2019-12-10 at 13 00 32

Then, for example, if you click on "GET", not even the host or path of the requests are displayed: Screenshot 2019-12-10 at 13 01 05

You have to click on EVERY single row on that table to see what is the actual http request that are being triggered there.

Is this the expected result from the implementation point of view or am I doing something wrong here?

igorescobar avatar Dec 10 '19 13:12 igorescobar

@igorescobar This is the current default behavior, but this comment might explain what's going on here and how to work around it: https://github.com/DataDog/dd-trace-rb/issues/277#issuecomment-527676453

delner avatar Dec 10 '19 18:12 delner

@igorescobar This is the current default behavior, but this comment might explain what's going on here and how to work around it: #277 (comment)

Hey, @delner! I appreciate your reply!

I guess we will wait for an official patch on this. It doesn't make sense for us to be paying thousands of dollars on a tool and in the end, you still have to work on ad-hoc implementations for basic things like this.

Our workaround for this is actually not using the net/http instrumentation at all. We are using the Trace dashboard, then we just add two new columns to the view which is the path and host.

igorescobar avatar Dec 11 '19 09:12 igorescobar

Hey @igorescobar, using facets, like you posted earlier, should also work with the built-in net/http integration: Screen Shot 2019-12-11 at 4 39 38 PM Both host and path are available in net/http traces if using @network.destination.ip and @http.url columns respectively.

marcotc avatar Dec 11 '19 21:12 marcotc

@igorescobar FWIW, I feel the pain too (I think we can do better) and we are working on plans to make improvements to the default behavior; when I have more information to share about that, I will.

delner avatar Dec 11 '19 23:12 delner

Hey @igorescobar, using facets, like you posted earlier, should also work with the built-in net/http integration: Screen Shot 2019-12-11 at 4 39 38 PM Both host and path are available in net/http traces if using @network.destination.ip and @http.url columns respectively.

@marcotc Thanks! Unfortunately, this option is only available from the Traces view. The default http.request gives me no customization options, just the "View in Traces" button.

Screenshot 2019-12-12 at 10 03 06

@delner Appreciated your comment! Cheers!

igorescobar avatar Dec 12 '19 10:12 igorescobar

@delner Is there any chance this hook can be added to the other HTTP clients instrumented by dd-trace-rb? We built our internal HTTP client on top of Excon and so can't use the hook in net/http.

MXfive avatar May 25 '20 05:05 MXfive

@MXfive It's something we'll consider expanding to other spans as well, including Excon, but isn't on the top of our docket right now. I would like to see more "span hooks" like this in the future, as a general feature to allow users to customize instrumentation.

delner avatar May 26 '20 16:05 delner

@delner I have a question regarding timeouts and the after_request block (let me know if this is better suited as its own issue). We use something like this:

Datadog::Contrib::HTTP::Instrumentation.after_request do |span, _http_instance, req, _res|
  host = req.uri.host
  case host
  when /INTERNAL/ # Internal
    Datadog::Analytics.set_measured(span) # Keep all metrics, to better understand our intranet traffic
    span.service = ENV['DD_SERVICE']
  when /amazonaws/, /169.254.169.254/ # AWS
    span.service = 'http-amazonaws.com'
  else
    org_domain = host.split('.').last(2).join('.')
    span.service = "http-#{org_domain}"
  end
  path = req.path.to_s.split('?').first
  path = path.split('/').keep_if { |chunk| chunk[/^[a-z_]+$/i] }.join('/')
  span.resource = "#{host}/#{path}##{req.method}"
end

The problem here is that the block doesn't get called on timeouts - in which case we get the default service of net/http and a resource of the method. Can we talk about either:

  • ensuring the after_request block runs for timeouts?
  • adding a before_request in conjunction with an ||= on span.service and span.resource?
  • other suggestions??

dudo avatar Jul 19 '21 16:07 dudo

@dudo I think it probably makes sense that after_request should be invoked when some kind of exception is raised (e.g. timeout), although if that happens, its possible some of those arguments will be nil (if request or response was never generated.) Users will have to nil check those then.

Not sure we want to add a before_request yet; we may do it in the future, but there are some model/usage considerations to address first.

Also, just as a suggestion, we discourage multiplexing service names (having unique service names for different spans) as it has a potential to greatly clutter your services page (especially if your code interacts with subdomains with randomness in them.) If it isn't already possible, we should allow for you to group/filter spans in your service by domain (amazon.com, etc.) in the UI.

delner avatar Aug 04 '21 15:08 delner

Can we follow the quantize pattern established on other instrumentations, and allow it to be configurable? I'd love to be able to customize my faraday resource names per-domain, and keep them all under the same service_name.

BobbyMcWho avatar Mar 01 '23 20:03 BobbyMcWho

Closing this as it seems the direction has been to expose domains as services (via split_by_domain). This is not our preference as setting the resource as domain is much better for us. The distinction here is that we handle webhooks for integrations so the list of domains is quite large, which would pollute our service namespace, but as resources it makes perfect sense (like sql queries).

We are using the after_request extension point for the http integration to get this behavior and wish it existed for the other http gems (httpclient, restclient, ethon).

buddhistpirate avatar May 18 '23 15:05 buddhistpirate