sentry-dotnet icon indicating copy to clipboard operation
sentry-dotnet copied to clipboard

SentryHttpMessageHandler issue grouping is too eager

Open ericsampson opened this issue 1 year ago • 20 comments

Problem Statement

When using the SentryHttpMessageHandler, errors from all called URLs get lumped into one Issue. That's not great as the grouping is too eager; it's not very helpful to bundle HTTP errors from 25 different microservices and 15 external SaaS providers into a single Issue.

Solution Brainstorm

I'd love if the grouping could use an auto-parameterized version of the called URL as a grouping key.

ericsampson avatar Aug 30 '24 15:08 ericsampson

I'd love if the grouping could use an auto-parameterized version of the called URL as a grouping key.

Hey @ericsampson ,

Could you give an example of some URLs that are being grouped together and how these would ideally be grouped instead?

jamescrosswell avatar Sep 02 '24 01:09 jamescrosswell

We're always looking at ways to improve the out-of-the-box experience. So some examples would be hugely appreciated.

As one way to unblock yourself, you could look into setting your own custom fingerprint for events.

bitsandfoxes avatar Sep 04 '24 13:09 bitsandfoxes

@bitsandfoxes re the custom fingerprint option;

  • how would I correctly use this with the SentryHttpMessageHandler.cs, i.e. what SentryEvent property should I test against to know in a non-fragile manner that the event source was the SentryHttpMessageHandler ? It sets HttpClientOrigin, but that's an internal prop currently.
  • In the custom fingerprint code, would it be correct for me to use the event.Request.Url property?

ericsampson avatar Sep 04 '24 16:09 ericsampson

@jamescrosswell unless I'm misreading something in the dashboard, it currently appears that all called URLs are grouped into one single Issue. Because the default grouping that Sentry is choosing for these events is just the stack trace, which is identical regardless of the called URLs.

In my ideal world, each of these lines would be captured as a separate Issue:

GET foo.com/blah
GET bar.org/blah
POST bar.org/blah
// The handler should autodetect that the following called URL pattern is `foo.com/org/{id}/division/{id}`
// via tokenizing the called URL for patterns of `/{number}/?' and `/{UUID}/?` and '/{bool}/?`
// This isn't 100% perfect because it won't catch textual URL path params, but should cover 95% of cases seen in the real world.
GET foo.com/org/1/division/1, foo.com/org/2/division/1, foo.com/org/2/division/2 

ericsampson avatar Sep 04 '24 17:09 ericsampson

  • how would I correctly use this with the SentryHttpMessageHandler.cs, i.e. what SentryEvent property should I test against to know in a non-fragile manner that the event source was the SentryHttpMessageHandler ? It sets HttpClientOrigin, but that's an internal prop currently.

@ericsampson the HttpFailedRequestHandler sets an exception mechanism, so you can do something like this:

    options.SetBeforeSend((evt, hint) =>
    {
        if (evt.SentryExceptions?.Any(x => x.Mechanism is { Type: "SentryHttpFailedRequestHandler" }) is true)
        {
            if (evt.Request.Url != null)
            {
                evt.Fingerprint = [ evt.Request.Url ];
            }
        }
    });

You can get a bunch of other information via the Hint as well (see docs).

jamescrosswell avatar Sep 05 '24 00:09 jamescrosswell

@jamescrosswell unless I'm misreading something in the dashboard, it currently appears that all called URLs are grouped into one single Issue. Because the default grouping that Sentry is choosing for these events is just the stack trace, which is identical regardless of the called URLs.

I see... I guess depending on the HttpStatusCode it may or may not make sense to group by Uri. So it might be preferable to group all the 429 Too Many Requests responses together (same for 503 Service Unavailable) but something like 403 Forbidden you probably want to group by URI (since the specific location is relevant to the error).

Various other things like the request content and headers could also be relevant.

I wonder if there is some "default behaviour" Sentry could provide, out of the box, that would provide useful fingerprinting in at least the majority of situations (without being able to know anything specific about the various use cases)... 🤔

jamescrosswell avatar Sep 05 '24 00:09 jamescrosswell

@jamescrosswell I don't think there should be any status-code-dependent grouping for the default behavior. It should group by method + URL

There's no way to universally classify any HTTP code as "entire server" vs "this specific HTTP call" IMO. For example it's very possible for 429 to be per-endpoint rate limits, and e.g 503 only means that the server is not unavailable to respond to that specific request--it could totally be possible for it to respond successfully to a different URL depending on what's going on behind the scenes. And the same argument can be made for any other 500 or 400 code that I can think of; HTTP responses are resource-targeted, and so you can't generically classify any HTTP codes as "per resource" vs "entire server" Anything that involves more detail about specific codes would be only relevant to a given implementation, and so the SDK consumer would have to override the default behavior with their system knowledge, if desired

ericsampson avatar Sep 13 '24 19:09 ericsampson

I don't think there should be any status-code-dependent grouping for the default behavior. It should group by method + URL

Yeah, reading through this again I think I agree with you... how HTTP Status codes are used is going to be specific to the server application.

I'm wondering how we could auto-parameterize URL though. For incoming requests, we can use Route data such as Endpoints etc. but in the case of outgoing HttpClient requests there's less structured information available to infer which parts of the URL are parameters and which are not (or whether a parameter should be included or scrubbed for the purpose of grouping).

jamescrosswell avatar Sep 15 '24 21:09 jamescrosswell

I'm wondering how we could auto-parameterize URL though

For sure it's not going to be 100% perfect, but I think that this is one of those cases where "good enough is good enough" and "don't let perfect be the enemy of good". Start with a proposal like the following and then can adjust as user feedback is gained.

Here's what I would suggest:

  • Tokenize the called URL to replace all patterns of /{number}/? and /{UUID}/? and /{bool}/? with a string like ".../{param}/..."

This isn't perfect because it won't catch textual URL path params, but should cover 95% of cases* seen in the real world *data 100% fabricated ;)

ericsampson avatar Sep 16 '24 15:09 ericsampson

When using the SentryHttpMessageHandler, errors from all called URLs get lumped into one Issue. That's not great as the grouping is too eager; it's not very helpful to bundle HTTP errors from 25 different microservices and 15 external SaaS providers into a single Issue.

btw @ericsampson how is it that you end up with the same stack trace for calls to lots of different external URLs? Are you looking up the actual URL to be called dynamically (rather than specifying this in code) prior to making the Http request?

I'm mainly asking because I'd like to create a simple way to reproduce this problem.

jamescrosswell avatar Sep 26 '24 05:09 jamescrosswell

@jamescrosswell in our monolith, we have created tons of different external HTTP clients that all derive from a single base class HttpClientBase.cs that exposes methods like public async Task<Result<T>> GetAsync<T>(string url, ...), and the derived classes construct the required urls dynamically call these methods. That seems very normal to me.

The HttpClientBase.cs holds injected Microsoft HttpClient with an inner SentryHttpMessageHandler.

For every http call made by the HttClientBase that results in the SentryHttpFailedRequestHandler > response.EnsureSuccessStatusCode() throwing, the resulting Sentry events ALL get reported as a single sentry Issue with the following exact same grouping details as shown in the attached screenshot. Which is problematic for us, and the motivation for this ticket :)

We have not added any sentry dashboard custom Issue Grouping rules, so what we are seeing is stock Sentry behavior.

Does that help? Thanks!!

Image

ericsampson avatar Oct 02 '24 17:10 ericsampson

OK thanks @ericsampson. I think we can put together some code that will ensure the same stack trace when calling multiple different URLs, to simulate your scenario... From there it should be fairly simple to extend the Sentry SDK so that it has some callbacks or something on the options that you could use to customise the fingerprint for those outgoing URLs (and some default implementation of this that we provide out of the box).

jamescrosswell avatar Oct 07 '24 02:10 jamescrosswell

That sounds awesome, thanks James!

ericsampson avatar Oct 07 '24 23:10 ericsampson

@ericsampson is there a link to an issue in Sentry you you share? I wonder if perhaps the frames InApp flags are not preperly marked. Are the stack traces not including all frames leading up to the call site? Where your code started the HTTP request?

If not, I wonder if the stack trace is losing frames somehow.

bruno-garcia avatar Oct 08 '24 01:10 bruno-garcia

@bruno-garcia see this screenshot if it helps.

Image

You are correct, this is the entire Sentry raw stack from from the Issue in question; they are all identical regardless of of the application-code stack frame (and also identical regardless of the called URL/method):

System.Net.Http.HttpRequestException: Response status code does not indicate success: 500 (Internal Server Error).
  File "HttpResponseMessage.cs", line 165, in HttpResponseMessage HttpResponseMessage.EnsureSuccessStatusCode()
    throw new HttpRequestException(
  File "SentryHttpFailedRequestHandler.cs", line 27, in void SentryHttpFailedRequestHandler.DoEnsureSuccessfulResponse(HttpRequestMessage request, HttpResponseMessage response)
    response.EnsureSuccessStatusCode();

ericsampson avatar Apr 03 '25 19:04 ericsampson

I'm happy to provide a direct link to a relevant Sentry issue if I can do so privately rather than via this Github repo :) Cheers

ericsampson avatar Apr 03 '25 19:04 ericsampson

it's probably unrelated from a technical standpoint, but I was searching Sentry repos and found this one that was similar: https://github.com/getsentry/sentry-javascript/pull/14515

ericsampson avatar Apr 03 '25 19:04 ericsampson

looks to me as this is all we get from dotnet as a stack trace. I wonder if we force collecting a stack trace through new StackTrace(true) and stitch them together (or not, just use the one we created instead of the one from the exception) it would improve things.

@Flash0ver is this something you can give a quick try?

bruno-garcia avatar Apr 09 '25 12:04 bruno-garcia

@bruno-garcia is there anything that could be done to help bump this across the line? It's a real pain point for us, and I can't imagine how it is not affecting a ton of SDK users.

I could take a swipe at a PR if that would help, seems like a fairly simple change.

Cheers

ericsampson avatar May 23 '25 16:05 ericsampson

A PR would be really helpful! Thank you

bruno-garcia avatar May 23 '25 17:05 bruno-garcia

@bruno-garcia address with #4724, thanks!

ericsampson avatar Nov 11 '25 20:11 ericsampson

@ericsampson huge thanks for helping us making Sentry better ❤️ 🥔

You mentioned that a similar improvement could be made in Sentry.SentryGraphQLHttpFailedRequestHandler. Is this something that you would like to follow-up on? @jamescrosswell something we should follow-up on?

Flash0ver avatar Nov 20 '25 08:11 Flash0ver

@jamescrosswell something we should follow-up on?

Yes definitely, unless @ericsampson is really keen to do it.

jamescrosswell avatar Nov 20 '25 11:11 jamescrosswell

@jamescrosswell https://github.com/getsentry/sentry-dotnet/pull/4762

ericsampson avatar Nov 22 '25 16:11 ericsampson