sentry-java icon indicating copy to clipboard operation
sentry-java copied to clipboard

`DiagnosticCoroutineContextException` causing grouping/titling issues

Open lobsterkatie opened this issue 7 months ago • 12 comments

Description

This is split off of https://github.com/getsentry/sentry/issues/84502, to specifically talk about the DiagnosticCoroutineContextException errors mentioned there and the way they're affecting grouping and titling of issues.

The main problems reported there:

  • The DiagnosticCoroutineContextException errors have no stacktrace and no message, and so we're ending up grouping on error type only, which is putting chains with different ~inner~ outer errors into the same group because their ~outer~ inner errors are DiagnosticCoroutineContextException in both cases. (This was the original inspiration for the issue.)
  • DiagnosticCoroutineContextException is being used as the title for different groups, making them hard to differentiate in the issue stream. (See comment here.)

IIRC, this has come up before, and we've gone back and forth (on GH and in Slack) about what should be the main exception and what should be suppressed, and whether the solution lies in the SDK or on the server, and AFAIK we didn't really come to a firm conclusion and make a plan. Since it's now come up again, I figured maybe we could centralize the conversation in one spot and get to a decision on how we want to handle things. I'll add my understanding of where things stand as a comment below.

Just to have an example to work with, let's consider the following event data:

{
  "exception": {
    "values": [
      {
        "type": "IllegalArgumentException",
        "value": "View=DecorView[Main] not attached to window manager",
        "module": "java.lang",
        "stacktrace": {...},
        "thread_id": 2,
        "mechanism": {
          "type": "UncaughtExceptionHandler",
          "handled": false,
          "is_exception_group": true,
          "exception_id": 0
        }
      },
      {
        "type": "DiagnosticCoroutineContextException",
        "module": "kotlinx.coroutines.internal",
        "thread_id": 2,
        "mechanism": {
          "type": "chained",
          "exception_id": 1,
          "parent_id": 0
        }
      }
    ]
  },
  "main_exception_id": 1,
  "title": "DiagnosticCoroutineContextException"
}

On the server side, what happens is this:

  • We see that the IllegalArgumentException is marked as an exception group, with the DiagnosticCoroutineContextException as its only child , so we a) ignore IllegalArgumentException for the purposes of grouping, b) therefore group only on the DiagnosticCoroutineContextException, and within that only group on error type, because there's no stacktrace and no error message, and c) set main_exception_id to point to the DiagnosticCoroutineContextException.
  • When it comes to titling the event, we see main_exception_id = 1 and therefore use DiagnosticCoroutineContextException as the title (again only using the error type because there's no message).

Random thoughts/questions:

  • A user in the original issue solved this by simply deleting the DiagnosticCoroutineContextException in their beforeSend. I presume it has some value, though, right? So that's not a great solution in general?
  • From what I recall, the point of marking it as an exception group is to get the UI to behave the way we want, correct?
  • Things would be different if there were multiple child exceptions of different types. Does that ever happen?
  • One way to handle this would just be to special-case it on the server. Are there other examples of things like this, where it's "real error marked as a group and as the parent of a generic, low-info error?" And is that generic, low-info error always a DiagnosticCoroutineContextException?

lobsterkatie avatar May 15 '25 18:05 lobsterkatie

multiple child exceptions of different types. Does that ever happen?

Yes this can happen. There's no limits as to what types / combinations thereof can be in suppressed exceptions. It can be any combination.

A user in the original issue solved this by simply deleting the DiagnosticCoroutineContextException in their beforeSend. I presume it has some value, though, right? So that's not a great solution in general?

Ideally the customer doesn't have to remove anything by hand but SDK / Sentry Backend figure it out.

@markushi @romtsn should we ignore DiagnosticCoroutineContextException in the SDK (when it's a suppressed exception)?

From what I recall, the point of marking it as an exception group is to get the UI to behave the way we want, correct?

Yes, it would show the list of suppressed exceptions on top of the issues page. We do not set the is_exception_group flag at the moment. It caused problems with the wrong exception being used as the main exception if there was only a single exception with a single suppressed exception.

Is the sample event above coming from an outdated SDK version? If so upgrading the SDK should change the behaviour. Something odd is going on here. According to the linked GH issue, v8 of the SDK is used. None of the final v8 Java SDK versions should be sending the is_exception_group flag as we removed it in https://github.com/getsentry/sentry-java/pull/4056. Kotlin SDK also doesn't seem to set it.

adinauer avatar May 19 '25 08:05 adinauer

Something odd is going on here. According to the linked GH issue, v8 of the SDK is used. None of the final v8 Java SDK versions should be sending the is_exception_group flag as we removed it in https://github.com/getsentry/sentry-java/pull/4056. Kotlin SDK also doesn't seem to set it.

@M-Wong Can you please look at the JSON for one of the events with the bad titles to see if is_exception_group is included there the way it is in the sample event at the top of this issue?

lobsterkatie avatar May 19 '25 19:05 lobsterkatie

@lobsterkatie here's an event from our sentry-sdks project that demonstrates the behavior: https://sentry-sdks.sentry.io/issues/6606969586/events/3c60c1e5d1584fa0b8338651bda22408/

We don't set is_exception_group , and main_exception_id is set to 0 (UncaughtExceptionHandler), however the title still gets inferred from DiagnosticCoroutineContextException

romtsn avatar May 19 '25 20:05 romtsn

Something odd is going on here. According to the linked GH issue, v8 of the SDK is used. None of the final v8 Java SDK versions should be sending the is_exception_group flag as we removed it in #4056. Kotlin SDK also doesn't seem to set it.

@M-Wong Can you please look at the JSON for one of the events with the bad titles to see if is_exception_group is included there the way it is in the sample event at the top of this issue?

Hi @lobsterkatie I've done the same crash twice in our project, once with the Sentry KMP 0.11.0 and once with version 0.12.0 (both times with the Sentry Android Gradle plugin 5.4.0 as well). Here are the JSON values for the exceptions:

Version 0.11.0

"exception":{
   "values":[
      {
         "type":"RuntimeException",
         "value":"Debug Crash",
         "module":"java.lang",
         "stacktrace":{...},
         "raw_stacktrace":{...},
         "thread_id":2,
         "mechanism":{
            "type":"UncaughtExceptionHandler",
            "handled":false
         }
      }
   ]
},

Version 0.12.0

"exception":{
   "values":[
      {
         "type":"RuntimeException",
         "value":"Debug Crash",
         "module":"java.lang",
         "stacktrace":{...},
         "raw_stacktrace":{...},
         "thread_id":2,
         "mechanism":{
            "type":"UncaughtExceptionHandler",
            "handled":false,
            "exception_id":0
         }
      },
      {
         "type":"DiagnosticCoroutineContextException",
         "module":"kotlinx.coroutines.internal",
         "thread_id":2,
         "mechanism":{
            "type":"suppressed",
            "exception_id":1,
            "parent_id":0
         }
      }
   ]
},

I can't find the is_exception_group property anywhere in the JSON. Let me know if you need more information.

M-Wong avatar May 20 '25 08:05 M-Wong

@lobsterkatie as Roman pointed out, sentry-java starts sending exception_id, parent_id and is_exception_group with SDK version >= 8.0.0, could this be the root cause? IIRC there was some ordering logic which only gets applied when exception ids are set. (Commit here)

We don't set is_exception_group , and main_exception_id is set to 0 (UncaughtExceptionHandler), however the title still gets inferred from DiagnosticCoroutineContextException

markushi avatar Jun 11 '25 13:06 markushi

@markushi As of v8 final release, we're not setting is_exception_group anymore, see https://github.com/getsentry/sentry-java/pull/4056

Case 1

It seems that the last exception in the exception.values array is what determines the title, regardless of main_exception_id, and this is what is causing titling issues.

This issue shows evidence that this is the case: https://sentry-sdks.sentry.io/issues/6682710751/events/?project=5428563&query=is%3Aunresolved&referrer=issue-stream&stream_index=0

Main exception last in the array:

{
  "main_exception_id": 0,
  "exception": {
    "values": [
      {
        "type": "ArithmeticException",
        "value": "suppressed exception",
        "module": "java.lang",
        "stacktrace": {
          "frames": [
            "..."
          ]
        },
        "raw_stacktrace": {
          "frames": [
            "..."
          ]
        },
        "thread_id": 64,
        "mechanism": {
          "type": "suppressed",
          "exception_id": 1,
          "parent_id": 0
        }
      },
      {
        "type": "RuntimeException",
        "value": "main exception",
        "module": "java.lang",
        "stacktrace": {
          "frames": [
            "..."
          ]
        },
        "raw_stacktrace": {
          "frames": [
            "..."
          ]
        },
        "thread_id": 64,
        "mechanism": {
          "type": "chained",
          "exception_id": 0
        }
      }
    ]
  }
}

Result: title is RuntimeException ....

Suppressed exception last in the array:

{
  "main_exception_id": 0,
  "exception": {
    "values": [
      {
        "type": "RuntimeException",
        "value": "main exception",
        "module": "java.lang",
        "stacktrace": {
          "frames": [
            "..."
          ]
        },
        "raw_stacktrace": {
          "frames": [
            "..."
          ]
        },
        "thread_id": 64,
        "mechanism": {
          "type": "chained",
          "exception_id": 0
        }
      },
      {
        "type": "ArithmeticException",
        "value": "suppressed exception",
        "module": "java.lang",
        "stacktrace": {
          "frames": [
            "..."
          ]
        },
        "raw_stacktrace": {
          "frames": [
            "..."
          ]
        },
        "thread_id": 64,
        "mechanism": {
          "type": "suppressed",
          "exception_id": 1,
          "parent_id": 0
        }
      }
    ]
  }
}

Result: title is ArithmeticException ...

We have just changed the ordering of the exceptions between the two, and the title changed, even though they are in the same group, which is correct.

Case 2

In this case, we have sent the same exception, with the suppressed exception as the last one in the array, but we have removed the stack trace from the suppressed exception: https://sentry-sdks.sentry.io/issues/6626220658/events/65e7cfb855c34002849e8063cc7f2454/events/

{
  "main_exception_id": 0,
  "exception": {
    "values": [
      {
        "type": "RuntimeException",
        "value": "main exception",
        "module": "java.lang",
        "stacktrace": {
          "frames": [
            "..."
          ]
        },
        "raw_stacktrace": {
          "frames": [
            "..."
          ]
        },
        "thread_id": 64,
        "mechanism": {
          "type": "chained",
          "exception_id": 0
        }
      },
      {
        "type": "ArithmeticException",
        "value": "suppressed exception",
        "module": "java.lang",
        "thread_id": 64,
        "mechanism": {
          "type": "suppressed",
          "exception_id": 1,
          "parent_id": 0
        }
      }
    ]
  }
}

This caused the error to show up in a different (incorrect) group

Image

Regarding this group: The stack traces are the same as we're capturing both exceptions in the same function. However, the exception values are different and the latest event carries a suppressed exception, while previous events don't. So, I would expect different grouping.

Note that the issue title is not changed to ArithmeticException though. Perhaps because it doesn't have a stack trace?

@lobsterkatie

  • Problem 1: Maybe we could make changes to the backend so that the main_exception_id is what determines the title, instead of the last exception in the array?
  • Problem 2: It's not clear why not having a stack trace on the suppressed exception causes it to be grouped differently. Maybe we could improve this?

@markushi @romtsn Do you know if/why the list of exceptions appears to be reversed in some cases?

lcian avatar Jun 16 '25 09:06 lcian

@lcian is there any traction here? Our top exceptions are all suffering from this issue and it makes it really hard to see which is which.

ansman avatar Sep 14 '25 12:09 ansman

@lobsterkatie Do you have an answer for the questions in https://github.com/getsentry/sentry-java/issues/4410#issuecomment-2975843395 ?

To summarize, I think the solution to this would be to consider main_exception_id and, if present, use that exception to determine the title, instead of always defaulting to the last one in exception.values (which seems to be the current behavior).

However, I don't know if this could have any side effects we're not considering.

lcian avatar Sep 15 '25 08:09 lcian

So for errors we do use main_exception_id for titling (see here, where the main_exception_id is used in _find_main_exception, that exception is used for the metadata's type and value, and compute_title uses type and value from the metadata to put together the title).

Thus the question is what counts as the main exception. By default, if there's one root exception (as is the case with any chain), we just take the last one. We do have the ability to override that, though, which we do in specific known cases. I'm not sure if suppressed is a standard value for mechanism.type, but it's one we could certainly add logic to look for.

FWIW, this is what is_exception_group was doing originally, when you were sending it, though it has the effect of causing grouping to fully ignore the exception it's attached to, rather than just skipping over it for titling purposes. Perhaps we should talk about going back to that? I think one thing which might be messing us up though is ordering and what counts as the parent and what counts as the kid. Our assumption in the backend is that the exception groups we want to ignore are the outermost (last, parent, exception id 0) errors. (See [this test input, which ends up with main_exception_id: 1 and title 'MyApp.Exception: Test 1`.)

In the case that originally spawned this thread, though, the DiagnosticCoroutineContextException (the kid) was last. Also, I believe the kid was the one we wanted to ignore, which isn't normally how it goes.

So I guess the questions at this point are:

  • Which exception is the one that should be ignored?
  • Should it be ignored just for the purposes of the title, or should it be ignored for grouping, too?
  • Is the ignored exception one which is the root of the chain, or is it itself wrapped by and/or the cause of a more useful error?

Let's get those questions answered and we can decide what combo of SDK changes and/or backend changes makes the most sense.

lobsterkatie avatar Sep 17 '25 22:09 lobsterkatie

I can’t answer those questions alone, but I can give context about the exception itself. It’s added by Coroutines to aid debugging. It does have some useful information but I don’t think it’s available in the message so for this purpose that exception is useless

ansman avatar Sep 17 '25 22:09 ansman

@ansman For the time being, you can fix this by using a beforeSend such as this one:

    Sentry.init(
        options -> {
          options.setBeforeSend(
            ((event, hint) ->  {
              List<SentryException> exceptionList = event.getExceptions();
              int n = exceptionList.size();
              if (n == 2) {
                SentryException exception = exceptionList.get(n - 1);
                if (exception.getType() != null && exception.getType().equals("DiagnosticCoroutineContextException")) {
                  ArrayList<SentryException> copy = new ArrayList<>(exceptionList);
                  Collections.reverse(copy);
                  event.setExceptions(copy);
                }
              }
              return event;
            })
          );

I've special cased it to n == 2 as that will avoid the possibility of creating new groups, which you probably don't want to risk.

lcian avatar Sep 18 '25 14:09 lcian

Thanks @lobsterkatie for having another look into this. Sorry to sidestep your questions, but I believe I found the actual root cause we need to address, probably to be addressed on the SDK side.

First, I want to point out that the SDK handles suppressed exceptions correctly. See this error: we always attach the suppressed exception(s) as first in the exception.values array. This ensures that the "main" exception (RuntimeException in this case) ends up last in that array, which causes it to determine the title, which is what we want.

If you look at the problematic issues from @ansman in the support ticket you will see that all of them contain com.android.internal.os.RuntimeInit$MethodAndArgsCaller.

This is a special frame that we look out for in the default Android Event Processor. Specifically, we reverse the exception.values array if that specific frame is present anywhere. https://github.com/getsentry/sentry-java/blob/3d205d0cc3fbcc85fbab1d74d6e2ab0930ddf031/sentry-android-core/src/main/java/io/sentry/android/core/DefaultAndroidEventProcessor.java#L111

As a consequence, the DiagnosticCoroutineContextException ends up last in the array, and determines the title.

From the blame I can see this was added as a quick fix to prevent another titling/grouping issue. I think we could special case fixExceptionOrder to not perform the reversing if DiagnosticCoroutineContextException is the first in the chain, as it seems a pretty common case (at least in this customer's project). What do you think @markushi @romtsn ?

However that could change grouping, so we could also do this for n = 2 for the time being and then for any n in the next major. (Note that luckily DiagnosticCoroutineContextException doesn't carry a stack trace, so it should be irrelevant in the context of grouping just by itself. But of course if you reverse 2+ exceptions with stack trace I imagine it would create a new group).

lcian avatar Sep 18 '25 14:09 lcian