ApplicationInsights-Java

XML/SB/Agent configuration for exceptions truncation

Open vlev opened this issue 5 years ago • 14 comments

Fix #804 .

Not sure if this contribution should be considered a significant one. If I need to follow some extra procedures, please let me know.

vlev avatar Jan 22 '19 13:01 vlev

CLA assistant check
All CLA requirements met.

msftclas avatar Jan 22 '19 13:01 msftclas

@reyang @littleaj could you please help review this contribution. In short, it improves our exception capturing ability by trimming the stack trace down to the size supported by the backend. @vlev thanks a lot for your contribution, much appreciated.

dhaval24 avatar Jan 24 '19 06:01 dhaval24

@vlev a quick comment on naming first. I would prefer something different from traceLength. trace is common terminology in distributed tracing (a.k.a. correlation) and is also sometimes conflated with logs, so this would create yet another point of confusion.

Maybe exceptionTraceLength or something of that sort is a better alternative. @reyang, your thoughts on this?

PS: Hardcoding is not what I would prefer; customer configuration is a better choice. However, it still won't solve all the problems, and some packets might still drop because the customer misconfigured it. In that case, it might be wise to log a message for troubleshooting when the backend responds with a partial accept.
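For illustration only: if a knob like this were adopted, it might surface in ApplicationInsights.xml roughly as below. The ExceptionTraceLength element is just the naming proposal from this thread rendered as XML, not a shipped setting, and the instrumentation key is a placeholder.

```xml
<?xml version="1.0" encoding="utf-8"?>
<ApplicationInsights xmlns="http://schemas.microsoft.com/ApplicationInsights/2013/Settings">
  <InstrumentationKey>00000000-0000-0000-0000-000000000000</InstrumentationKey>
  <!-- Hypothetical: cap captured exception stack traces at 100 frames -->
  <ExceptionTraceLength>100</ExceptionTraceLength>
</ApplicationInsights>
```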

dhaval24 avatar Jan 24 '19 15:01 dhaval24

@vlev to give you some more context, we need to think a bit about this design overall. It might be a good idea to make these properties configurable; however, that still leaves it to users to make this call, and they may well misconfigure these properties so that the telemetry never reaches the backend.

We also need to ensure that the problem is solved ubiquitously across languages. For example, I am not sure a similar approach is taken in C# or other Application Insights SDKs. @SergeyKanzhelev what are your thoughts, and how does C# handle this?

We need to understand a bit more deeply whether we need backend adjustments to handle large exception stack traces properly, or whether we need to do it on the SDK side; if the latter, we need to ensure the design is consistent across SDKs.

I would put the PR on a short hold so we can have a better design discussion here on GitHub. @vlev we much appreciate that you brought this topic to light again.

dhaval24 avatar Jan 25 '19 05:01 dhaval24

> The design seems good; however, the naming and some breaking changes need to be addressed, at the very least I think.

I've reverted all breaking changes and renamed traceLength to exceptionTraceLength.

vlev avatar Jan 28 '19 16:01 vlev

> @vlev to give you some more context, we need to think a bit about this design overall. It might be a good idea to make these properties configurable; however, that still leaves it to users to make this call, and they may well misconfigure these properties so that the telemetry never reaches the backend.

Well, I agree: this extra configuration cannot be treated as a final solution to #482. Nevertheless, it would provide some control and confidence in AI tracing capabilities. This configuration might also be useful for those who want to apply some constraints on the amount of data being sent to and stored in the AI backend. For instance, some legacy applications can generate quite a lot of white noise, which is then reflected in storage account bills.

vlev avatar Jan 28 '19 16:01 vlev

> Also, this configuration might be useful for those who want to apply some constraints on the amount of data being sent to and stored in the AI backend.

Regarding this: yes, it is a perfectly valid scenario. However, we generally recommend that users implement their own class implementing the TelemetryProcessor interface, where they can write custom logic to trim the stack trace. That gives users more flexibility to choose their own way of trimming exceptions: maybe they want to trim the stack trace from the top or the bottom, or keep parts of the top and bottom and trim the middle.

I am fine with either decision, but to make this a generic solution we need consistency across the different AI SDKs, so that we do not confuse users by adopting one solution in one language and another solution in another.

I know this might sound a bit stringent, but once a configuration option is introduced it remains in users' configuration files for a decade and is very hard to revert. So I do want to proceed with caution here.
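One of the variants mentioned above (keeping the top and bottom of the stack and trimming the middle) can be sketched without the SDK, since it only manipulates java.lang.Throwable. The class name, method name, and frame counts below are all illustrative, not part of any AI SDK:

```java
public class MiddleTrimExample {

    // Keep the first `head` and last `tail` frames, dropping the middle.
    static void trimMiddle(Throwable t, int head, int tail) {
        StackTraceElement[] stack = t.getStackTrace();
        if (stack.length <= head + tail) {
            return; // already within the limit, nothing to drop
        }
        StackTraceElement[] trimmed = new StackTraceElement[head + tail];
        System.arraycopy(stack, 0, trimmed, 0, head);
        System.arraycopy(stack, stack.length - tail, trimmed, head, tail);
        t.setStackTrace(trimmed);
    }

    public static void main(String[] args) {
        Exception e = new Exception("boom");
        // Fabricate a deep stack (200 frames) for demonstration.
        StackTraceElement[] deep = new StackTraceElement[200];
        for (int i = 0; i < deep.length; i++) {
            deep[i] = new StackTraceElement("Frame" + i, "call", "Frame.java", i);
        }
        e.setStackTrace(deep);

        trimMiddle(e, 30, 10);
        System.out.println(e.getStackTrace().length);              // 40
        System.out.println(e.getStackTrace()[0].getClassName());   // Frame0
        System.out.println(e.getStackTrace()[39].getClassName());  // Frame199
    }
}
```

Inside a custom TelemetryProcessor, a helper like trimMiddle would be applied to the ExceptionTelemetry's throwable before returning true.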

dhaval24 avatar Jan 29 '19 04:01 dhaval24

@dhaval24 Could you please tell me if there are any updates on this issue? Do you have a timeline in mind?

Or, maybe, you need some additional input from my side?

vlev avatar Feb 21 '19 08:02 vlev

@SergeyKanzhelev do we face a similar issue in the .NET SDK? If yes, how did we resolve it? If not, what are your thoughts on the above approach from @vlev?

dhaval24 avatar Feb 23 '19 00:02 dhaval24

@SergeyKanzhelev, @dhaval24 Any news on the issue?

vlev avatar May 06 '19 06:05 vlev

@SergeyKanzhelev do we know if a similar issue is a problem for the .NET SDK?

The current behaviour in Java is very worrying and basically means AppInsights is useless as centralised logging. What's the point of having it when I cannot be sure all exceptions are there?

We have already had two cases in recent production incidents where an important issue was missed (our application happens to have huge stack traces).

@dhaval24 do we have any examples of how we should implement our own TelemetryProcessor to fix the issue above?

gondar avatar May 06 '19 15:05 gondar

@trask FYI: this issue persists in our SDK (huge stack traces are not sent due to schema limits). This is a design question in general, and @reyang (if I remember correctly) also had some thoughts about it.

It depends on the type of application. Assuming it is a Spring Boot type application, you can easily wire up a processor like this:

@Bean
public TelemetryProcessor myProcessor() {
    return (Telemetry t) -> {
        if (t instanceof ExceptionTelemetry) {
            // do some manipulation of the stack trace ...
        }
        // always return true, as there is no filtering.
        return true;
    };
}

Spring Boot will then wire up this processor automatically. If it's a Java EE app, similar logic exists.
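For a non-Spring app, a custom processor can also be registered declaratively. A sketch of the ApplicationInsights.xml wiring, assuming the 2.x TelemetryProcessors configuration section; the class name com.example.StackTrimTelemetryProcessor is a placeholder:

```xml
<TelemetryProcessors>
  <CustomProcessors>
    <!-- Fully qualified name of a class implementing TelemetryProcessor -->
    <Processor type="com.example.StackTrimTelemetryProcessor"/>
  </CustomProcessors>
</TelemetryProcessors>
```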

@littleaj / @trask can assist more on it.

dhaval24 avatar May 06 '19 16:05 dhaval24

Thanks, I more or less know the TelemetryProcessor syntax, but making the telemetry smaller was a bit trickier. Below is my implementation:

public static final int MAX_STACK_LENGTH = 100;
public static final int MAX_INNER_EXCEPTIONS = 10;
public static final int MAX_INNER_EXCEPTION_STACK_LENGTH = 10;
@Override
public boolean process(Telemetry telemetry) {
    if (telemetry instanceof ExceptionTelemetry) {
        Throwable throwable = ((ExceptionTelemetry) telemetry).getThrowable();
        cutStackTraceIfNeeded(throwable, MAX_STACK_LENGTH);
        ((ExceptionTelemetry) telemetry).setException(throwable, MAX_INNER_EXCEPTIONS);
    }
    return true;
}

private void cutStackTraceIfNeeded(Throwable throwable, int maxStackLength) {
    if (throwable == null) {
        return;
    }
    StackTraceElement[] stack = throwable.getStackTrace();
    if (stack.length > maxStackLength) {
        StackTraceElement[] throttledStack = Arrays.copyOf(stack, maxStackLength);
        throwable.setStackTrace(throttledStack);
    }
    cutStackTraceIfNeeded(throwable.getCause(), MAX_INNER_EXCEPTION_STACK_LENGTH);
}

Let me know if you have ideas about what I could be missing. What I dislike most about this implementation is that the throwable's stack trace is transformed twice: first in the logs appender and then in my processor. That's going to be a performance hit ;-/

gondar avatar May 07 '19 21:05 gondar

@dhaval24 Should I cancel this pull request?

Looks like you've decided to stick with the workaround. Could you please include it in the documentation? Each project using AI would need to include it in its source code.

vlev avatar Aug 08 '19 06:08 vlev

closing, addressed by #1702

trask avatar Aug 15 '22 21:08 trask