Geneva Exporter silently dropping an exception log record

Open darkoa-msft opened this issue 1 year ago • 3 comments

  • OpenTelemetry.Exporter.Geneva 1.6.0
  • OpenTelemetry.Extensions.Hosting 1.6.0
  • OpenTelemetry.Instrumentation.AspNetCore 1.5.1-beta.1
  • OpenTelemetry.Instrumentation.Http 1.5.1-beta.1

Target framework: .NET 6.0

Is this a feature request or a bug?

  • [ ] Feature Request
  • [x] Bug

What is the expected behavior?

I expect to see either the exception I am logging or a reason why it is not logged.

What is the actual behavior?

I see neither the exception nor the reason why it was not logged.

Additional Context

We are currently logging with both OTel and Application Insights (AI). AI logs a reason (payload too large for a single event), but OTel does not. We log through the common logging extension (ILogger.LogError(ex, msg)).

darkoa-msft avatar Nov 14 '23 19:11 darkoa-msft

@darkoa-msft Are you still facing this issue? Could you collect the sdk logs and share the results? https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/src/OpenTelemetry/README.md#self-diagnostics
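
For reference, a minimal sketch of what the linked self-diagnostics feature asks for, based on the OpenTelemetry .NET SDK README: a file named `OTEL_DIAGNOSTICS.json` placed in the current working directory of the process. The values below are illustrative assumptions, not settings recommended anywhere in this thread.

```json
{
    "LogDirectory": ".",
    "FileSize": 32768,
    "LogLevel": "Warning"
}
```

`FileSize` is given in KiB and `LogLevel` takes an `EventLevel` name. The SDK writes its internal events to a log file in `LogDirectory`, and deleting the configuration file turns the output off again.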

vishweshbankwar avatar Jan 30 '24 16:01 vishweshbankwar

@vishweshbankwar Yes, we still occasionally hit this issue, specifically with a Cosmos DB exception where both the message and the stack trace are truncated. It looks to me as though the ETW event size limit in your code is either incorrect or not applied correctly. I did not check whether you have a unit test for this.

I am not sure how useful the SDK logs would be, but I will try. The whole point was that Application Insights logs this failure and OTel just silently drops the event.

darkoa-msft avatar Jan 30 '24 17:01 darkoa-msft

Note that we have seen the exceptions with max message and stack trace, so there is something else going on here.

darkoa-msft avatar Jan 30 '24 18:01 darkoa-msft

@darkoa-msft SDK internal logs would show whether the SDK or the exporter is running into any issue.

vishweshbankwar avatar Feb 14 '24 04:02 vishweshbankwar

@vishweshbankwar Just skimming through the documentation, it looks like too much work for us to enable diagnostics. We cannot reproduce the exception that causes the issue; it is some internal Cosmos DB client exception.

Feel free to close this issue.

In case you want to follow up internally: we added some logging around the silently dropped event, and the drop occurred when the exception had a ~22k stack trace and a ~17k exception message. (We logged these strings separately; the stack trace was truncated, the exception message was not.) Why anyone would have a 17k exception message is beyond me, but that is what it is.

Also, if it helps in any way: when we logged the exception message on its own, Application Insights failed to emit the event, while the exporter emitted it fine. Both failed when logging the exception itself.

In short, we have a way of getting both the exception message and stack trace, so this is not an issue for us anymore. It is just a workaround, though.
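
A minimal sketch of what such a workaround could look like, assuming the strings are logged as separate records instead of passing the exception object to the logger. The class name, method names, and the 8000-character budget are hypothetical; the thread does not show the actual code or state the exporter's real size limit.

```csharp
#nullable enable
using System;
using Microsoft.Extensions.Logging;

// Hypothetical helper; not part of OpenTelemetry or the Geneva Exporter.
public static class SplitExceptionLogging
{
    // Assumed per-field budget in characters (illustrative only).
    public const int MaxFieldChars = 8000;

    // Returns at most maxChars characters of value; null or empty becomes "".
    public static string Truncate(string? value, int maxChars)
    {
        if (string.IsNullOrEmpty(value))
        {
            return string.Empty;
        }

        return value.Length <= maxChars ? value : value.Substring(0, maxChars);
    }

    // Emits three separate log records so that no single record has to carry
    // both the full exception message and the full stack trace.
    public static void LogErrorInParts(ILogger logger, Exception ex, string message)
    {
        logger.LogError("{Message} ExceptionType={ExceptionType}", message, ex.GetType().FullName);
        logger.LogError("ExceptionMessage={ExceptionMessage}", Truncate(ex.Message, MaxFieldChars));
        logger.LogError("StackTrace={StackTrace}", Truncate(ex.StackTrace, MaxFieldChars));
    }
}
```

A call site would replace `logger.LogError(ex, msg)` with `SplitExceptionLogging.LogErrorInParts(logger, ex, msg)`. The cost is that the three records are no longer tied together as one exception event, which is why this stays a workaround.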

darkoa-msft avatar Feb 14 '24 20:02 darkoa-msft

@darkoa-msft Your scenario is not completely clear to me. Feel free to reach out offline if you still have questions about Geneva Exporter limitations. Closing this.

vishweshbankwar avatar Feb 20 '24 16:02 vishweshbankwar

> @vishweshbankwar Just skimming through the documentation, it looks like too much work for us to enable diagnostics.

This will be addressed by https://github.com/open-telemetry/opentelemetry-dotnet/issues/3881, targeting no later than Sep. 30th, 2024 if things go smoothly.

reyang avatar Feb 22 '24 19:02 reyang