opentelemetry-dotnet
Review and audit of EventSource logging for self-diagnostics
OpenTelemetry .NET uses EventSource for internal logging. Each component defines its own EventSource, including the API, the SDK, and each of the exporters and instrumentation components.
The purpose of this issue is threefold:
- Review the usage of these `EventSource`s for correctness in what gets logged and where it gets logged.
  Example of incorrectness: `BaseExporter.Shutdown` logs a span-processor-related error when there is a failure. This error should not be about spans or processors:
  https://github.com/open-telemetry/opentelemetry-dotnet/blob/635028834c7d435bc64dd64510e4f7b7ec4207a4/src/OpenTelemetry/BaseExporter.cs#L123-L127
- Improve coverage of diagnostic logging by identifying gaps.
  Example: In some places (but not all), TODOs in the code mark gaps where a diagnostic log would be useful but has not been implemented yet. Here is one where an attempt to update a metric has failed: https://github.com/open-telemetry/opentelemetry-dotnet/blob/db8f0e712555716932e323f5fc7c301b17ec4c11/src/OpenTelemetry/Metrics/AggregatorStore.cs#L292
- Aim to make diagnostic logging actionable.
  Based on this comment (https://github.com/open-telemetry/opentelemetry-dotnet/pull/2525#discussion_r737777141), we should seek to provide actionable guidance when there are errors. For example, if an error is due to misconfiguration, the log message should indicate how to resolve it.
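
To make the first and third goals concrete, here is a minimal sketch of what a component-specific `EventSource` with an actionable message might look like. The class name, EventSource name, event IDs, message text, and the example endpoint are all hypothetical and chosen for illustration; none of them mirror the actual OpenTelemetry .NET sources.

```csharp
using System;
using System.Diagnostics.Tracing;

// Hypothetical EventSource for an exporter component (illustration only).
[EventSource(Name = "Example-Exporter-Diagnostics")]
internal sealed class ExampleExporterEventSource : EventSource
{
    public static readonly ExampleExporterEventSource Log = new ExampleExporterEventSource();

    // Convenience overload: only formats the exception when an error-level listener is attached.
    [NonEvent]
    public void ExporterShutdownFailed(Exception ex)
    {
        if (this.IsEnabled(EventLevel.Error, EventKeywords.All))
        {
            this.ExporterShutdownFailed(ex.ToString());
        }
    }

    // The message talks about the exporter, not about spans or processors.
    [Event(1, Message = "Exporter shutdown failed: '{0}'.", Level = EventLevel.Error)]
    public void ExporterShutdownFailed(string exception)
    {
        this.WriteEvent(1, exception);
    }

    // Actionable: states what is wrong and how to fix it.
    // The example URI in the message is an assumption, not a documented default.
    [Event(2, Message = "Invalid exporter endpoint '{0}'. Set the endpoint to an absolute URI, for example 'http://localhost:4317'.", Level = EventLevel.Warning)]
    public void InvalidEndpoint(string endpoint)
    {
        this.WriteEvent(2, endpoint);
    }
}
```

A component would then call, for instance, `ExampleExporterEventSource.Log.InvalidEndpoint("localhost")` at the point where the configuration is validated, so that the log entry names the cause and the remedy together.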
Related in-flight work
The following PR was held off until the post-1.0 release. It may be beneficial to land this work prior to the review of self-diagnostics:
https://github.com/open-telemetry/opentelemetry-dotnet/pull/1529
References
- OpenTelemetry general error handling guidelines/self-diagnostics
- OpenTelemetry .NET self-diagnostics guide