Proposal to improve internal error logging
Presently, internal errors in the OpenTelemetry pipeline are reported through a logging mechanism built on `eprintln!`. Users can customize this behavior by providing their own error handler.
However, this existing approach exhibits a couple of issues:
- Lack of granularity (no severity levels or component information) in internal logs, making it hard for users to find the information they need.
- Insufficient metadata, which makes error messages less informative for users.
To make our internal error handling more user-friendly, I propose building an improved internal logging mechanism with the following attributes:
- Minimal reliance on `opentelemetry`. Given that `opentelemetry` is typically used for diagnosing problems in applications and other libraries, it's important that we avoid using its log pillar for error reporting.
- Different levels for the internal logs. As telemetry should remain stable and never panic, the maximum severity should be capped at ERROR. Additionally, levels such as WARNING and DEBUG can be introduced.
- Inclusion of metadata to aid users in issue identification and resolution. Key metadata, in my view, includes:
  - Pillars (traces vs. metrics vs. logs)
  - Components, which could differ across pillars. For instance, trace should contain `SpanProvider`, `SpanProcessor`, and `SpanExporter`.
- Filtering by level. We should provide some kind of configuration to set the maximum log level for internal messages. The `log` crate's compile-time filters are a good example, and we could provide something similar.
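To make the filtering idea concrete, here is a minimal sketch of what a level cap could look like. The `internal-logs-warning` feature name, the function names, and the `LogLevel` ordering are placeholders for illustration, loosely modeled on the `log` crate's `max_level_*` features; they are not an agreed API.

```rust
/// Severity of an internal log message, ordered from most to least severe so
/// that deriving `Ord` lets us compare against a configured maximum level.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum LogLevel {
    Error,   // the cap: internal telemetry should never panic
    Warning,
    Debug,
}

/// Hypothetical maximum level, chosen at compile time via a cargo feature
/// (e.g. `internal-logs-warning`), similar in spirit to the `log` crate's
/// `max_level_*` features.
pub fn max_level() -> LogLevel {
    if cfg!(feature = "internal-logs-warning") {
        LogLevel::Warning
    } else {
        LogLevel::Debug
    }
}

/// Returns true if a message at `level` should be emitted.
pub fn log_enabled(level: LogLevel) -> bool {
    level <= max_level()
}
```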
To implement these enhancements, I think we should convert the current `Error` enum into an `Error` struct that carries the metadata and level information described above. Something like:
```rust
pub struct Error {
    level: LogLevel,
    pillar: Pillar,
    component: Component,
    error: String,
}
```
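The `Pillar` and `Component` types referenced above are not defined in this proposal. As a rough sketch only (variant names are placeholders, and `LogLevel` is the level enum sketched earlier), they might look like the following, together with a hypothetical helper a span exporter could call:

```rust
#[derive(Debug, Clone, Copy)]
pub enum Pillar {
    Trace,
    Metrics,
    Logs,
}

#[derive(Debug, Clone, Copy)]
pub enum Component {
    // Trace pillar components; metrics and logs would add their own variants.
    SpanProvider,
    SpanProcessor,
    SpanExporter,
}

impl Error {
    /// Hypothetical helper a span exporter might use to report a failed
    /// export as a non-fatal internal error.
    pub fn span_exporter(message: impl Into<String>) -> Self {
        Error {
            level: LogLevel::Error,
            pillar: Pillar::Trace,
            component: Component::SpanExporter,
            error: message.into(),
        }
    }
}
```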
It might be worth seeing if we can leverage tokio's `tracing` crate for this implementation. It already defines the concepts of log levels and components (which it calls targets) and has rich facilities for filtering which logs go where. I realize there might appear to be a circular dependency between their crate and `opentelemetry`, but I suspect it can be avoided if we (and they) have api vs. sdk crates and our sdk only depends on their api and vice versa.
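For illustration, here is a rough sketch of how internal diagnostics could be emitted through `tracing`, with the target string standing in for the pillar/component metadata. The target names, field names, and functions are assumptions for the sketch, not an agreed convention:

```rust
use tracing::{error, warn};

/// Report a failed export as a non-fatal internal error. The target encodes
/// pillar and component, so users can filter these events with tracing's
/// usual machinery.
fn report_export_failure(reason: &str) {
    error!(
        target: "opentelemetry::trace::span_exporter",
        reason,
        "failed to export span batch"
    );
}

/// Report dropped spans at WARNING level.
fn report_dropped_spans(dropped: u64) {
    warn!(
        target: "opentelemetry::trace::span_processor",
        dropped,
        "span queue full, dropping spans"
    );
}
```

With this approach, users could route or filter these events with their existing `tracing` subscriber configuration rather than us maintaining a separate filtering mechanism.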