tracing-opentelemetry
tracing-opentelemetry copied to clipboard
Possibly incorrect `impl Layer for MetricsLayer`
Bug Report
Description
The implementation of MetricsLayer seems subtly incorrect to me. I don't think MetricsLayer should be implementing register_callsite or enabled because these methods determine whether a span or event is globally enabled (as per the docs) — which is not the behaviour I think we want.
impl<S> Layer<S> for MetricsLayer<S> where S: Subscriber + for<'span> LookupSpan<'span> {
...
fn register_callsite(&self, metadata: &'static Metadata<'static>) -> Interest {
self.inner.register_callsite(metadata)
}
fn enabled(&self, metadata: &Metadata<'_>, ctx: Context<'_, S>) -> bool {
self.inner.enabled(metadata, ctx)
}
...
}
Am I missing something here? Or should these impls be removed?
The self.inner is Filtered which is used for per-layer-filters.
This will not globally disable any spans or events, just set up filters for this layer.
This will not globally disable any spans or events, just set up filters for this layer.
That is not what the docs seem to be saying:
Note: This method (and Layer::register_callsite) determine whether a span or event is globally enabled, not whether the individual layer will be notified about that span or event. This is intended to be used by layers that implement filtering for the entire stack.
related issue https://github.com/tokio-rs/tracing-opentelemetry/issues/111#issue-2163869781
Yeah, the docs are right, but the implementation here always returns true for enabled and never returns Interest::never() for register_callsite.
Check the Filtered implementation of these methods.
I see! That is...very confusing 🙂
Thanks for clarifying @mladedav.
If the implementation always returns true for enabled then could we just delete the method and use the default implementation that also always returns true? Same for register_callsite could we just delete it and use the default implementation? Less code seems more clear here.
No, it has side effects. I don't understand 100 % of the code around per-layer-filters so I hope I don't mistify you here. But the main points should be true.
What happens in the background is each per-layer filter (filters which are inside Filtered) states whether it's interested in the event through a shared static but returns true because it doesn't want to globally disable the event regardless of whether it is interested or not. (This is not precise, the Filtered could also have a global filter inside like EnvFilter which could make it disable it globally - that's not the case here though)
Then the Registry checks the global state and if all filters disabled the event, it consideres it disabled.
This maybe explains it better than I do?
Basically if we remove the two methods, the filtering will not work and we'd have to get rid of the whole Filtered as I think it would filter out everything.
Removing the whole Filtered could be considered. The InstrumentLayer would then have to handle all events, even those that do not contain any metrics. From what I've seen it already could handle it but my (unfounded) guess is that using the filtering like this could slightly improve performance. Someone could try running some benchmarks to see if it's true or not. And also checking if I've maybe overlooked something.
Thanks again for the very thorough reply @mladedav. I'm happy for this issue to be closed unless you want to track the potential work you mentioned at the end of your reply.