tracing-opentelemetry icon indicating copy to clipboard operation
tracing-opentelemetry copied to clipboard

Possibly incorrect `impl Layer for MetricsLayer`

Open calebschoepp opened this issue 1 year ago • 8 comments

Bug Report

Description

The implementation of MetricsLayer seems subtly incorrect to me. I don't think MetricsLayer should be implementing register_callsite or enabled because these methods determine whether a span or event is globally enabled (as per the docs) — which is not the behaviour I think we want.

impl<S> Layer<S> for MetricsLayer<S> where S: Subscriber + for<'span> LookupSpan<'span> {
    ...
    fn register_callsite(&self, metadata: &'static Metadata<'static>) -> Interest {
        self.inner.register_callsite(metadata)
    }

    fn enabled(&self, metadata: &Metadata<'_>, ctx: Context<'_, S>) -> bool {
        self.inner.enabled(metadata, ctx)
    }
    ...
}

Am I missing something here? Or should these impls be removed?

calebschoepp avatar Jun 14 '24 20:06 calebschoepp

The self.inner is Filtered which is used for per-layer-filters.

This will not globally disable any spans or events, just set up filters for this layer.

mladedav avatar Jun 14 '24 22:06 mladedav

This will not globally disable any spans or events, just set up filters for this layer.

That is not what the docs seem to be saying:

Note: This method (and Layer::register_callsite) determine whether a span or event is globally enabled, not whether the individual layer will be notified about that span or event. This is intended to be used by layers that implement filtering for the entire stack.

lann avatar Jun 14 '24 23:06 lann

related issue https://github.com/tokio-rs/tracing-opentelemetry/issues/111#issue-2163869781

ymgyt avatar Jun 15 '24 02:06 ymgyt

Yeah, the docs are right, but the implementation here always returns true for enabled and never returns Interest::never() for register_callsite.

Check the Filtered implementation of these methods.

mladedav avatar Jun 15 '24 07:06 mladedav

I see! That is...very confusing 🙂

lann avatar Jun 15 '24 15:06 lann

Thanks for clarifying @mladedav.

If the implementation always returns true for enabled then could we just delete the method and use the default implementation that also always returns true? Same for register_callsite could we just delete it and use the default implementation? Less code seems more clear here.

calebschoepp avatar Jun 17 '24 16:06 calebschoepp

No, it has side effects. I don't understand 100 % of the code around per-layer-filters so I hope I don't mistify you here. But the main points should be true.

What happens in the background is each per-layer filter (filters which are inside Filtered) states whether it's interested in the event through a shared static but returns true because it doesn't want to globally disable the event regardless of whether it is interested or not. (This is not precise, the Filtered could also have a global filter inside like EnvFilter which could make it disable it globally - that's not the case here though)

Then the Registry checks the global state and if all filters disabled the event, it consideres it disabled.

This maybe explains it better than I do?

Basically if we remove the two methods, the filtering will not work and we'd have to get rid of the whole Filtered as I think it would filter out everything.

Removing the whole Filtered could be considered. The InstrumentLayer would then have to handle all events, even those that do not contain any metrics. From what I've seen it already could handle it but my (unfounded) guess is that using the filtering like this could slightly improve performance. Someone could try running some benchmarks to see if it's true or not. And also checking if I've maybe overlooked something.

mladedav avatar Jun 17 '24 19:06 mladedav

Thanks again for the very thorough reply @mladedav. I'm happy for this issue to be closed unless you want to track the potential work you mentioned at the end of your reply.

calebschoepp avatar Jun 19 '24 22:06 calebschoepp