tracing icon indicating copy to clipboard operation
tracing copied to clipboard

Per-callsite collect trait object

Open carllerche opened this issue 3 years ago • 4 comments

This RFC proposes altering the Collect trait to split out per-callsite operations into a second trait. Additionally, all thread-local operations are removed in favor of registering traits statically. Thread-local behavior can be layered on in tracing-subscriber.

trait RegisterCallsite: 'static {
    fn register_callsite(&self, metadata: &'static Metadata<'static>) -> DynCollect;
}

trait Collect: 'static {
    fn interest(&self, metadata: &'static Metadata<'static>) -> Interest;

    // Rest of the `Collect trait methods are identical, except there is no `reigster_callsite`
}

Note, all type names are mostly placeholder at this point. Naming suggestion are welcome.

Motivation

It is common for Collect implementations to perform logic based on the type of callsite being instrumented. This sometimes takes the form of a sequence of if blocks checking the Metadata when handling an event. For [Tokio Console], each event results in a linear scan.

Supporting per-callsite Collect traits would enable removing this overhead by enabling the Collect implementor to define separate trait implementation for each case. Theoretically, a single dynamic dispatch would be required for instrumentation.

Details

Each callsite will hold a single DynCollect instance. Think of DynCollect as Arc<dyn Collect> for now. The static callsite struct used by the macro will look something like:

struct MacroCallsite {
    collect: AtomicDynCollect,

    // The rest stays the same. Roughly....
    interest: AtomicUsize,
    meta: &'static Metadata<'static>,

}

where AtomicDynCollect is a wrapper around AtomicPtr with some added logic to handle the trait-object (more later). The macro will generate something like:

static CALLSITE: tracing_core::callsite::Entry<MacroCallsite> = default();

The Entry decorator is a node in an intrusive linked list tracking every known callsite in the system. When the callsite instance is first used, it is appended to the global linked list. This is similar to the current behavior.

The next step is different. After the callsite is registered with the global list, it acquires an instance of DynCollect by calling register_callsite on the global RegisterCallsite instance.

If the collector has no interest in the callsite (equivalent to today's Interest::Never), then a no-op DynCollect instance is returned. Otherwise, an instance of DynCollect specific to the callsite is returned by the RegisterCallsite. The DynCollect is stored in the static callsite instance using an atomic operation.

The process of instrumenting events is the same as today. The same methods are called.

DynCollect

The DynCollect is roughly equivalent to Arc<dyn Collect>. However, we would like to support storing a collect instance in a callsite using an atomic operation and avoid locking. This will help cases like rustc where the initial registration process carries significant total cost as callsites are often used only a handful of times. An Arc<dyn Collect> consists of two pointers: one to the data and one to the v-table. This means we cannot use an AtomicPtr.

To enable atomic operations for storing Collect instances, a different virtual dispatch strategy is used. The VTable is embedded as a header to the data. The DynCollect type looks like:

struct DynCollect {
    ptr: *const DynCollectHeader,
}

struct DynCollectHeader {
    ref_count: AtomicUsize,
    vtable: &'static DynCollectVtable,
}

This is similar to the struct layout and strategy used by Tokio tasks. More details can be found in that code

The downside of this strategy is there is no zero-cost conversion between Arc<dyn Collect> and DynCollect. That said, most tracing users use the Subscribe API and not the tracing-core::Collect API, so end user impact should be minimal.

Locking

Currently, the callsite registration process is synchronized using a lock similar to OnceCell. We could theoretically remove the lock if we are OK w/ optimistic registration. Concurrent first calls to a callsite could both call register_callsite() and only one would win the compare-and-swap. This may be confusing the user and benefits are not clear.

Subscribe API

The Subscribe trait in tracing-subscriber should also have a callsite specific trait. The split would be similar to RegisterCallsite and Collect, except that the register_callsite method probably could just return regular trait object like Arc<dyn Subscribe> or Box<dyn Subscribe>.

Open questions

All the naming is TBD. Also, more details might come to light during implementation. The proposal is mostly a high level sketch.

carllerche avatar Feb 04 '22 22:02 carllerche

Am I correct in understanding that this API doesn't prevent the usage of tracing-core in #[no_std] + no alloc environments? It seems like the construction of DynCollect only has dependencies on core and nothing on alloc.

davidbarsky avatar Feb 07 '22 18:02 davidbarsky

it does not prevent no_std cases w/o alloc.

carllerche avatar Feb 07 '22 19:02 carllerche

I'm definitely interested in this idea, as it seems like it has the potential to improve performance significantly when a collector wishes to implement specific behaviors only for a small subset of callsites.

I'm also open to investigating moving the thread local collector features out of tracing-core and into a higher-level crate. Handling the thread-local collectors introduces a lot of complexity that could otherwise be avoided in tracing-core. If we move that functionality out of core, we will make sure that enough APIs are exposed that it's possible to re-implement it as an opt-in feature in a higher-level crate, but I don't think this is going to be a significant issue.

My biggest concern with the per-callsite collector system is that it introduces a lot more complexity that has to be communicated to users; the tracing APIs already feel very complex, so if we introduce more moving parts, we should be careful to make sure people can figure out how to use them. Ideally, we would hope that a simple Subscribe implementation, that doesn't want to have specific behavior for particular callsites, can be implemented without having to know about a per-callsite API, and that the added complexity is only exposed to the user when they want to implement callsite-specific behavior. I'm not sure if this is actually possible, but it would be ideal.

It definitely seems like the next step is to actually try a prototype implementation, to make sure there are no obvious limitations we haven't thought of here!

hawkw avatar Feb 07 '22 22:02 hawkw

I'm also open to investigating moving the thread local collector features out of tracing-core and into a higher-level crate.

For this, does the below sound like a viable gameplan:

  1. Remove from tracing-core:
    • dispatch::get_default
    • dispatch::set_default
  2. Introduce to tracing:
    • collect::PerThreadCollector, a global Collector that delegates to thread-local Collects
  3. Change in tracing:
    • collect::set_default only has effect if PerThreadCollector is global collector
    • collect::with_default only has effect if PerThreadCollector is global collector

jswrenn avatar May 17 '22 16:05 jswrenn