tracing
tracing copied to clipboard
Per-callsite collect trait object
This RFC proposes altering the Collect
trait to split out per-callsite
operations into a second trait. Additionally, all thread-local operations are
removed in favor of registering traits statically. Thread-local behavior can be
layered on in tracing-subscriber
.
trait RegisterCallsite: 'static {
fn register_callsite(&self, metadata: &'static Metadata<'static>) -> DynCollect;
}
trait Collect: 'static {
fn interest(&self, metadata: &'static Metadata<'static>) -> Interest;
// Rest of the `Collect trait methods are identical, except there is no `reigster_callsite`
}
Note, all type names are mostly placeholder at this point. Naming suggestion are welcome.
Motivation
It is common for Collect
implementations to perform logic based on the type of
callsite being instrumented. This sometimes takes the form of a sequence of if
blocks checking the Metadata
when handling an event. For [Tokio Console], each
event results in a linear
scan.
Supporting per-callsite Collect
traits would enable removing this overhead by
enabling the Collect
implementor to define separate trait implementation for
each case. Theoretically, a single dynamic dispatch would be required for
instrumentation.
Details
Each callsite will hold a single DynCollect
instance. Think of DynCollect
as
Arc<dyn Collect>
for now. The static callsite struct used by the macro
will look something like:
struct MacroCallsite {
collect: AtomicDynCollect,
// The rest stays the same. Roughly....
interest: AtomicUsize,
meta: &'static Metadata<'static>,
}
where AtomicDynCollect
is a wrapper around AtomicPtr
with some added logic
to handle the trait-object (more later). The macro will generate something like:
static CALLSITE: tracing_core::callsite::Entry<MacroCallsite> = default();
The Entry
decorator is a node in an intrusive linked list tracking every known
callsite in the system. When the callsite instance is first used, it is appended
to the global linked list. This is similar to the current behavior.
The next step is different. After the callsite is registered with the global
list, it acquires an instance of DynCollect
by calling register_callsite
on
the global RegisterCallsite
instance.
If the collector has no interest in the callsite (equivalent to today's
Interest::Never
), then a no-op DynCollect
instance is returned. Otherwise,
an instance of DynCollect
specific to the callsite is returned by the
RegisterCallsite
. The DynCollect
is stored in the static callsite instance
using an atomic operation.
The process of instrumenting events is the same as today. The same methods are called.
DynCollect
The DynCollect
is roughly equivalent to Arc<dyn Collect>
. However, we would
like to support storing a collect instance in a callsite using an atomic
operation and avoid locking. This will help cases like rustc
where the initial
registration process carries significant total cost as callsites are often used
only a handful of times. An Arc<dyn Collect>
consists of two pointers: one to
the data and one to the v-table. This means we cannot use an AtomicPtr
.
To enable atomic operations for storing Collect
instances, a different virtual
dispatch strategy is used. The VTable is embedded as a header to the data. The
DynCollect
type looks like:
struct DynCollect {
ptr: *const DynCollectHeader,
}
struct DynCollectHeader {
ref_count: AtomicUsize,
vtable: &'static DynCollectVtable,
}
This is similar to the struct layout and strategy used by Tokio tasks. More details can be found in that code
The downside of this strategy is there is no zero-cost conversion between
Arc<dyn Collect>
and DynCollect
. That said, most tracing
users use the
Subscribe
API and not the tracing-core::Collect
API, so end user impact should
be minimal.
Locking
Currently, the callsite registration process is synchronized using a lock
similar to OnceCell
. We could theoretically remove the lock if we are OK w/
optimistic registration. Concurrent first calls to a callsite could both
call register_callsite()
and only one would win the compare-and-swap. This may
be confusing the user and benefits are not clear.
Subscribe API
The Subscribe
trait in tracing-subscriber
should also have a callsite
specific trait. The split would be similar to RegisterCallsite
and Collect
,
except that the register_callsite
method probably could just return regular
trait object like Arc<dyn Subscribe>
or Box<dyn Subscribe>
.
Open questions
All the naming is TBD. Also, more details might come to light during implementation. The proposal is mostly a high level sketch.
Am I correct in understanding that this API doesn't prevent the usage of tracing-core
in #[no_std]
+ no alloc
environments? It seems like the construction of DynCollect
only has dependencies on core
and nothing on alloc
.
it does not prevent no_std cases w/o alloc.
I'm definitely interested in this idea, as it seems like it has the potential to improve performance significantly when a collector wishes to implement specific behaviors only for a small subset of callsites.
I'm also open to investigating moving the thread local collector features out of tracing-core
and into a higher-level crate. Handling the thread-local collectors introduces a lot of complexity that could otherwise be avoided in tracing-core
. If we move that functionality out of core, we will make sure that enough APIs are exposed that it's possible to re-implement it as an opt-in feature in a higher-level crate, but I don't think this is going to be a significant issue.
My biggest concern with the per-callsite collector system is that it introduces a lot more complexity that has to be communicated to users; the tracing
APIs already feel very complex, so if we introduce more moving parts, we should be careful to make sure people can figure out how to use them. Ideally, we would hope that a simple Subscribe
implementation, that doesn't want to have specific behavior for particular callsites, can be implemented without having to know about a per-callsite API, and that the added complexity is only exposed to the user when they want to implement callsite-specific behavior. I'm not sure if this is actually possible, but it would be ideal.
It definitely seems like the next step is to actually try a prototype implementation, to make sure there are no obvious limitations we haven't thought of here!
I'm also open to investigating moving the thread local collector features out of tracing-core and into a higher-level crate.
For this, does the below sound like a viable gameplan:
- Remove from
tracing-core
:-
dispatch::get_default
-
dispatch::set_default
-
- Introduce to
tracing
:-
collect::PerThreadCollector
, a globalCollect
or that delegates to thread-localCollect
s
-
- Change in
tracing
:-
collect::set_default
only has effect ifPerThreadCollector
is global collector -
collect::with_default
only has effect ifPerThreadCollector
is global collector
-