subscriber: rewrite Registry to use static arrays
Feature Request
Crates
- `tracing-subscriber`
Motivation
The typemap used by the `Registry` to store extensions erases types, places them into a `Box`, and stores the box in a `HashMap`. While this makes it easy to get started with the `Registry`, this approach has noticeable overhead when used in contexts such as profiling (e.g., https://github.com/bevyengine/bevy/issues/4892).
Proposal
Instead of using a `HashMap` for extensions, `Layer` types would be required to pre-declare the extension types they'll be storing in the `Registry`, and arrays would be used to store the extensions, avoiding the boxing (and potentially, the `HashMap` lookup).
Alternatives
Don't do this; but I think the performance benefits here are non-trivial.
(Sorry this issue is a bit short, I had to run and most of the implementation details will be discovered while creating this API.)
I think it should be possible to use `sharded_slab::Pool` to store the extensions as well as the always-present data, and to do so in a backwards-compatible manner.

The backwards-compatible scheme uses a global `RwLock` that's only write-locked when a new extension type is registered. This will result in higher initial contention as layers first access their extensions, but should easily amortize out, and an optional API to pre-register types can remove the need for write locks. The exact same API also works if pre-registration is required: panic if the extension type is not registered, as that is a clear programmer error.
Super roughly, the idea is to invert from AoS `Pool<AnyMap![_]>` to SoA `AnyMap![Pool<_>]`:
```
Registry:
    spans: Pool<SpanDataInner>
    extensions: RwLock<AnyMap![Pool<Option<_>>]>
    ..

SpanData:
    data: PoolRef<SpanDataInner>
    extensions: &RwLock<AnyMap![Pool<Option<_>>]>

SpanDataInner:
    extensions: SmallMap<TypeId, usize>
    ..

SpanData.insert(self, val: T):
    exts = self.extensions.read()
    if exts.get(T) is None:
        drop(exts)
        self.extensions.write().insert(T, Pool())
        exts = self.extensions.read()
    ext = exts.get(T)
    ix = ext.insert(Some(val))
    self.data.extensions.insert(T, ix)

SpanData.get(self, T):
    exts = self.extensions.read()
    ext = exts.get(T)?
    ix = self.data.extensions.get(T)?
    ext.get(ix).as_ref()
```
> I think it should be possible to use `sharded_slab::Pool` to store the extensions as well as the always-present data, and to do so in a backwards-compatible manner.
Yes, a scheme like what you're describing here has been on my TODO list for a while. I think I have a partially-finished branch making a change like this somewhere.
@CAD97, if you're interested in working on that, I would love a PR, and I can try to find what I wrote a while ago as a starting point, if you'd like it?
I'd be happy to take a stab at it if you can find your old diff (and maybe even if you can't).
Oh no, the problem with this approach is that `ExtensionsMut` needs to have, and gives out, `&mut` access to all of the extensions on that span, so it needs to hold the lock on each one... this is theoretically solvable, but it reduces the gain by a significant amount, as `ExtensionsMut` needs to hold access to `AnyMap![T => RefMut<T>]`; just `&mut AnyMap![T => usize]` isn't sufficient.
Also, we don't have a `Pool::get_mut`; is that just an oversight or a limitation? Similarly, a `remove` (blocking version of `clear`) is necessary for this impl scheme.
If it's possible to store either of

- a type-erased `sharded_slab::pool::RefMut<Opaque>`, or
- a dyn-typed `sharded_slab::pool::RefMut<dyn Any>`,

then that storage to do the locking can be put into `DataInner` and borrowed by `ExtensionsMut`. Otherwise, `ExtensionsMut` would have to hold the anymap itself.
Alternatively, we could try to prove that extending the `&mut T` beyond the lock is okay while `ExtensionsMut` is around, since that theoretically does lock access?
Also, I don't think it's possible for outside crates to implement `SpanData`:

- `feature = "std"` isn't purely additive, since it adds required methods to `SpanData`
- `SpanData::extensions` returns a type which is not externally constructible
- (Actually, `SpanData` is implementable by outside crates if the `std` feature isn't set)

And this means the only useful implementation of `LookupSpan` (i.e. one that doesn't just always fail lookup) is one that wraps another `LookupSpan`.
So this is where I got in a quick test; the `Data` portion of it works, but the interface of `Extensions[Mut]` causes problems: https://github.com/tokio-rs/tracing/compare/master...CAD97:tracing:pool-exts
Concept for a smaller incremental improvement(?): currently (IIUC), the `AnyMap` hashmap is reused, but the actual extensions' boxes aren't. A small improvement, then, is to store `AnyMap![T => Option<T>]` instead of `AnyMap![T => T]`, so that the extensions' boxes can be reused.
I'm going to see if this approach works sometime this week as well. Perhaps these boxes can even be free-list-pooled and moved into the extensions map (rather than coming as part of the span data), even if pooling the extensions directly didn't work out.
I wouldn't rule out a breaking change, @CAD97; if the performance improvement is as good as I think it might be, then I think we should seriously consider releasing a tracing-subscriber 0.4 (alongside some other changes that have been queued up).
I definitely want to get an 0.4 out at some point, so if we can come up with a really solid implementation of this that requires a breaking change, I would happily cut an 0.4 release for it.
I think there are a few other items that merit a breaking release: https://github.com/tokio-rs/tracing/milestone/11
~~breaking changes? #1612? #2048!?~~
I'll revive the branch and see what I can do to chase the lifetime errors out then 🙂
Also, https://twitter.com/mycoliza/status/1530356065610567686?s=20&t=q2_dwEk3_1GnWorF1F4pBg wants to get into a tracing-subscriber bump (in tracing-filter as https://github.com/CAD97/tracing-filter/pull/13) to have `EnvFilter` utilize `LookupSpan`.
Oh, sorry! I meant that the breakage would be in `tracing-subscriber`, which can maybe be used as a dry run of sorts for cross-version compatibility of `Layer`s before employing that in tracing 0.2.
For context, the Bevy issue reports that they incur about 10μs per span/event when profiling.