tracing
subscriber: rewrite Registry to use static arrays
Feature Request
Crates
- tracing-subscriber
Motivation
The typemap-based storage used by the Registry for extensions erases each extension's type, places it into a Box, and stores the box in a HashMap. While this makes it easy to get started with the Registry, the approach has noticeable overhead in contexts such as profiling (e.g., https://github.com/bevyengine/bevy/issues/4892).
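For context, the existing pattern looks roughly like this (a minimal std-only sketch; the names and layout are illustrative, not tracing-subscriber's actual internals):

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

// Sketch of a type-erased extension map: every insert boxes the value,
// and every lookup goes through a HashMap keyed by TypeId.
#[derive(Default)]
struct Extensions {
    map: HashMap<TypeId, Box<dyn Any + Send + Sync>>,
}

impl Extensions {
    fn insert<T: Send + Sync + 'static>(&mut self, val: T) {
        // Each insert allocates a fresh Box -- the per-span overhead
        // this issue is trying to eliminate.
        self.map.insert(TypeId::of::<T>(), Box::new(val));
    }

    fn get<T: 'static>(&self) -> Option<&T> {
        self.map
            .get(&TypeId::of::<T>())
            .and_then(|b| b.downcast_ref::<T>())
    }
}

fn main() {
    let mut exts = Extensions::default();
    exts.insert(42u64);
    assert_eq!(exts.get::<u64>(), Some(&42));
    assert!(exts.get::<String>().is_none());
    println!("ok");
}
```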
Proposal
Instead of using a HashMap for extensions, Layer types would be required to pre-declare the extension types they'll store in the Registry, which would then use arrays to store extensions, avoiding the boxing (and potentially, the HashMap lookup).
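A minimal sketch of what pre-declared slots might look like, assuming a global registration table that assigns each extension type a fixed index (all names here are hypothetical, and this sketch still boxes values; it only removes the per-access HashMap lookup):

```rust
use std::any::{Any, TypeId};
use std::sync::{Mutex, OnceLock};

// Global registration table: each extension type gets a fixed slot
// index the first time it is seen (or via an explicit pre-registration
// API, which would be the real design's fast path).
static SLOTS: OnceLock<Mutex<Vec<TypeId>>> = OnceLock::new();

fn slot_of<T: 'static>() -> usize {
    let slots = SLOTS.get_or_init(|| Mutex::new(Vec::new()));
    let mut slots = slots.lock().unwrap();
    let id = TypeId::of::<T>();
    match slots.iter().position(|&s| s == id) {
        Some(ix) => ix,
        None => {
            slots.push(id);
            slots.len() - 1
        }
    }
}

// Per-span storage: a flat array indexed by slot instead of a HashMap.
#[derive(Default)]
struct SpanExtensions {
    slots: Vec<Option<Box<dyn Any + Send + Sync>>>,
}

impl SpanExtensions {
    fn insert<T: Send + Sync + 'static>(&mut self, val: T) {
        let ix = slot_of::<T>();
        if self.slots.len() <= ix {
            self.slots.resize_with(ix + 1, || None);
        }
        self.slots[ix] = Some(Box::new(val));
    }

    fn get<T: 'static>(&self) -> Option<&T> {
        self.slots
            .get(slot_of::<T>())?
            .as_ref()
            .and_then(|b| b.downcast_ref::<T>())
    }
}

fn main() {
    let mut exts = SpanExtensions::default();
    exts.insert(7u32);
    exts.insert(String::from("x"));
    assert_eq!(exts.get::<u32>(), Some(&7));
    assert_eq!(exts.get::<String>(), Some(&String::from("x")));
    println!("ok");
}
```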
Alternatives
Don't do this; however, I think the performance benefits of this are non-trivial.
(Sorry this issue is a bit short; I had to run, and most of the implementation details will be discovered while creating this API.)
I think it should be possible to use sharded_slab::Pool to store the extensions as well as the always-present data, and to do so in a backwards-compatible manner.
The backwards-compatible scheme uses a global RwLock that's only write-locked when a new extension type is registered. This will result in higher initial contention as layers first access their extensions, but that cost should amortize out quickly, and an optional API to pre-register types can remove the need for write locks entirely.
The exact same API also works if pre-registration is required: panic when the extension type is not registered, as that is a clear programmer error.
Super roughly, the idea is to invert from AoS Pool<AnyMap![_]> to SoA AnyMap![Pool<_>]:
```
Registry:
    spans: Pool<SpanDataInner>
    extensions: RwLock<AnyMap![Pool<Option<_>>]>
    ..

SpanData:
    data: PoolRef<SpanDataInner>
    extensions: &RwLock<AnyMap![Pool<Option<_>>]>

SpanDataInner:
    extensions: SmallMap<TypeId, usize>,
    ..

SpanData.insert(self, val: T):
    exts = self.extensions.read()
    if exts.get(T) is None:
        exts = self.extensions.write()
        exts.insert(T, RwLock(Pool()))
        exts = self.extensions.read()
    ext = exts.get(T)
    ix = ext.insert(Some(val))
    self.data.extensions.insert(T, ix)

SpanData.get(self, T):
    exts = self.extensions.read()
    ext = exts.get(T)?
    ix = self.data.extensions.get(T)?
    ext.get(ix).as_ref()
```
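To make that flow concrete, here's a std-only Rust sketch of the inversion, using RwLock<HashMap<..>> and Vec as stand-ins for the anymap and sharded_slab::Pool (everything here is illustrative, and the write lock is only taken the first time a new extension type shows up):

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::sync::RwLock;

// SoA layout: one growable column (standing in for sharded_slab::Pool)
// per extension type, behind an outer RwLock that is only write-locked
// when a new extension type is first registered.
#[derive(Default)]
struct Registry {
    extensions: RwLock<HashMap<TypeId, RwLock<Vec<Option<Box<dyn Any + Send + Sync>>>>>>,
}

impl Registry {
    // Returns the per-column index; real span data would record this
    // index keyed by TypeId, like SpanDataInner.extensions above.
    fn insert<T: Send + Sync + 'static>(&self, val: T) -> usize {
        let id = TypeId::of::<T>();
        if self.extensions.read().unwrap().get(&id).is_none() {
            // First use of this extension type: take the write lock once.
            self.extensions
                .write()
                .unwrap()
                .entry(id)
                .or_insert_with(|| RwLock::new(Vec::new()));
        }
        let exts = self.extensions.read().unwrap();
        let mut column = exts[&id].write().unwrap();
        column.push(Some(Box::new(val)));
        column.len() - 1
    }

    fn get<T: Clone + 'static>(&self, ix: usize) -> Option<T> {
        let exts = self.extensions.read().unwrap();
        let column = exts.get(&TypeId::of::<T>())?.read().unwrap();
        column.get(ix)?.as_ref()?.downcast_ref::<T>().cloned()
    }
}

fn main() {
    let registry = Registry::default();
    let ix = registry.insert(String::from("hello"));
    assert_eq!(registry.get::<String>(ix), Some(String::from("hello")));
    let iy = registry.insert(5u32);
    assert_eq!(registry.get::<u32>(iy), Some(5));
    println!("ok");
}
```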
I think it should be possible to use
sharded_slab::Poolto store the extensions as well as the always-present data, and to do so in a backwards-compatible manner.
Yes, a scheme like what you're describing here has been on my TODO list for a while. I think I have a partially-finished branch making a change like this somewhere.
@CAD97, if you're interested in working on that, I would love a PR, and I can try to find what I wrote a while ago as a starting point, if you'd like it?
I'd be happy to take a stab at it if you can find your old diff (and maybe even if you can't)
Oh no: the problem with this approach is that ExtensionsMut needs to hand out &mut access to all of the extensions on that span, so it needs to hold the lock on each one. This is theoretically solvable, but it reduces the gain significantly, as ExtensionsMut needs to hold AnyMap![T => RefMut<T>]; just &mut AnyMap![T => usize] isn't sufficient.
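A monomorphic toy illustrating the constraint (with u64 standing in for one extension type, and std's RwLock standing in for the pool's guards): a &mut borrowed from behind a lock is only valid while the guard lives, so ExtensionsMut has to own a guard per locked pool rather than just the index map.

```rust
use std::sync::{RwLock, RwLockWriteGuard};

// ExtensionsMut must keep each pool's guard alive for as long as it
// hands out &mut references into that pool; with one pool per
// extension type, that means one stored guard per type.
struct ExtensionsMut<'a> {
    guard: RwLockWriteGuard<'a, Vec<Option<u64>>>,
}

impl<'a> ExtensionsMut<'a> {
    fn get_mut(&mut self, ix: usize) -> Option<&mut u64> {
        // The returned &mut u64 borrows from self.guard, so it cannot
        // outlive the ExtensionsMut that owns the guard.
        self.guard.get_mut(ix)?.as_mut()
    }
}

fn main() {
    let pool = RwLock::new(vec![Some(1u64)]);
    {
        let mut exts = ExtensionsMut { guard: pool.write().unwrap() };
        *exts.get_mut(0).unwrap() += 1;
    } // guard dropped here, releasing the lock
    assert_eq!(*pool.read().unwrap(), vec![Some(2u64)]);
    println!("ok");
}
```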
Also, we don't have a Pool::get_mut; is that just an oversight, or a limitation? Similarly, a remove (a blocking version of clear) is necessary for this implementation scheme.
If it's possible to store either of
- a type-erased sharded_slab::pool::RefMut<Opaque>, or
- a dyn-typed sharded_slab::pool::RefMut<dyn Any>,
then that storage used to do the locking can be put into DataInner and borrowed by ExtensionsMut. Otherwise, ExtensionsMut would have to hold the anymap itself.
Alternatively, we could prove that extending &mut T beyond the lock is okay while ExtensionsMut is alive, since ExtensionsMut theoretically does lock access?
Also, I don't think it's possible for outside crates to implement SpanData:
- feature = "std" isn't purely additive, since it adds required methods to SpanData
- SpanData::extensions returns a type which is not externally constructible
- (Actually, SpanData is implementable by outside crates if the std feature isn't set)
And this means the only useful implementation of LookupSpan (i.e. one that doesn't just always fail lookup) is one that wraps another LookupSpan.
So this is where I got in a quick test; the Data portion of it works, but the interface of Extensions[Mut] causes problems
https://github.com/tokio-rs/tracing/compare/master...CAD97:tracing:pool-exts
Concept for a smaller incremental improvement(?):
Currently (IIUC), the AnyMap hashmap is reused, but the actual extensions' boxes aren't. A small improvement, then: instead of AnyMap![T => T], store AnyMap![T => Option<T>] to reuse the extensions' boxes.
I'm going to see if this approach works sometime this week as well. Perhaps these boxes can even be free-list-pooled and moved into the extensions map (rather than coming as part of the span data), even if pooling the extensions directly didn't work out.
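A sketch of that box-reuse idea in isolation (std-only; the names are illustrative, not the actual tracing-subscriber types): taking a value sets the Option to None but leaves the allocation in the map, so the next span's insert overwrites it in place instead of allocating a new box.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

// Each entry is a Box<Option<T>> erased as Box<dyn Any>, so "clearing"
// an extension only empties the Option; the box itself is retained.
#[derive(Default)]
struct Extensions {
    map: HashMap<TypeId, Box<dyn Any + Send + Sync>>,
}

impl Extensions {
    fn insert<T: Send + Sync + 'static>(&mut self, val: T) {
        match self.map.get_mut(&TypeId::of::<T>()) {
            // Reuse the existing allocation: just overwrite the Option.
            Some(slot) => *slot.downcast_mut::<Option<T>>().unwrap() = Some(val),
            None => {
                self.map.insert(TypeId::of::<T>(), Box::new(Some(val)));
            }
        }
    }

    fn take<T: 'static>(&mut self) -> Option<T> {
        // Clearing leaves the box in place for the next span to reuse.
        self.map
            .get_mut(&TypeId::of::<T>())?
            .downcast_mut::<Option<T>>()?
            .take()
    }
}

fn main() {
    let mut exts = Extensions::default();
    exts.insert(1u8);
    assert_eq!(exts.take::<u8>(), Some(1));
    assert_eq!(exts.take::<u8>(), None); // box retained, value gone
    exts.insert(2u8); // overwrites in place, no new allocation
    assert_eq!(exts.take::<u8>(), Some(2));
    println!("ok");
}
```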
I wouldn't rule out a breaking change, @CAD97: if the performance improvement is as good as I think it might be, then I think we should seriously consider releasing a tracing-subscriber 0.4 (alongside some other changes that have been queued up).
I definitely want to get an 0.4 out at some point, so if we can come up with a really solid implementation of this that requires a breaking change, I would happily cut an 0.4 release for it.
I think there's a few other items that merit a breaking release: https://github.com/tokio-rs/tracing/milestone/11
~~breaking changes? #1612? #2048!?~~
I'll revive the branch and see what I can do to chase the lifetime errors out then 🙂
Also, https://twitter.com/mycoliza/status/1530356065610567686?s=20&t=q2_dwEk3_1GnWorF1F4pBg wants to get into a tracing-subscriber bump (in tracing-filter as https://github.com/CAD97/tracing-filter/pull/13) to have EnvFilter utilize LookupSpan.
Oh, sorry! I meant that the breakage would be in tracing-subscriber, which can maybe be used as a dry-run-of-sorts for cross-version compatibility of Layers before employing that in tracing 0.2.
For context, the Bevy developers report that they incur about 10μs per span/event when profiling.