tracing icon indicating copy to clipboard operation
tracing copied to clipboard

Dynamically resolved metadata (normalized_metadata) in core

Open CAD97 opened this issue 2 years ago • 2 comments

(With humble apologies to the CODEOWNERS getting pinged for mechanical changes...)

Best reviewed commit-by-commit. I've gone to lengths to ensure that each commit is a semantically meaninful chunk on its own and builds/passes all tests.

This is a "what if" mainly to scratch my own itch. Feel free to say "we don't want this", as it is not insignificant chunk of additional complexity, and broadly breaking changes to core APIs. However, I do think this is generally beneficial, thus the implementation and PR.

PR #2043 offers a lesser solution which works for [email protected]. This is generally the "principled" version of that hack.

Motivation

Consider the case where I'm binding to an external library that provides C FFI. (For convenience, we'll call it lib.) Because they're trying to provide a nicely usable library, lib performs some diagnostic logging.

By default, lib logs directly to stderr. However, because lib wants to integrate well into your application, they provide a callback API for you to handle lib's logs. The interface looks something like this:

/// Configure delivery of debug messages.
extern "C" Lib_Configure_Debug(
    flags: Lib_Debug_Flags,
    callback: Option<extern "C" fn(
        /// The flags (e.g. severity) for this debug message.
        flags: Lib_Debug_Flags,
        /// The file name that this debug message is from.
        file: *const c_char,
        /// The line number that this debug message is from.
        line: c_int,
        /// The semantic location that this debug message is from.
        func: *const c_char,
        /// The actual debug message to be e.g. logged.
        message: *const c_char,
    ) -> Lib_Result>;
) -> Lib_Result;

We can then hook up lib to tracing like this:

let enable_flags = determine_enabled();
ffi::Lib_Configure_Debug(enable_flags, lib_debug_callback)?;

extern "C" fn lib_debug_callback(
    flags: ffi::Lib_Debug_Flags,
    file: *const ffi::c_char,
    line: ffi::c_int,
    func: *const ffi::c_char,
    message: *const ffi::c_char,
) -> ffi::Lib_Result {
    let level = debug_flags_to_level(flags)?;
    let file: Option<&str> = cstr(file)?;
    let line: u32 = line.try_into()?;
    let func: Option<&str> = cstr(func)?;
    let message: Option<&str> = cstr(message)?;
    
    tracing::event!(
        level: level,
        // wait... how do we specify file/line?
        message
    );
    
    ffi::Lib_Result_Ok
}

In current tracing, it's impossible to hook this debug callback into tracing properly, because Metadata is 'static, and the metadata is where file/line/module_path live.

I want it to be possible to do so. And hey, why doesn't tracing-log have this same problem?..

Solution

tracing-log solves the issue by conspriring with approximately the entire tracing ecosystem. Specifically, tracing-log contains the NormalizeEvent trait, whose purpose is to take whatever static metadata the event thinks it has and provide the real, normalized metadata instead.

Event normalization is done in tracing-log by identifying that an event came from a tracing-log callsite, and then using the fields of that event to smuggle in the dynamic metadata that log provides.

That's where the conspriring comes into play. Either code knows about tracing-log, uses NormalizeEvent and filters out event fields log.target, log.module_path, log.file, and log.line for is_log() events[^1], or they treat events emitted from log's macros incorrectly; potentially making incorrect choices on the metadata, and logging the metadata as semantic log fields.

[^1]: And here's an interesting note: currently, they do so by just always filtering out fields starting with "log." This means potential data loss for events which don't come from the log crate; a tracing::error!(log.data, "oh no"); will not record the log.data field for most subscribers. Even more annoying, consider putting log.file; it's not enough to filter out just the magic fields, they have to be filtered only when NormalizeEvent::is_log().

This PR takes much the same approach, but in doing so directly in tracing-core, can do so mostly transparently to most consumers. However, we additionally support dynamic metadata holding any number of custom fields, so checking for a known callsite is no longer sufficient.

To identify a piece of dynamic metadata, we use a magic name. The name is constructed of two randomly chosen noncharacters, so should be sufficiently unique to not occur accidentally. Each metadata field has a magic name prefixed with the same magic, and are used to normalize the metadata as in tracing-log.

To hide the magic fields when not requested, we require them to be a strict (and ordered) prefix[^2] of the metadata's FieldSet. As such, we can slice off the prefix and only leave the nonmagic fields in the set. For events' ValueSet, we require by construction[^3] that events with dynamic metadata use the normalized FieldSet. For spans' attributes... a solution has yet to be designed[^3], but will likely take the same approach.

[^2]: When writing this, I realized that having them as a postfix may be simpler, as FieldSet::contains(Field) can continue to be a simple bounds check rather than having to account for an offset slice start.

[^3]: While there is a macro for creating the dynamic metadata itself, there is no public API for constructing/dispatching the event or span with said metadata. tracing-log continues to use its bespoke dispatch code.

Each commit is a semantic step towards this goal, and compiles and passes tests individually. For more information, see each commit, which is has additional context attached as well. To accomplish dynamic metadata, we:

  • Add a lifetime to field::Visit, which communicates the lifetime of any visited string. Existing visitors can do impl Visit<'_> for the existing semantics.
  • Use the lifetime on field::Visit to remove the unsafety in tracing-log. The lifetime will allow doing so generically in the uplifted solution for spans as well as events.
  • Uplift the RecordFields trait into tracing-core.
  • Remove 'static from Metadata::name, so it can be injected as well, to replace the magic name for spans. (The metadata name is typically unused for events, but used for spans. Spans using dynamic metadata are expected to be less common, but will certainly want to inject a name for post-normalization.)
  • Make places that don't need &'static Metadata<'static> use a shorter lifetime, so they can work with dynamic metadata.
  • The big meat: implement the actual support for dynamic metadata. The mere act of providing the patch data is simple enough, but presenting a normalized view to the outside world is where things get interesting. Event and tracing-subscriber's SpanRef should present consistent post-normalization views unless prenormal data is requested.
  • Payoff: use the system we've implemented to implement tracing-log.

Remaining work

  • Tests, tests, tests. tracing-log acts as a simple proof that the system works, but we need more tests ensuring that magic fields don't leak into the public API except where explicitly requested.
  • APIs to actually emit spans/events with dynamic metadata.
  • Did I mention more tests?

CAD97 avatar Apr 04 '22 18:04 CAD97

Technically closes #372, though supplying a dynamic level still isn't advised per the added documentation.

CAD97 avatar Apr 28 '22 13:04 CAD97

(Sorry for flooding the actions queue with the rebase, but for this large PR I feel showing/proving each commit independently passes CI is important)

CAD97 avatar Jun 14 '22 19:06 CAD97

@CAD97 thanks for this change, we were going on this exact same road before knowing that this PR existed. Is it something that you're willing to drive to the end?

gferon avatar Feb 02 '23 13:02 gferon

@gferon I'm still excited to see something along these lines happen, and I've rebased the PR. Unfortunately, I don't have much time I can spend on refining this at this exact moment. I did my best to make sure each commit is a single unit and fairly easily understood in isolation; if y'all want to take this and go from here, or take it as a draft and redo it, I'm fine either way.

I might end up with some time to spend, in which case I'd love to spend that driving this change, but I can't promise anything at the moment; it's my final semester at $gradschool currently and that takes priority 🙂

CAD97 avatar Feb 05 '23 00:02 CAD97

just a note, this would make #2457 way cleaner to implement and would allow it to be an independent crate :)

emersonford avatar Feb 07 '23 19:02 emersonford

just a note, this would make #2457 way cleaner to implement and would allow it to be an independent crate :)

This is exactly the same I'm trying to pursue with this, but with C++ code using spdlog and this PR would help a lot indeed.

gferon avatar Feb 07 '23 19:02 gferon