log icon indicating copy to clipboard operation
log copied to clipboard

Structured Logging

Open alexcrichton opened this issue 7 years ago • 61 comments

One of the major points of discussion during the log evaluation was structured logging, specifically the log crate's relation to slog. In the review we ended up concluding that there's no reason to not add structured logging to the log crate and it'd be awesome to unify the slog/log ecosystems!

This issue is intended to serve as a tracking issue for concerns and ideas related to this. For example the specific APIs the log crate would have, how this would relate to slog, migration stories, etc.

alexcrichton avatar May 19 '17 20:05 alexcrichton

Really? Awesome. Is that planned to be added before log 1.0.0 release?

dpc avatar May 19 '17 20:05 dpc

The thought I had in the meeting was something like this:

info!("..."; MeaningfulType(information), PossiblyAnother(something_else));

where the currently selected logging implementation receives anything that matches its associated type, and any other types do nothing.

Key-value style structured logging in this paradigm may look something like this, where the kv macro is provided by the structured logging library to construct the appropriate type:

info!("..."; kv!("a" => a, "b" => b));

EDIT: or obviously it could provide its own macros that replace the ones in log:

info!("..."; "a" => a, "b" => b);

dtolnay avatar May 19 '17 20:05 dtolnay

@dpc that's the current intention, yeah, preparing this for the 1.0 release. We'd probably do it in 0.3 if we could, but it's unlikely to be possible without breaking changes! We're of course very eager to hear about everything you've done with slog and the best way to make progress on this. Want to make sure you're kept very closely in the loop!

alexcrichton avatar May 19 '17 20:05 alexcrichton

I will be happy to share my perspective and experiences during the discussion. Personally, I would be very happy to see as many features from slog ported to log so slog could just reuse log, or even be rendered completely unnecessary. I've also pinged slog users in case they are interested - they probably know better than me what do they look for in structured logging.

The whole feature is implemented in slog, and of course, I tried to make it as good as I could have imagined, so my initial point of view is: just take a look at https://docs.rs/slog/2.0.5/slog/trait.KV.html and https://docs.rs/slog/2.0.5/slog/trait.Serializer.html , and potentially other slog pieces, and use it as a starting point for discussion. The KV + Serializer are loosely mimicking serde, just simplified (eg. no sequences support).

Edit: Also, reading the documentation of https://docs.rs/slog/2.0.5/slog/macro.log.html , might give a good overview of features supported. Eg. it's possibe to use just someval in place of "k" => val if someval implements KV so that somevar can supply it's own key(s) (useful for logging structs)); or "k" => %v if v implements Display etc.

dpc avatar May 19 '17 21:05 dpc

When we do add structured logging, there's one question I'm wondering about: what do we encourage users to use?

Binary crates choosing either to go full-formatting (like current log) or full-data is not a problem, but if we continue to support both, what should libraries choose?

I mean having a library output debug information in structured logging will give the end reader more flexibility, but I'm guessing it would also add a bit more overhead as well. If all a binary crate wants is a console stream of readable log messages, is it best to construct the additional structured log records and then just use a default conversion to a string?

I'm also curious what the best practices for this in slog currently are, if that would be reasonable to expand on? I mean, so far my actual usage of log has been a leveled println!() replacement with the extra bonus that I get debug messages from my dependencies as well. Not that this has to be answered here - I should probably just go read the documentation - but it might be useful context for adding an implementation to the log crate too!

daboross avatar May 20 '17 07:05 daboross

Good question. I'm aware that not everyone buys into the whole structured logging concept. Especially in slog where more "unusual" features are bundled together, people don't initially see the whole value proposition. Most people are used to "there is one global logger and I give lines of texts" logging and ask "why all this hassle?". I myself became "a believer" once I was working on a heavy-duty production service written in Go, where all the data was logged to json, collected to elastic search cluster where business analytics and monitoring was performed on it.

It would be useful to check experiences of any other programming language community. I'm not aware of any languages that have first-class, core-encouraged, structured logging support, but maybe there are some.

IMO, if structural logging is to be supported, then it should be encouraged, since as far as I can see it is clearly better. Even if you just log a stream on a console, doesn't screenshot on https://github.com/slog-rs/slog look better than a plain wall of text? Without structured logging, you can only distinguish between basics: logging level, timestamp and message, etc. With structured logging, there is much more flexibility. And libraries can't decide for all applications using them, what kind of logging should they need so they should default to structured, as it can cheaply and gracefully be downgraded to plain line of text.

The performance different is negligible. In slog all structured logging key-value are just one additional pointer to KV object data. If you do info!("received number: {}", n) then there will be one "formatting to string" done for you. If you do info!("received"; "number" => n); and then write it to stream, there will also be one pass of "formatting to string" done for you. As far as I can tell, any logging performance considerations should be mostly: avoid copying and allocation, not doing anything until needed, preparing as much as possible during compilation (static etc.) and asynchronous logging (making sure passing logging data to another thread is cheap).

dpc avatar May 20 '17 20:05 dpc

Thanks for going over that!

I think initially I was a bit worried that adding complexity to log would scare new library authors away from adding logging into their libraries, but I think it won't be too bad.

The console logging does look nice! I mean, I would think it'd be nicer with data "inline" with the message, but then that would sacrifice all the rest of the flexibility.

To me,

weak encryption algorithm 'xor' requested by `82.9.9.9:42381` (host: localhost:8080, version 0.5)`

looks better than

weak encryption requested, algo: xor, port: 42381, peer_addr: 82.9.9.9, port: 8080, host: localhost, version: 0.5

... but I imagine having ecosystem-wide structured logging would be excellent for larger applications. The coloring available when data is separated like that isn't too bad either!

I think my ideal logging scheme would be something like info!("received number: {number}", number => n), with data both separated out into query-able structured records, and placed inline with the formatting - but that'd probably be a lot of work and complexity in teaching for little overall gain.

In any case, thanks for that!

daboross avatar May 22 '17 06:05 daboross

@dtolnay: I propose a hopefully leaner/less magical implementation than making KV part of a macro as now in the slog crate. What do you think about the logging function taking a single value-argument of a Serialize type? The deeper structure or representation of a record is then left to the programmer. It would be quite easy to declare a wrapper type for emitting the logging record as some sort of KV-dictionary. The advantage is that any logging record has a meaningful, possibly varying type. And of course the less magic thing, new developers just starting to use logging would not have to learn the intricacies of another set of macros.

sanmai-NL avatar May 24 '17 11:05 sanmai-NL

I just wanted to highlight one of the features I like best about slog which is simply the ability to add structure to the resulting log message as different parts of the application get involved.

For example I have a Hyper Service. Each Request is given a unique logger instance early on in the call function (a simple clone of a root logger, which is cheap in slog).

Eventually the application accesses the logger and logs something about the Request it has been working on, such as:

// .. current logger is acquired
info!(logger, "stored widget")

What is interesting however is that prior to this the logger has been extended by some other code (that I may or may not know about):

// .. current logger is acquired
let logger = logger.new(o!("request_id" => "123abc"));
// .. updated logger is made available for others as current logger

The output (assuming JSON formatter) is then roughly:

{"ts":"2017-06-19T02:25:08.521265800Z","level":"INFO","msg":"stored widget","request_id":"123abc"}

If I also added some external code to do authentication it might end up as:

{"ts":"2017-06-19T02:25:08.521265800Z","level":"INFO","msg":"stored widget","request_id":"123abc","user_id":100}

I understand that passing a logger instance all over the place may seem clunky to some folks but if we can figure a way to maintain this functionality there are a lot of nice benefits that can be gained.

bradleybeddoes avatar Jun 19 '17 04:06 bradleybeddoes

@bradleybeddoes I believe the part you're describing is called "contextual logging". I'm not sure if we should open another issue or consider it a part of this issue.

dpc avatar Jun 19 '17 06:06 dpc

As long as it's possible to construct a Record via the public API (which we're going to support), a contextual logging library can be built on top of log without too much trouble I don't think.

sfackler avatar Jun 19 '17 18:06 sfackler

I probably don't understand what does "structured logging" mean but seems like people assume that logging automatically means terminal or streaming to a file and then reading that file, so you need it to look nice for that use-case.

My personal interest(I guess I'm selfish) is to keep the cost of the logger as low as possible because in my case(a lot of services on an embedded device, all sending logs over the network), people send thousands of logs per second(actually, this is the amount of logs coming from the device, not from every application separately) and I can't afford adding overhead. Here is the project I'm thinking about here(a bit inactive, sry...): https://github.com/lilianmoraru/dlt-rs

More info about DLT, here: https://at.projects.genivi.org/wiki/display/PROJ/About+DLT-Transport

lilianmoraru avatar Jul 08 '17 23:07 lilianmoraru

It would be useful to check experiences of any other programming language community. I'm not aware of any languages that have first-class, core-encouraged, structured logging support, but maybe there are some.

In .NET we have serilog which uses a standard message templating syntax for structured logging.

@nblumhardt and I spent some time a while ago exploring a similar design for Rust. The idea is to log both a textual log message template along with the individual properties and additional context that make up a message. It's up to the particular sink to decide how it consumes the template and the properties, maybe by substituting properties in, composing a new object with a property containing the message, whatever. Libraries don't care where their logs go.

I'll admit I haven't spent a lot of time digging into the state of structured logging in Rust or the API log exposes besides it's macros for a long time so will go and catch up.

KodrAus avatar Jul 11 '17 21:07 KodrAus

One of the things we explored with emit was keeping very close to the API of the log crate - just adding a little bit of structured-logging-sauce on top :-)

I.e. here's a typical log event:

info!("Hello, {}!", env::var("USERNAME").unwrap());

And here's the structured extension along the lines of emit:

info!("Hello, {}!", user: env::var("USERNAME").unwrap());

The two events would produce exactly the same textual output, but by naming the user parameter, this would also be available as structured data attached to the event, if the log destination supported it.

Keeping a smooth continuum from completely unstructured, to fully-structured, log events might be preferable to splitting the API into two unrelated subsets, if it could be achieved in a non-breaking way :-)

nblumhardt avatar Jul 11 '17 21:07 nblumhardt

How is the performance of such structured logging proposals?

"Message Templates" seems like it will involve string parsing/manipulations. Eg. It looks to me that emit is dynamically allocating HashMap every time it logs anything: https://github.com/emit-rs/emit/blob/master/src/lib.rs#L144 . Might be OK for languages that are generally not very performant and overallocate all over the place, but probably in Rust niche, it can be a problem.

The way structured logging is done in slog is very performant, with ability to very quickly offload any formatting and output to different thread(s), with minimal work done by the current thread.

dpc avatar Jul 12 '17 00:07 dpc

@dpc You're absolutely right that's an important point. And we expect a core Rust crate to cater to those use cases where performance is paramount. But how you capture properties is more of an implementation detail, whether it's a hash map or some kind of linked list or something else. I think we both agree that the latency of logging from a caller's perspective should be absolutely minimal, and the reality in either case is that a sink has the freedom to consume the structured log however it needs for its eventual destination.

The main thing I'd like to maintain is not having to pass a logger into the log calls, or have to carry a logger around in libraries for the sake of structured logging. I think it's still important to support that case, but in my experience it's been most helpful to grab the ambient logger and enrich it with some additional state as necessary rather than chain a logger throughout your application.

KodrAus avatar Jul 12 '17 01:07 KodrAus

I think it would be awesome if we could get something like emit's syntax working for this, glad you mentioned that nblumhardt.

Perhaps with a proc-macro-hack, or maybe even without, could the log!() macro be made to store a temporary for each argument, then put the arg in both a stack-allocated array of &[(&'static str, &Display)] and pass it onto format_args!()?

A vec or hashmap could then be allocated by the logging backend if it needed to send the arguments to another thread, or any other permanent storage.

daboross avatar Jul 12 '17 01:07 daboross

It's pretty straightforward to turn something like {foo: some_variable, bar: "some_literal"} into &[("foo", &some_variable), ("bar", &"some_literal")] in a normal macro-rules macro.

I would very much like to keep the "normal" logging path allocation free, and that shouldn't be too much harder to make happen with some structured logging.

sfackler avatar Jul 12 '17 02:07 sfackler

chain a logger throughout your application

This is a part of "contextual logging", and is orthogonal to structured logging provided by slog. One can use slog-scope and not worry about Logger object if so inclined.

I was mostly wondering about:

log("User {username} logged in from {ip_address}", username, ipAddress)

from https://messagetemplates.org/. It is a nice idea, but I wonder how hard it will be to generate static information at compile time, using stable-Rust macros.

&[("foo", &some_variable), ("bar", &"some_literal")]

What will be the type of it? I was doing something like that in slog v1: &[&'static str, &Value] where Value was a trait. The trait was so that users can implement it for other types. But then I switched in slog v2 to just &KeyValuePairs where KeyValuePairs is a trait, since it had many benefits: it is possible to implement KeyValuePairs for any type (arrays, tuple, heterogeneous list, structs, etc.). Key can be generated at runtime, supporting lazy values etc. The common case is of course impl KeyValuePairs for [(&'static str, T)]. Since KeyValuePairs provides potentially multiple key-value pairs, it can control order of pairs, or impl KeyValuePairs for combination of contexts etc.

Now that I looked at https://messagetemplates.org/, I kind of wish I did it as so in slog. Though I still don't know how to write a macro for it.

dpc avatar Jul 12 '17 04:07 dpc

@dpc

For basic logging, this is already somewhat supported with format_args!(). Any custom macro would just have to send arguments to that correctly, and then also be able to use them to create an array.

This works in today's log:

info!("a {whatisit} message! {ending}", whatisit="keyword", ending=":)");

http://play.integer32.com/?gist=09caba59aeed61eb13ec7c3261b7c97c&version=stable

daboross avatar Jul 12 '17 04:07 daboross

Hmmm... I liked the fact that one does not have to repeat itself in log("User {username} logged in from {ip_address}", username, ipAddress) but maybe it is even better, since it's not position dependent.

dpc avatar Jul 12 '17 04:07 dpc

Also, how to support "additional" keyvalues? Eg.

info!("a {whatisit} message! {ending}", whatisit="keyword", ending=":)", request_id=req_Id);

The request_id is assigned to the key-value pairs, but is not actually used in textual message itself. Sometimes the number of additional data might be quite big, and not everything is neccessary in the text information - some might be mostly useful for robotos.

Somehow the macro would have to make distinction between the values used in the text message, and addition ones, as format_args!() complains in any argument was not used (rightly so). Could be eg. additional separator.

info!("a {whatisit} message! {ending}", whatisit="keyword", ending=":)"; request_id=req_Id);

Also, what are the preferences for the delimiter between key and value? Slog uses => but it is kind of heavy. Would:

info!("a {whatisit} message! {ending}", whatisit: "keyword", ending: ":)"; request_id=req_Id);

be fine with everyone? Or are there any alternative preferences/ideas?

dpc avatar Jul 12 '17 04:07 dpc

As far as I can tell, neither emit or slog seem to support arbitrary types as structured values (see emit's Value enum, slog's Serializer trait). I found myself wanting to log e.g. a UserContext struct but could not do so. I'd be very happy to see support for any value with a serde::Serializable implementation, or some similar trait. Of course as a workaround you can convert any struct to JSON and store that.

Regarding the logging macro syntax bikeshedding, I quite like keeping format_args! as a subset, so using = between key and value for string formatting. I slightly prefer ; over , between string formatting and structured arguments. I quite like : between key and value for the structured arguments, to match the struct literal syntax, but => and = are also OK.

To summarise, I like:

info!("{} {msg}", "Hello", msg="world!"; k: v);

fluffysquirrels avatar Jul 12 '17 17:07 fluffysquirrels

A lot of people asked for structured types in slog. Sadly it has many challenges. I didn't feel up to the task and just left it out. I think it's a complex subject, and maybe we could use a different github issue so we keep this one focused on the core structured logging.

dpc avatar Jul 12 '17 17:07 dpc

At first I didn't really like the syntax of separating replaceable properties from other properties, because I see the message as just a view over the properties. But thinking about it, rejecting logs at compile time with messages that aren't properly replaceable sounds useful.

KodrAus avatar Jul 12 '17 22:07 KodrAus

What's the status here?

Anything that can be done to move this forward?

theduke avatar Oct 28 '17 15:10 theduke

The next step would be to design the API. In particular, we need to figure out what the interface to the structured data is. We want the logging codepath to be allocation-free, so it'll be some object-safe trait. It might end up looking a bit like rustc-serialize's Encodable? We also need to figure out what the macro syntax would look like.

sfackler avatar Nov 07 '17 23:11 sfackler

slog also supports adding environment to the current logger, even if the logging statement is unaware of it. This way, you can e.g. log stuff from your packet parsing routine, but have a source IP address annotated because the surrounding code added it to the environment. https://github.com/slog-rs/slog/issues/124 is an issue about slog seemingly needing allocations to do that (I lost track of that issue), but it would be nice if the log crate could do the same without any allocation overhead.

fn decode(data: &[u8]) -> Result<Packet, Error> {
    if data[1] != 0 {
        warn!("padding not zeroed");
    }
    unimplemented!();
}

fn on_receive(from: IpAddr, data: &[u8]) {
    with_kv!(from: from, {
        let packet = decode(data);
        // [...]
    });
}

This way you can add annotation to log statements that aren't even in your library. (I don't remember the exact syntax slog used, this example was just made up.)

tbu- avatar Dec 08 '17 02:12 tbu-

It would be great to have the following optional notation

info!("..."; a, b);

This could men the same as

info!("..."; "a" => a, "b" => b);

See also https://github.com/rust-lang/rfcs/pull/2173 for an RFC about "A quick dbg macro"

nielsle avatar Dec 29 '17 11:12 nielsle

BTW. Structured types have been added to slog behind a feature flag. https://docs.rs/slog/2.1.1/slog/trait.Serializer.html#method.emit_serde . I'm not sure if that's the best way to do it, but since people continuously asked about it, we agreed on some design and here it is.

dpc avatar Dec 29 '17 17:12 dpc