[feature] Context API
The logger gem is notoriously simple to use, but hard to extend.
One only has to look at a few of the gems that add tags / JSON / logstash formatting support to see the same functionality reimplemented in "same but different" ways. For example, logstash-logger and the ActiveSupport logger handle tagging in virtually the same way. sidekiq has a (IMO) convoluted API to add per-job context to log messages, which somewhat works as long as one does not need per-fiber contexts. Several other "log as JSON" gems struggle with the same issues to deliver something that works, and either heavily patch Logger or Logger::Formatter, or circumvent them altogether.
At a company I worked for, an internal logging library built on top of logger even added support for context by allowing methods like #info to receive a hash (i.e. log.info(message: "foo", user_id: 2, ...)), which was completely non-standard and couldn't possibly work with the log.info { "foo" } idiom.
I think that some of these patterns could be coalesced into the main logger gem in a way that makes extending it simpler than it is right now.
For the record, the default formatter already supports per-message context, but that context is limited to a few "static" parameters (severity, current time, process id, progname) which are logged in a non-standard manner:
Logger.new(STDOUT).info("foo")
#=> I, [2025-08-13T15:00:03.830782 #5374] INFO -- : foo
There is currently no way to add more parameters, or (if necessary) override the existing ones.
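For reference, this is roughly how a custom formatter is wired up today; the contract is four positional arguments (severity, time, progname, msg) and nothing else, so there is no channel through which extra per-message data could reach it:

require "logger"

logger = Logger.new($stdout)
# The formatter only ever receives these four values; any additional
# context would have to be smuggled in via globals or thread variables.
logger.formatter = ->(severity, time, progname, msg) do
  "#{severity[0]}, [#{time.strftime('%Y-%m-%dT%H:%M:%S.%6N')} ##{Process.pid}] #{severity} -- #{progname}: #{msg}\n"
end
logger.info("foo")
#=> I, [2025-08-13T15:00:03.830782 #5374] INFO -- : foo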
Context API
This is a proposal to extend the Logger API so that both per-scope and per-message context can be passed downstream to the formatter(s) and logged accordingly:
logger = Logger.new(STDOUT)
logger.info("foo") #=> I, [2025-08-13T15:00:03.830782 #5374] INFO -- : foo
logger.info { "foo" } #=> I, [2025-08-13T15:00:03.830782 #5374] INFO -- : foo
# per-message context
logger.info("foo", user_id: 1) #=> I, [user_id=1] [2025-08-13T15:00:03.830782 #5374] INFO -- : foo
logger.info(user_id: 1) { "foo" } #=> I, [user_id=1] [2025-08-13T15:00:03.830782 #5374] INFO -- : foo
# or
logger.info("foo", "user_id=1") #=> I, [user_id=1] [2025-08-13T15:00:03.830782 #5374] INFO -- : foo
# per-scope context
logger.with_context(a: 1) do
  logger.info("foo") #=> I, [a=1] [2025-08-13T15:00:03.830782 #5374] INFO -- : foo
end

logger.with_context(a: 1) do
  logger.with_context(b: 2) do
    logger.info("foo") #=> I, [a=1] [b=2] [2025-08-13T15:00:03.830782 #5374] INFO -- : foo
  end
end
This API would achieve the same goal as the #tagged examples linked above, or the sidekiq Logger#with function, in a standard manner, where formatters would simply receive the context as a set of key-value pairs (or an array of "string-able" objects; either works for me).
I think it builds on top of the earlier addition of Logger#with_level, which applies a scoped change in a similar way but stopped short of providing the same for other context variables (this proposal fixes that).
By standardizing this part, adding e.g. a JSON or logstash formatter would become far less involved; one of the current limitations is that there is no way to pass this per-message context through the formatter API (which is the main reason the sidekiq formatter resorts to thread variables).
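For illustration only, here is a minimal sketch of how such scoped context could be layered on top of today's Logger. The ContextLogger class, the fiber-local stack, and the prefix placement are all my own assumptions, not the proposed implementation:

require "logger"

class ContextLogger < Logger
  def with_context(**ctx)
    # Thread#[] is fiber-local, which sidesteps the per-fiber issue mentioned above.
    stack = (Thread.current[:log_context] ||= [])
    stack.push(ctx)
    yield
  ensure
    stack.pop
  end

  private

  # Prepend the merged context to whatever the configured formatter produces.
  # (Note: this puts the context before the default "I, [...]" prefix, so the
  # output differs slightly from the examples above.)
  def format_message(severity, datetime, progname, msg)
    ctx = (Thread.current[:log_context] || []).reduce({}, :merge)
    prefix = ctx.map { |k, v| "[#{k}=#{v}] " }.join
    prefix + super
  end
end

logger = ContextLogger.new($stdout)
logger.with_context(a: 1) do
  logger.with_context(b: 2) { logger.info("foo") }
end
#=> [a=1] [b=2] I, [2025-08-13T15:00:03.830782 #5374] INFO -- : foo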
Drawbacks
One of the main benefits of the log.info { "msg" } idiom is that the "msg" string is never allocated if the logger's severity level skips the message. In the current proposal, the context is always built, which results in a few extra allocations bubbling downstream. I'm not sure how important or negligible this concern is, though.
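To make the difference concrete (a small illustration; the last call uses the proposed API, which does not exist yet, so it is shown as a comment):

require "logger"

logger = Logger.new($stdout, level: Logger::WARN)

built = 0
# The block is never called because INFO is below the WARN level,
# so the message string is never allocated.
logger.info { built += 1; "expensive message" }
built #=> 0

# Under the proposal, a call like the one below would still allocate the
# { user_id: 1 } hash at the call site before the level check can reject it:
#   logger.info(user_id: 1) { "expensive message" }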
Another concern is backwards compatibility: custom formatters currently do not expose a method signature that accepts kwargs. Perhaps there is a way to introspect the formatter method to infer whether its signature supports kwargs, but even if there is, the check has a cost.
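One way such introspection could look (just a sketch of the idea, not part of the PR):

require "logger"

logger = Logger.new($stdout)
formatter = logger.formatter || Logger::Formatter.new

# Inspect the formatter's signature once and remember whether it can take
# keyword arguments; procs expose #parameters directly, other formatters
# via their #call method.
params = formatter.is_a?(Proc) ? formatter.parameters : formatter.method(:call).parameters
accepts_kwargs = params.any? { |type, _name| [:key, :keyreq, :keyrest].include?(type) }

# The logger could then branch on this flag when dispatching:
#   formatter.call(severity, time, progname, msg, **context)  if accepts_kwargs
#   formatter.call(severity, time, progname, msg)              otherwise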
I'd like the basic Ruby Logger to have more features, but at the same time I'm wondering in which direction this feature is going in the long term.
Having listened to this talk and read through the ActiveSupport::EventReporter API, I realized that "context" is only one aspect of logging. Shopify has, after many years of trial and error, arrived at roughly this set of information for logging:
- name: String (The name of the event)
- payload: Hash, Object (The payload of the event, or the event object itself)
- tags: Hash (The tags of the event)
- context: Hash (The context of the event)
- timestamp: Float (The timestamp of the event, in nanoseconds)
- source_location: Hash (The source location of the event, containing the filepath, lineno, and label)
There is also a clear separation of concerns, with a different Ruby class for each role (a rough sketch follows the list):
- In the code, you emit an event (with the information listed above)
- A subscriber listens to those events - and may filter and transform the data
- The subscriber distributes it to sinks (stdout logger, file logger, cloud json, UDP datagram, etc).
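To make that separation concrete, here is a deliberately simplified sketch of the three roles. This is my own illustration, not the ActiveSupport::EventReporter API:

# An event carries the structured information listed above.
Event = Struct.new(:name, :payload, :tags, :context, :timestamp, keyword_init: true)

# A subscriber listens to events and may filter or transform them
# before fanning them out to its sinks.
class Subscriber
  def initialize(sinks)
    @sinks = sinks
  end

  def emit(event)
    return if event.tags&.fetch(:internal, false)
    @sinks.each { |sink| sink.write(event) }
  end
end

# A sink is only responsible for output (stdout, file, JSON over UDP, ...).
class StdoutSink
  def write(event)
    puts "#{event.name} payload=#{event.payload.inspect} context=#{event.context.inspect}"
  end
end

reporter = Subscriber.new([StdoutSink.new])
reporter.emit(Event.new(
  name: "checkout.completed",
  payload: { order_id: 42 },
  tags: {},
  context: { request_id: "abc" },
  timestamp: Process.clock_gettime(Process::CLOCK_REALTIME, :nanosecond)
))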
Shopify's setup could be considered the high end of the feature spectrum, where basically no more features will be needed for a long time to cover all use cases. However, there are only two event log levels in ActiveSupport (debug and normal), which I think makes sense for that use case.
And what we want here is more like:
"I want to log something quickly and using only basic Ruby, using the native logger, but I also want to send in some kwargs as tags/data/context, and it's enough to go to STDOUT or a file. I don't want the complexity of a log subscriber".
So I'm wondering how we can facilitate that. The Logger class is everything in one place: log call, subscriber, and sink.
Here is the implementation of the ActiveSupport eventing, for inspiration (I'm not entirely sure why set_context does not take a block while tagged does). It's just a few weeks old and seems to have grown out of a lot of thinking. But again, it's for events, not for a general logger, though the difference is not very big in my opinion; the event could just be "log". At this point I'm wondering if I should simply use ActiveSupport (and thus jump onto the bandwagon).
(Their API allows both arbitrary objects and kwargs to be passed in. I think that's smart, but I don't know what it would mean for a plain Logger instance to receive a random Object.)
# ==== Arguments
#
# * +:payload+ - The event payload when using string/symbol event names.
#
# * +:caller_depth+ - The stack depth to use for source location (default: 1).
#
# * +:kwargs+ - Additional payload data when using string/symbol event names.
def notify(name_or_object, payload = nil, caller_depth: 1, **kwargs)
The proposed PR here has:
def add(severity, message = nil, progname = nil, context: nil)
(Given the first three parameters are fixed forever, I'm really not sure how to improve this.)
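For what it's worth, a call through that signature would presumably look like this (assuming the context keyword is simply forwarded downstream to the formatter):

logger.add(Logger::INFO, "foo", nil, context: { user_id: 1 })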
These were just some thoughts I had regarding the open PR. It's great to start with "set_context" (plus "context" and "clear_context"), but I'm not sure that's the same as "tags"; we do have "progname" (they don't), but we lack the line number of the calling code; and we essentially use log levels to facilitate "log subscribing", whereas they have a separate concern for that, ...
I'm just a little bit afraid to break something here.
Good luck with the efforts! :rocket: