opentelemetry-python icon indicating copy to clipboard operation
opentelemetry-python copied to clipboard

Add log-formatting to the otel-LogHandler

Open a-recknagel opened this issue 1 year ago • 20 comments

Description

Added support for log message formatters to opentelemetry.sdk._logs._internal.LogHandler.

Fixes #3664 Fixes #3353

Type of change

  • added a self.format call on the original LogRecord to the code in the LogHandler's emit method, the result is added as the body field of the otel-LogRecord.
  • commented the code that filteres out all standard-LogRecord fields from getting added to the otel-LogRecord. I identified three sections, 1) those that are considered "private" by the LogRecord API (see a note "You shouldn’t need to format this yourself" in https://docs.python.org/3/library/logging.html#logrecord-attributes) or only get used to cache results, 2) those that have a semantic convention and have their key translated accordingly, and 3) those that do not have a semantic convention (yet) and just get plain dropped.
  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [x] This change requires a documentation update

The documentation should include an example which shows how exactly adding a formatter changes the result of what gets emitted to the receiver. I think there is some element of surprise here, because the standard procedure is that, if a formatter was set, the handler emits a string whose content is wholly dictated by said formatter.

The breaking change is due to me removing a check where a non-string msg attribute in the original LogRecord will be attached to the otel-LogRecord as-is. Now, the format call will always return a string, even if no formatter was set and args is empty. I believe this is the right behavior and makes the codebase easier to maintain. Code that looked like this

log.info({"message": "foo", "context": "bar"})

could, before this change, be interpreted as a complex data structure by some log-renderer. I'd argue that the "right way of logging" (with this example here illustrating a migration-path) should be the following:

log.info("foo", extra={"context": "bar"})

How Has This Been Tested?

I wrote three new unit tests, and updated a few other ones that broke due to the change.

Does This PR Require a Contrib Repo Change?

  • [x] No

Checklist:

  • [x] Followed the style guidelines of this project
  • [x] Changelogs have been updated
  • [x] Unit tests have been added
  • [ ] Documentation has been updated

Breaking Change for the log-record's body-field

Type change

Important context here is the formal definition of the body-field in the otel data-model docs. Summed up, the key points are:

  • the type is any, so exporters need to support it as such
  • can be a human-readable string
  • can be structured data, if that is what the application emits

The proposed patch would change the possible type of the emitted data to effectively be just string. This is not in conflict with the data-model, since string is a subtype of any. Exporters should still support any to be compliant with the spec, even if it would not occur in the reference implementation (i.e. users can of course choose to subclass otel's LoggingHandler and overwrite its emit method to write something else than a string into the emitted log-record's body-field).

Content Change

Given a logging-call like

logger = logging.getLogger("foo")
logger.info("bar")

Formerly, the content of the body field of the resulting record would always be just "foo". After this change, running logging.basicConfig would attach a formatter using BASIC_FORMAT which transform that message into "INFO:foo:bar" on the body field. If you'd want to keep the former behavior, you'd have to avoid using logging.basicConfig or explicitly switch its formatter with logging.Formatter("%(message)s") afterwards.

Excursion: What are the args in a log-record for?

If you take a look at the signature of Logger.log (which is what Logger.debug, Logger.info, etc. call with their respective level), there is an *args in there, which is also persisted in the LogRecord as an attribute.

Given a complex object such as

class ComplexObject:
    def __str__(self):
        import time
        time.sleep(5)
        return "zzz... Baz"

complex_object = ComplexObject()

its purpose is to safe time and effort on log-messages with expensive serialization operations to avoid constructs like

import logging

if logging.RootLogger.level <= logging.DEBUG:
    logging.debug(f"Status of our complex object: {complex_object!s}")

where you don't want to spend 5 seconds waiting for complex_object to be turned into a string just to not end up emitting the result at all because your loglevel was higher than debug.

The same can be achieved with just

import logging

logging.debug(f"Status of our complex object: %s", complex_object)

And that's it, that's all that the *args in Logger.log should reasonably be used for. In practice, this rarely ever comes up.

Examples

We are going to use this script to test the behavior of the logging for the current main-branch (original), and what it would look like after applying this patch (patched):

otel_logging.py

import logging.config

from opentelemetry.sdk._logs import LoggerProvider
from opentelemetry.sdk._logs.export import ConsoleLogExporter, SimpleLogRecordProcessor
from opentelemetry.sdk.resources import Resource

logger_provider = LoggerProvider(resource=Resource.create({}))
logger_provider.add_log_record_processor(SimpleLogRecordProcessor(ConsoleLogExporter()))

logging.config.dictConfig({
    "version": 1,
    "formatters": {
        "simple": {"format": "%(name)s [%(module)s.%(lineno)s]: %(message)s"}
    },
    "handlers": {
        "console": {
            "level": "INFO",
            "formatter": "simple",
            "class": "opentelemetry.sdk._logs.LoggingHandler",
            "logger_provider": logger_provider,
        }
    },
    "root": {"handlers": ["console"], "level": "INFO"},
})

logging.info("foo")
logging.info("foo %s %s", "bar", "baz")
logging.info(["bar"])
logging.info({"baz": 1, "qux": None})

The output is quite verbose, so I'll only show the actual differences here, namely the content of the body-field.

original

"body": "foo",
"body": "foo bar baz",
"body": [
    "bar"
],
"body": {
    "baz": 1,
    "qux": null
},

Note that the original type of the first call-argument is retained, string-formatting of the args took place, but the specified logging-formatter was not called. As can be seen in the last call, None has also been transformed into null by the otel processor/exporter, which will be picked up on in the next section.

patched

"body": "root [otel_logging.26]: foo",
"body": "root [otel_logging.27]: foo bar baz",
"body": "root [otel_logging.28]: ['bar']",
"body": "root [otel_logging.29]: {'baz': 1, 'qux': None}",

Note that all of these bodies are strings.

Justification of the change and discarded alternatives

Leave as-is

Python developers expect log-formatting to work. I acknowledge that as far as otel is concerned it is in a bit of an awkward space, since it could be considered a kind of "rendering" that should be handled in the frontend, and not within the client. But it's a staple of python's standard logging, there is an always-on default, and its absence caused a fair bit of confusion already (see linked issues).

Also, the current behavior is buggy for non-jsonable objects. If there were an additional logging event logging.info(set()), it would not appear in the output, and instead be quietly discarded.

Only apply log-formatting if the first argument is a string

This would keep the option open to emit any-type arguments, but would in my personal opinion be inconsistent behavior and cause confusion. It would share the bug of the current implementation with respect to non-jsonable objects, and the output would look like this:

mixed

"body": "root [otel_logging.26]: foo",
"body": "root [otel_logging.27]: foo bar baz",
"body": [
    "bar"
],
"body": {
    "baz": 1,
    "qux": null
},

A possible issue here could be that the bodies lacking the format-information might fall through if dashboard-users are not aware of this discrepancy, e.g. "I know something went wrong in app.py, so I'll search for all bodies with *.\[app\.\d+\].*". Following the least astonishment principle and being consistent with how the body looks in general would be preferable in such a case.

Merge this PR

Log-formatting would work in a predictable, consistent, expected, and robust manner. We would lose the option to log structured data though, but at least for dictionaries there is a migration path. While sparingly documented (in the docs of Logger.debug, if you scroll a bit), you can pass dictionaries during logging with the extra-keyword and have its values persisted as log-record attributes:

logging.info("foo", extra={"bar": ["bar"], "baz": 1, "qux": None})

This is not new functionality, it was supported for a long time and would, as far as I can tell, provide a clean migration-path for users currently relying on passing structured data passed in log-events retaining a parseable shape:

"body": "root [otel_logging.26]: foo",
"attributes": {
    "bar": [
        "bar"
    ],
    "baz": 1,
    ...,
},

Note that qux was filtered out though, since None is not a valid attribute-value as decided by the AttributeValue type and the _clean_attribute function applying it.

a-recknagel avatar Feb 05 '24 12:02 a-recknagel

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: a-recknagel / name: Arne (b0a5a8d287d8549c1214155a4beefc994b388998, 169931bbad01fd10a19597f1fc42f485bac8b5c4, 05525906286ae668a497796234ad61e9cffd24d1, 82993a3fb4ae08b880240aa32c7a288c71ace1c1, 772be5121c61cbb51e76dc8ae6b1b816e8ed6f14, a3ec2f0773797d03fd67df21536d958439097a6a, de1d49b8b91fe662f435dbf1f79e0609f8a87efc, c9b48cffbc93472d226f1835314a503ae6ea506e, b6ed8bfe20ebc44b5cadc06796cdf26306f78ac7, 943c9de2d83bdc07b6151cc3c8c4ff6e9cf4437e, eaf328d82e3580a96ef342c7b12c2641173bbd23, 222fb9b0ea9aecbff2fddf0464a7fb8b0fd6fb9f, 79ce99aea6ab00ccf8ab75bad5b6e2844de0b6ba, 3180964b6bbe1a26fbe0730ff91c3ca177576b81, 8ecd6a7375b6709eb296453ef430d9f42760dd57, 6ed248f313fb5c40842bb6b1231845dfab0f725f, 6ad9d1aec13089b6b1419e7bf97265e473839f26, bc2f8a2b91ff2157ba8ce3a1e3d88e0f2e19ed71, a01977269777893316ea132b981c73946b49540e, 2126fe8531ba59c429f9b187e8fa716f90d51517, 0b8c9c5ee9cfa8d60223193c13740c3be1e61bab)
  • :white_check_mark: login: srikanthccv / name: Srikanth Chekuri (5b7b1956888b37be20aa42b65b4162312aafa185, 426abefc61a8cb7c3aa004fe9b9ac9837ee66bd0)

I think this is against the semantic conventions. To understand what I mean, see https://github.com/open-telemetry/opentelemetry-python/pull/3675 in otel we have standard semantic conventions for things like lineno and we should follow them and not add everything as python defined them.

bogdandrutu avatar Feb 05 '24 20:02 bogdandrutu

@bogdandrutu Yeah it is. Scanning over SpanAttributes, there probably are a bunch more like thread.id. Is it ok if I just create an issue for it, something like "convert logrecord attribute into their correct semantic convention names"?

a-recknagel avatar Feb 05 '24 21:02 a-recknagel

@bogdandrutu Yeah it is. Scanning over SpanAttributes, there probably are a bunch more like thread.id. Is it ok if I just create an issue for it, something like "convert logrecord attribute into their correct semantic convention names"?

It is ok to create an issue, but would be worried merging this PR since then it becomes a breaking change.

bogdandrutu avatar Feb 05 '24 21:02 bogdandrutu

In that case I'll try to find all semantic conventions that exist for the standard logrecord attributes and add them with the right names at least. It might actually be just the code. and thread. ones.

a-recknagel avatar Feb 05 '24 22:02 a-recknagel

Few that I have in mind:

  • https://github.com/open-telemetry/semantic-conventions/blob/main/docs/exceptions/exceptions-logs.md
  • https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/attributes.md#source-code-attributes
  • https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/attributes.md#general-thread-attributes

bogdandrutu avatar Feb 06 '24 17:02 bogdandrutu

Ok, I've updated my fork with your changes and suggestions.

a-recknagel avatar Feb 08 '24 08:02 a-recknagel

@bogdandrutu @mariojonke if it's not a bother, I'd like to ask if the PR in its current state could be approved, or if there is something left that I should improve on. Thank you for your time.

a-recknagel avatar Mar 12 '24 08:03 a-recknagel

Thank you all for contributing to the fix. Having formatting work properly is really important.

Could this change be reviewed and prioritized again?

amurauyou avatar May 20 '24 20:05 amurauyou

@a-recknagel

Thanks for picking this up and apologies for the late review.

I'd argue that the "right way of logging" (even without the changes in this PR) would be the following:

Is the proposed change (doesn't seem like it is in this pr), to map those types to the extra field? How would this work for non-map data structures? Perhaps we can tackle the issue of handling "Any" type body's separate from log formatting. Will bring it up in the Python sig if we want to explore following this convention.

Not too familiar with format but how does self.format(record) handle when msg is for an example a dict? As long as those cases are covered by the formatter and there is a default behavior if it is erroneous I believe the breaking change is acceptable.

lzchen avatar Jun 27 '24 22:06 lzchen

@a-recknagel

Thanks for picking this up and apologies for the late review.

No worries, I'm patient =)

I'd argue that the "right way of logging" (even without the changes in this PR) would be the following:

Is the proposed change (doesn't seem like it is in this pr), to map those types to the extra field?

Not 100% sure if I understand this question correctly, but this way of logging log.info("foo", extra={"context": "bar"}) is supported already, and will put a custom attribute context on an otel-logrecord which will get emitted. What I wanted to illustrate with that example was how to migrate away from this logging pattern log.info({"message": "foo", "context": "bar"}), since my change would break the current behavior of "the body of a log-record can be a string, list, dict, ..." to "the body of a log-record is a string."

In other words, if you log a dict to create a log-record-body of {"message": "foo", "context": "bar"}, what you should be doing instead is to just log a string "foo", have that be the body, and add context: "bar" as an attribute to the log-record by providing it to the extra keyword of your log-call.

How would this work for non-map data structures?

It wouldn't, if your log is a list/tuple/set, there's no migration path. I'd call into question if this is a real use case, but if it is then it'd certainly be an issue.

Perhaps we can tackle the issue of handling "Any" type body's separate from log formatting.

That would get a -1 if I were a maintainer, but I'm not.

Will bring it up in the Python sig if we want to explore following this convention.

Are those open? If yes, it might be easiest if I could join the one where this matter would be discussed.

Not too familiar with format but how does self.format(record) handle when msg is for an example a dict? As long as those cases are covered by the formatter and there is a default behavior if it is erroneous I believe the breaking change is acceptable.

Depends on the log-record-class, a formatter will call their getMessage-method and treat the result as a string representing the record's message. For the default-log-record, it's just this.


Anyhow, the way I personally reason about the breaking change is that I'm not trying to provide a feature here, I'm fixing a bug. The python-implementation for otel-logging ties into python's standard-lib-logging-engine, which does include log-formatting as a de-facto enforced step. Skipping it in this logging-implementation was a bug, so adding it in is not a feature.

a-recknagel avatar Jun 30 '24 20:06 a-recknagel

@a-recknagel

Are those open? If yes, it might be easiest if I could join the one where this matter would be discussed.

Yes! These are open to everyone. Check the opentelemetry calendar for the link to the Python SIG, we meet every Thurs 9am PST.

This week's instance will be cancelled due to the July 4th holiday but I've added your topic to the July 11th occurence the week after. You can see the SIG Notes here

lzchen avatar Jul 01 '24 17:07 lzchen

@lzchen @xrmx I applied the requested changes, the only attributes that are by default on a logrecord are those that are covered by a semantic convention, i.e. thread.id, code.filepath, code.lineno, code.function, and thread.name.

a-recknagel avatar Jul 04 '24 09:07 a-recknagel

Discussed in 07/11/2024 SIG. Today we have an exclude list of attributes (which essentially includes all of python log record attributes) that are ignored when mapping to Otel LogRecord attributes. We do recognize that the transformation of python logging record to Otel LogRecord is confusing for Python users (especially ones that are coming in to use formatters expecting certain attributes to exist). We believe this issue of discovery and transparency is a separate issue from the inability to allow formatting of logrecords. We can address the discovery issue separately (whether it be part of this PR or not is up to @a-recknagel :) or you can open up a separate tracking issue to maybe improve our documentation). The PR as is is valuable in itself without introducing any attribute changes. @a-recknagel if you agree with this guidance, please update the pr with any changes you think are necessary to reflect the conclusion and let people know when it is ready to review. Thanks for your hard work!

lzchen avatar Jul 11 '24 17:07 lzchen

@lzchen now that I read the pr again, I guess we should have also discussed the matter of the breaking change, my bad. Should it be picked up in the next SIG again? That would give me a bit of time to write the docs, too.

a-recknagel avatar Jul 11 '24 17:07 a-recknagel

@a-recknagel

Yes of course! Feel free to add your topic to the agenda for the Python. Thanks a lot for your patience and hard work on this issue :)

lzchen avatar Jul 11 '24 19:07 lzchen

"Fixes 3673" This PR says it fixes itself. Is there a second issue that was supposed to be linked here besides 3664?

jeremydvoss avatar Jul 18 '24 16:07 jeremydvoss

"Fixes 3673" This PR says it fixes itself. Is there a second issue that was supposed to be linked here besides 3664?

I don't know why it's there, it's not part of the issue text so I can't remove it.

a-recknagel avatar Jul 18 '24 17:07 a-recknagel