opentelemetry-python
opentelemetry-python copied to clipboard
Add log-formatting to the otel-LogHandler
Description
Added support for log message formatters to opentelemetry.sdk._logs._internal.LogHandler.
Fixes #3664 Fixes #3353
Type of change
- added a
self.formatcall on the originalLogRecordto the code in the LogHandler'semitmethod, the result is added as thebodyfield of the otel-LogRecord. - commented the code that filteres out all standard-
LogRecordfields from getting added to the otel-LogRecord. I identified three sections, 1) those that are considered "private" by theLogRecordAPI (see a note "You shouldn’t need to format this yourself" in https://docs.python.org/3/library/logging.html#logrecord-attributes) or only get used to cache results, 2) those that have a semantic convention and have their key translated accordingly, and 3) those that do not have a semantic convention (yet) and just get plain dropped.
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [x] This change requires a documentation update
The documentation should include an example which shows how exactly adding a formatter changes the result of what gets emitted to the receiver. I think there is some element of surprise here, because the standard procedure is that, if a formatter was set, the handler emits a string whose content is wholly dictated by said formatter.
The breaking change is due to me removing a check where a non-string msg attribute in the original LogRecord will be attached to the otel-LogRecord as-is. Now, the format call will always return a string, even if no formatter was set and args is empty. I believe this is the right behavior and makes the codebase easier to maintain. Code that looked like this
log.info({"message": "foo", "context": "bar"})
could, before this change, be interpreted as a complex data structure by some log-renderer. I'd argue that the "right way of logging" (with this example here illustrating a migration-path) should be the following:
log.info("foo", extra={"context": "bar"})
How Has This Been Tested?
I wrote three new unit tests, and updated a few other ones that broke due to the change.
Does This PR Require a Contrib Repo Change?
- [x] No
Checklist:
- [x] Followed the style guidelines of this project
- [x] Changelogs have been updated
- [x] Unit tests have been added
- [ ] Documentation has been updated
Breaking Change for the log-record's body-field
Type change
Important context here is the formal definition of the body-field in the otel data-model docs. Summed up, the key points are:
- the type is
any, so exporters need to support it as such - can be a human-readable string
- can be structured data, if that is what the application emits
The proposed patch would change the possible type of the emitted data to effectively be just string. This is not in conflict with the data-model, since string is a subtype of any. Exporters should still support any to be compliant with the spec, even if it would not occur in the reference implementation (i.e. users can of course choose to subclass otel's LoggingHandler and overwrite its emit method to write something else than a string into the emitted log-record's body-field).
Content Change
Given a logging-call like
logger = logging.getLogger("foo")
logger.info("bar")
Formerly, the content of the body field of the resulting record would always be just "foo". After this change, running logging.basicConfig would attach a formatter using BASIC_FORMAT which transform that message into "INFO:foo:bar" on the body field. If you'd want to keep the former behavior, you'd have to avoid using logging.basicConfig or explicitly switch its formatter with logging.Formatter("%(message)s") afterwards.
Excursion: What are the args in a log-record for?
If you take a look at the signature of Logger.log (which is what Logger.debug, Logger.info, etc. call with their respective level), there is an *args in there, which is also persisted in the LogRecord as an attribute.
Given a complex object such as
class ComplexObject:
def __str__(self):
import time
time.sleep(5)
return "zzz... Baz"
complex_object = ComplexObject()
its purpose is to safe time and effort on log-messages with expensive serialization operations to avoid constructs like
import logging
if logging.RootLogger.level <= logging.DEBUG:
logging.debug(f"Status of our complex object: {complex_object!s}")
where you don't want to spend 5 seconds waiting for complex_object to be turned into a string just to not end up emitting the result at all because your loglevel was higher than debug.
The same can be achieved with just
import logging
logging.debug(f"Status of our complex object: %s", complex_object)
And that's it, that's all that the *args in Logger.log should reasonably be used for. In practice, this rarely ever comes up.
Examples
We are going to use this script to test the behavior of the logging for the current main-branch (original), and what it would look like after applying this patch (patched):
otel_logging.py
import logging.config
from opentelemetry.sdk._logs import LoggerProvider
from opentelemetry.sdk._logs.export import ConsoleLogExporter, SimpleLogRecordProcessor
from opentelemetry.sdk.resources import Resource
logger_provider = LoggerProvider(resource=Resource.create({}))
logger_provider.add_log_record_processor(SimpleLogRecordProcessor(ConsoleLogExporter()))
logging.config.dictConfig({
"version": 1,
"formatters": {
"simple": {"format": "%(name)s [%(module)s.%(lineno)s]: %(message)s"}
},
"handlers": {
"console": {
"level": "INFO",
"formatter": "simple",
"class": "opentelemetry.sdk._logs.LoggingHandler",
"logger_provider": logger_provider,
}
},
"root": {"handlers": ["console"], "level": "INFO"},
})
logging.info("foo")
logging.info("foo %s %s", "bar", "baz")
logging.info(["bar"])
logging.info({"baz": 1, "qux": None})
The output is quite verbose, so I'll only show the actual differences here, namely the content of the body-field.
original
"body": "foo",
"body": "foo bar baz",
"body": [
"bar"
],
"body": {
"baz": 1,
"qux": null
},
Note that the original type of the first call-argument is retained, string-formatting of the args took place, but the specified logging-formatter was not called. As can be seen in the last call, None has also been transformed into null by the otel processor/exporter, which will be picked up on in the next section.
patched
"body": "root [otel_logging.26]: foo",
"body": "root [otel_logging.27]: foo bar baz",
"body": "root [otel_logging.28]: ['bar']",
"body": "root [otel_logging.29]: {'baz': 1, 'qux': None}",
Note that all of these bodies are strings.
Justification of the change and discarded alternatives
Leave as-is
Python developers expect log-formatting to work. I acknowledge that as far as otel is concerned it is in a bit of an awkward space, since it could be considered a kind of "rendering" that should be handled in the frontend, and not within the client. But it's a staple of python's standard logging, there is an always-on default, and its absence caused a fair bit of confusion already (see linked issues).
Also, the current behavior is buggy for non-jsonable objects. If there were an additional logging event logging.info(set()), it would not appear in the output, and instead be quietly discarded.
Only apply log-formatting if the first argument is a string
This would keep the option open to emit any-type arguments, but would in my personal opinion be inconsistent behavior and cause confusion. It would share the bug of the current implementation with respect to non-jsonable objects, and the output would look like this:
mixed
"body": "root [otel_logging.26]: foo",
"body": "root [otel_logging.27]: foo bar baz",
"body": [
"bar"
],
"body": {
"baz": 1,
"qux": null
},
A possible issue here could be that the bodies lacking the format-information might fall through if dashboard-users are not aware of this discrepancy, e.g. "I know something went wrong in app.py, so I'll search for all bodies with *.\[app\.\d+\].*". Following the least astonishment principle and being consistent with how the body looks in general would be preferable in such a case.
Merge this PR
Log-formatting would work in a predictable, consistent, expected, and robust manner. We would lose the option to log structured data though, but at least for dictionaries there is a migration path. While sparingly documented (in the docs of Logger.debug, if you scroll a bit), you can pass dictionaries during logging with the extra-keyword and have its values persisted as log-record attributes:
logging.info("foo", extra={"bar": ["bar"], "baz": 1, "qux": None})
This is not new functionality, it was supported for a long time and would, as far as I can tell, provide a clean migration-path for users currently relying on passing structured data passed in log-events retaining a parseable shape:
"body": "root [otel_logging.26]: foo",
"attributes": {
"bar": [
"bar"
],
"baz": 1,
...,
},
Note that qux was filtered out though, since None is not a valid attribute-value as decided by the AttributeValue type and the _clean_attribute function applying it.
The committers listed above are authorized under a signed CLA.
- :white_check_mark: login: a-recknagel / name: Arne (b0a5a8d287d8549c1214155a4beefc994b388998, 169931bbad01fd10a19597f1fc42f485bac8b5c4, 05525906286ae668a497796234ad61e9cffd24d1, 82993a3fb4ae08b880240aa32c7a288c71ace1c1, 772be5121c61cbb51e76dc8ae6b1b816e8ed6f14, a3ec2f0773797d03fd67df21536d958439097a6a, de1d49b8b91fe662f435dbf1f79e0609f8a87efc, c9b48cffbc93472d226f1835314a503ae6ea506e, b6ed8bfe20ebc44b5cadc06796cdf26306f78ac7, 943c9de2d83bdc07b6151cc3c8c4ff6e9cf4437e, eaf328d82e3580a96ef342c7b12c2641173bbd23, 222fb9b0ea9aecbff2fddf0464a7fb8b0fd6fb9f, 79ce99aea6ab00ccf8ab75bad5b6e2844de0b6ba, 3180964b6bbe1a26fbe0730ff91c3ca177576b81, 8ecd6a7375b6709eb296453ef430d9f42760dd57, 6ed248f313fb5c40842bb6b1231845dfab0f725f, 6ad9d1aec13089b6b1419e7bf97265e473839f26, bc2f8a2b91ff2157ba8ce3a1e3d88e0f2e19ed71, a01977269777893316ea132b981c73946b49540e, 2126fe8531ba59c429f9b187e8fa716f90d51517, 0b8c9c5ee9cfa8d60223193c13740c3be1e61bab)
- :white_check_mark: login: srikanthccv / name: Srikanth Chekuri (5b7b1956888b37be20aa42b65b4162312aafa185, 426abefc61a8cb7c3aa004fe9b9ac9837ee66bd0)
I think this is against the semantic conventions. To understand what I mean, see https://github.com/open-telemetry/opentelemetry-python/pull/3675 in otel we have standard semantic conventions for things like lineno and we should follow them and not add everything as python defined them.
@bogdandrutu Yeah it is. Scanning over SpanAttributes, there probably are a bunch more like thread.id. Is it ok if I just create an issue for it, something like "convert logrecord attribute into their correct semantic convention names"?
@bogdandrutu Yeah it is. Scanning over SpanAttributes, there probably are a bunch more like thread.id. Is it ok if I just create an issue for it, something like "convert logrecord attribute into their correct semantic convention names"?
It is ok to create an issue, but would be worried merging this PR since then it becomes a breaking change.
In that case I'll try to find all semantic conventions that exist for the standard logrecord attributes and add them with the right names at least. It might actually be just the code. and thread. ones.
Few that I have in mind:
- https://github.com/open-telemetry/semantic-conventions/blob/main/docs/exceptions/exceptions-logs.md
- https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/attributes.md#source-code-attributes
- https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/attributes.md#general-thread-attributes
Ok, I've updated my fork with your changes and suggestions.
@bogdandrutu @mariojonke if it's not a bother, I'd like to ask if the PR in its current state could be approved, or if there is something left that I should improve on. Thank you for your time.
Thank you all for contributing to the fix. Having formatting work properly is really important.
Could this change be reviewed and prioritized again?
@a-recknagel
Thanks for picking this up and apologies for the late review.
I'd argue that the "right way of logging" (even without the changes in this PR) would be the following:
Is the proposed change (doesn't seem like it is in this pr), to map those types to the extra field? How would this work for non-map data structures? Perhaps we can tackle the issue of handling "Any" type body's separate from log formatting. Will bring it up in the Python sig if we want to explore following this convention.
Not too familiar with format but how does self.format(record) handle when msg is for an example a dict? As long as those cases are covered by the formatter and there is a default behavior if it is erroneous I believe the breaking change is acceptable.
@a-recknagel
Thanks for picking this up and apologies for the late review.
No worries, I'm patient =)
I'd argue that the "right way of logging" (even without the changes in this PR) would be the following:
Is the proposed change (doesn't seem like it is in this pr), to map those types to the
extrafield?
Not 100% sure if I understand this question correctly, but this way of logging log.info("foo", extra={"context": "bar"}) is supported already, and will put a custom attribute context on an otel-logrecord which will get emitted. What I wanted to illustrate with that example was how to migrate away from this logging pattern log.info({"message": "foo", "context": "bar"}), since my change would break the current behavior of "the body of a log-record can be a string, list, dict, ..." to "the body of a log-record is a string."
In other words, if you log a dict to create a log-record-body of {"message": "foo", "context": "bar"}, what you should be doing instead is to just log a string "foo", have that be the body, and add context: "bar" as an attribute to the log-record by providing it to the extra keyword of your log-call.
How would this work for non-map data structures?
It wouldn't, if your log is a list/tuple/set, there's no migration path. I'd call into question if this is a real use case, but if it is then it'd certainly be an issue.
Perhaps we can tackle the issue of handling "Any" type body's separate from log formatting.
That would get a -1 if I were a maintainer, but I'm not.
Will bring it up in the Python sig if we want to explore following this convention.
Are those open? If yes, it might be easiest if I could join the one where this matter would be discussed.
Not too familiar with
formatbut how doesself.format(record)handle whenmsgis for an example adict? As long as those cases are covered by the formatter and there is a default behavior if it is erroneous I believe the breaking change is acceptable.
Depends on the log-record-class, a formatter will call their getMessage-method and treat the result as a string representing the record's message. For the default-log-record, it's just this.
Anyhow, the way I personally reason about the breaking change is that I'm not trying to provide a feature here, I'm fixing a bug. The python-implementation for otel-logging ties into python's standard-lib-logging-engine, which does include log-formatting as a de-facto enforced step. Skipping it in this logging-implementation was a bug, so adding it in is not a feature.
@a-recknagel
Are those open? If yes, it might be easiest if I could join the one where this matter would be discussed.
Yes! These are open to everyone. Check the opentelemetry calendar for the link to the Python SIG, we meet every Thurs 9am PST.
This week's instance will be cancelled due to the July 4th holiday but I've added your topic to the July 11th occurence the week after. You can see the SIG Notes here
@lzchen @xrmx I applied the requested changes, the only attributes that are by default on a logrecord are those that are covered by a semantic convention, i.e. thread.id, code.filepath, code.lineno, code.function, and thread.name.
Discussed in 07/11/2024 SIG. Today we have an exclude list of attributes (which essentially includes all of python log record attributes) that are ignored when mapping to Otel LogRecord attributes. We do recognize that the transformation of python logging record to Otel LogRecord is confusing for Python users (especially ones that are coming in to use formatters expecting certain attributes to exist). We believe this issue of discovery and transparency is a separate issue from the inability to allow formatting of logrecords. We can address the discovery issue separately (whether it be part of this PR or not is up to @a-recknagel :) or you can open up a separate tracking issue to maybe improve our documentation). The PR as is is valuable in itself without introducing any attribute changes. @a-recknagel if you agree with this guidance, please update the pr with any changes you think are necessary to reflect the conclusion and let people know when it is ready to review. Thanks for your hard work!
@lzchen now that I read the pr again, I guess we should have also discussed the matter of the breaking change, my bad. Should it be picked up in the next SIG again? That would give me a bit of time to write the docs, too.
@a-recknagel
Yes of course! Feel free to add your topic to the agenda for the Python. Thanks a lot for your patience and hard work on this issue :)
"Fixes 3673" This PR says it fixes itself. Is there a second issue that was supposed to be linked here besides 3664?
"Fixes 3673" This PR says it fixes itself. Is there a second issue that was supposed to be linked here besides 3664?
I don't know why it's there, it's not part of the issue text so I can't remove it.