opentelemetry-python icon indicating copy to clipboard operation
opentelemetry-python copied to clipboard

Handle `extra` attributes of unknown classes

Open epa095 opened this issue 10 months ago • 2 comments

Is your feature request related to a problem?

We use the "extra" field in the python logger to add extra context. Sometimes the objects are rich objects, e.g dataclasses. With the regular python logger this works reasonably well, while with this logger we get the errror Invalid type MyNotDataclass for attribute 'who' value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types and the extra value is dropped.

Here is an simple example:

import logging
import dataclasses

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s: %(message)s. Who: %(who)s")

LOGGER = logging.getLogger(__name__)

@dataclasses.dataclass  
class MyDataclass:
    name: str
    age: int

LOGGER.warning("I did something", extra={"who": MyDataclass(name="John", age=30)})

This would print WARNING: I did something. Who: MyDataclass(name='John', age=30)

And if you used e.g. https://github.com/nhairs/python-json-logger you would get :

{"message": "I did something", "who": {"name": "John", "age": 30}}

With auto-instrumentation we get the error Invalid type MyDataclass for attribute 'who' value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types.

This is because of the handling in _clean_attribute where there is a small list of special cases handled, but in the generall case extra-values are thrown away.

(This is of course all a problem because the python logging module don't expose the extras in any way, and instead makeRecord injects them directly into the __dict__ the LogRecord. So there is really no nice way to process. But that is a rant for another time)

Describe the solution you'd like

In a perfect world any extra parameter which could be serialized as json would be, so the console output would be something like:

{
    "body": "I did something",
    "severity_number": 13,
    "severity_text": "WARN",
    "attributes": {
        "otelSpanID": "0",
        "who": {"name": "John", "age": 30}
...

But I imagine that this quickly gets dirty, and not something you want in here(?)

Maybe it could be possible to register some function which would be passed the attributes and which are free to return whatever?

Alternatively, and maybe less controversial is to call str) on the remaining attributes. Then it becomes something like:

{
    "body": "I did something",
    "severity_number": 13,
    "severity_text": "WARN",
    "attributes": {
        "otelSpanID": "0",
        "who": "MyDataclass(name='John', age=30)",
...

This is less usefull for further analysis, but at least the extra is not completely lost.

Describe alternatives you've considered

I have made a LoggerAdapter

import logging
from pythonjsonlogger.json import JsonFormatter


class ExtraLoggerAdapter(logging.LoggerAdapter):
    def __init__(self, logger, extra=None, jsonify=True):

        self.logger = logger
        self.extra = extra
        self.jsonify = jsonify
        if self.jsonify:
            self.jsonformater = JsonFormatter()

    def process(self, msg, kwargs):
        kwargs["extra"] = (self.extra or {}) | kwargs.get("extra", {})
        if self.jsonify:
            kwargs["extra"] = {
                k: self.jsonformater.jsonify_log_record(v)
                for k, v in kwargs["extra"].items()
            }
        else:
            kwargs["extra"] = {k: str(v) for k, v in kwargs["extra"].items()}

        return msg, kwargs 

But this must be used explicitly everywere things are logged, like logger= ExtraLoggerAdapter(logging.getLogger(__name__), jsonify=True). And the inner json gets escaped:

{
    "body": "I did something",
    "severity_number": 13,
    "severity_text": "WARN",
    "attributes": {
        "who": "{\"name\": \"John\", \"age\": 30}",
...

Additional Context

No response

Would you like to implement a fix?

None

epa095 avatar Feb 07 '25 21:02 epa095

https://github.com/open-telemetry/opentelemetry-python/blob/main/opentelemetry-api/src/opentelemetry/attributes/init.py#L100

Hi everyone!

I'm running into an issue while logging requests in my Django API using OpenTelemetry. Every time I log a request, I see the following warning:

"Invalid type socket for attribute 'request' value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types."

To work around this, I added a log filter to avoid this specific warning, but I wanted to check if this is the expected behavior.

Is there a recommended way to handle this, or would a fix be needed? I'd be happy to contribute if changes are required!

Thanks in advance for any guidance!

hudsongeovane avatar Feb 10 '25 23:02 hudsongeovane

I see the same warning happen when an instrumented celery worker tries to log task completions using logger.info(fmt, context, extra={'data': context})

[INFO] in [celery.app.trace 1816422 133972055703552]: Task mytaskname[42c5a639-16c5-48ca-832b-51fc2e1f0f40] succeeded in 0.05935675499495119s: None
[WARNING] in [opentelemetry.attributes 1816422 133972055703552]: Invalid type dict for attribute 'data' value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types

If I poke into data, its {'id': '42c5a639-16c5-48ca-832b-51fc2e1f0f40', 'name': 'mytaskname', 'return_value': 'None', 'runtime': 0.05935675499495119, 'args': '()', 'kwargs': '{}'} which is the celery log entry emitted by https://github.com/celery/celery/blob/5c1a13cffe0331391bf8bc808196a1573f8922ad/celery/app/trace.py#L533

I don't particularly care about the extra data at the moment, so I'll probably try the log filter approach mentioned above, but I wanted to provide another use case where _clean_attribute can fail.

devmonkey22 avatar May 13 '25 21:05 devmonkey22