structlog icon indicating copy to clipboard operation
structlog copied to clipboard

`to_repr` and `rich` breaks logging

Open Eldeeqq opened this issue 1 year ago • 14 comments

Structlog version: 24.4.0

⭐ First of all - love this project, huge fan, good job.

Hello,

Maybe this is just bad setup on our end, but it was tricky to figure out so I decided to raise it.

We use structlog for logging data from our service in json format, and then uploading it into DataDog. After updating structlog to version 24.4.0 our logging started to break each time we logged an exception ( structlog.get_logger().exception(...))

I was able to trace the root cause to this PR: https://github.com/hynek/structlog/pull/627, where the implementation for to_repr changed to:

def to_repr(
    obj: Any,
    max_length: int | None = None,
    max_string: int | None = None,
    use_rich: bool = True, # source of the problem
) -> str:

and I was able to hotfix the the problem on our end by creating our own dict_tracebacks and setting use_rich explicitly.

custom_dict_tracebacks =  ExceptionRenderer(ExceptionDictTransformer(use_rich=False))
Stack Trace
File "/usr/local/lib/python3.12/site-packages/structlog/_base.py", line 165, in _process_event

event_dict = proc(self._logger, method_name, event_dict)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

File "/usr/local/lib/python3.12/site-packages/structlog/processors.py", line 412, in __call__

event_dict["exception"] = self.format_exception(
^^^^^^^^^^^^^^^^^^^^^^

File "/usr/local/lib/python3.12/site-packages/structlog/tracebacks.py", line 415, in __call__

trace = extract(
^^^^^^^^

File "/usr/local/lib/python3.12/site-packages/structlog/tracebacks.py", line 276, in extract

key: to_repr(
^^^^^^^^

File "/usr/local/lib/python3.12/site-packages/structlog/tracebacks.py", line 147, in to_repr

obj_repr = rich.pretty.traverse(
^^^^^^^^^^^^^^^^^^^^^

File "/usr/local/lib/python3.12/site-packages/rich/pretty.py", line 853, in traverse

node = _traverse(_object, root=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

File "/usr/local/lib/python3.12/site-packages/rich/pretty.py", line 767, in _traverse

child_node = _traverse(getattr(obj, field.name), depth=depth + 1)
^^^^^^^^^^^^^^^^^^^^^^^^

AttributeError: 'MemoryObjectItemReceiver' object has no attribute 'item'

Since we don't install rich on the server, I'm not really sure how it was able to import it in the first place (my guess is transitive dependency), or why it failed.

But would it make sense to somehow configure the use_rich flag globally?

Eldeeqq avatar Sep 25 '24 13:09 Eldeeqq

Do you install bandit? Because that has rich as a dependency (just because they have a one-liner where they wanted to show a loading bar 🥲)

hackermandh avatar Nov 20 '24 22:11 hackermandh

I checked which modules are installed on the server and neither bandit or rich are being installed. Might've been different when I created this issue 🙁

Eldeeqq avatar Nov 28 '24 13:11 Eldeeqq

⭐ First of all - love this project, huge fan, good job.

Thank you! 💛

We use structlog for logging data from our service in json format, and then uploading it into DataDog. After updating structlog to version 24.4.0 our logging started to break each time we logged an exception ( structlog.get_logger().exception(...))

What you mean here, is that you expect single-line JSON lines and Rich formating breaks them into multiple ones? (is this expected @sscherfke?)

I was able to trace the root cause to this PR: #627, where the implementation for to_repr changed to:

def to_repr(
    obj: Any,
    max_length: int | None = None,
    max_string: int | None = None,
    use_rich: bool = True, # source of the problem
) -> str:

and I was able to hotfix the the problem on our end by creating our own dict_tracebacks and setting use_rich explicitly.

custom_dict_tracebacks =  ExceptionRenderer(ExceptionDictTransformer(use_rich=False))

I don't think that's a hot fix – I think you're using configuration as it's intended? 😅

Since we don't install rich on the server, I'm not really sure how it was able to import it in the first place (my guess is transitive dependency), or why it failed.

Not to be lecturing, but you should use recursive lock files to make this deterministic. 🤓

But would it make sense to somehow configure the use_rich flag globally?

We try to avoid having anything global except the configuration and you were able to work around the problem using regular config, so I'm gonna say everything is working as intended?

Depending on what Stefan says, we can have a look at multi-line log lines or whatever is happening there, but adding a global variable to avoid configuration is not our MO. :)

hynek avatar Nov 30 '24 06:11 hynek

We use structlog for logging data from our service in json format, and then uploading it into DataDog. After updating structlog to version 24.4.0 our logging started to break each time we logged an exception ( structlog.get_logger().exception(...))

What you mean here, is that you expect single-line JSON lines and Rich formating breaks them into multiple ones? (is this expected @sscherfke?)

Not entirely. Yes, we expect inline JSON. But whenever we logged an exception, another exception was raised while logging it (I'm not sure about the exact trace content since our logs get removed after some time 🙁). This second exception was raised from the to_repr function and wasn't formatted the same way as the others. I suspect this happened because we didn't have sys.excepthook set up properly at the time.

I double-checked the dependencies on the server, and rich is indeed not installed. My hunch is that structlog tried to use rich despite being unable to import it.

I should have explained that better, sorry.

I was able to trace the root cause to this PR: #627, where the implementation for to_repr changed to:

def to_repr(
    obj: Any,
    max_length: int | None = None,
    max_string: int | None = None,
    use_rich: bool = True, # source of the problem
) -> str:

and I was able to hotfix the the problem on our end by creating our own dict_tracebacks and setting use_rich explicitly.

custom_dict_tracebacks =  ExceptionRenderer(ExceptionDictTransformer(use_rich=False))

I don't think that's a hot fix – I think you're using configuration as it's intended? 😅

Yeah, I guess I meant hotfixing the logging within our service by configuring structlog differently 😁

Since we don't install rich on the server, I'm not really sure how it was able to import it in the first place (my guess is transitive dependency), or why it failed.

Not to be lecturing, but you should use recursive lock files to make this deterministic. 🤓

I agree. I'm not OG dev on that project and I'm currently pushing for using uv/poetry.

But would it make sense to somehow configure the use_rich flag globally?

We try to avoid having anything global except the configuration and you were able to work around the problem using regular config, so I'm gonna say everything is working as intended?

I understand. Yes, I managed to configure it correctly. However, as I mentioned at the start - I think the problem was caused by structlog attempting to use rich when it wasn't installed. I haven't investigated thoroughly, but there might be a missing dependency check for rich in structlog.

Depending on what Stefan says, we can have a look at multi-line log lines or whatever is happening there, but adding a global variable to avoid configuration is not our MO. :)

Sure thing, thanks for your time!

Eldeeqq avatar Dec 02 '24 09:12 Eldeeqq

rich should only be used for generating a nicer repr when it is actually available. There shouldn't be any errors caused by rich not being installed.

I'll take a look at it. A (cleaned) traceback from your logs would be really helpful though. Maybe you can reactivate the old behavior on a test system until you get the desired exception?

sscherfke avatar Dec 02 '24 09:12 sscherfke

Sure thing! I'll try to reproduce the bug and share the trace :)

Eldeeqq avatar Dec 02 '24 16:12 Eldeeqq

This looks like #678 – Rich does not handle exceptions raised during traceback rendering. This applies to the ConsoleRenderer as well.

Something like this might be a fix:

diff --git a/src/structlog/tracebacks.py b/src/structlog/tracebacks.py
index 2331b2f..2607fc3 100644
--- a/src/structlog/tracebacks.py
+++ b/src/structlog/tracebacks.py
@@ -142,29 +142,32 @@ def to_repr(
        implementation.
     """
     if use_rich and rich is not None:
-        # Let rich render the repr if it is available.
-        # It produces much better results for containers and dataclasses/attrs.
-        obj_repr = rich.pretty.traverse(
-            obj, max_length=max_length, max_string=max_string
-        ).render()
-    else:
-        # Generate a (truncated) repr if rich is not available.
-        # Handle str/bytes differently to get better results for truncated
-        # representations.  Also catch all errors, similarly to "safe_str()".
-        try:
-            if isinstance(obj, (str, bytes)):
-                if max_string is not None and len(obj) > max_string:
-                    truncated = len(obj) - max_string
-                    obj_repr = f"{obj[:max_string]!r}+{truncated}"
-                else:
-                    obj_repr = repr(obj)
-            else:
-                obj_repr = repr(obj)
-                if max_string is not None and len(obj_repr) > max_string:
-                    truncated = len(obj_repr) - max_string
-                    obj_repr = f"{obj_repr[:max_string]!r}+{truncated}"
-        except Exception as error:  # noqa: BLE001
-            obj_repr = f"<repr-error {str(error)!r}>"
+        try:
+            # Let rich render the repr if it is available.
+            # It produces much better results for containers and dataclasses/attrs.
+            return rich.pretty.traverse(
+                obj, max_length=max_length, max_string=max_string
+            ).render()
+        except Exception:
+            pass
+    # Generate a (truncated) repr if rich is not available.
+    # Handle str/bytes differently to get better results for truncated
+    # representations.  Also catch all errors, similarly to "safe_str()".
+    try:
+        if isinstance(obj, (str, bytes)):
+            if max_string is not None and len(obj) > max_string:
+                truncated = len(obj) - max_string
+                obj_repr = f"{obj[:max_string]!r}+{truncated}"
+            else:
+                obj_repr = repr(obj)
+        else:
+            obj_repr = repr(obj)
+            if max_string is not None and len(obj_repr) > max_string:
+                truncated = len(obj_repr) - max_string
+                obj_repr = f"{obj_repr[:max_string]!r}+{truncated}"
+    except Exception as error:  # noqa: BLE001
+        obj_repr = f"<repr-error {str(error)!r}>"
 
     return obj_repr

sscherfke avatar Dec 06 '24 22:12 sscherfke

So, the idea is to render it ourselves if Rich explodes?

hynek avatar Jan 04 '25 11:01 hynek

Yes, similar to the safe_str() function: https://github.com/hynek/structlog/blob/main/src/structlog/tracebacks.py#L104-L109

sscherfke avatar Jan 04 '25 11:01 sscherfke

So safe_repr just announced? 😅

as you might’ve noticed I’m fighting myself currently thru the backlog. Given we’ve had boneless than two (maybe three) reports in this vein, I thinking would be good to avoid it for 25.1.0.

hynek avatar Jan 04 '25 12:01 hynek

The custom to_repr hook that is used when Rich is not available already has a similar safeguard, but it is not used in case Rich is used (and fails): https://github.com/hynek/structlog/blob/main/src/structlog/tracebacks.py#L169

sscherfke avatar Jan 04 '25 12:01 sscherfke

Looked at the code again. I think that we can fix this relatively easy for the dict transformer (for json logging). But fixing it for the console renderer is not that easy, because we basically do a print(rich.traceback.Traceback.from_exception(exc)), so there is no way to add a safe guard around their to_repr().

The probability that JSON logging is used in someones prod env is relatively high so imho we should fix this use case. If any local object has a broken __repr__() implementation, this would break the exception log which would make debugging exceptions in a prod system rather hard. ;-)

sscherfke avatar Jan 05 '25 11:01 sscherfke

Looked at the code again. I think that we can fix this relatively easy for the dict transformer (for json logging). But fixing it for the console renderer is not that easy, because we basically do a print(rich.traceback.Traceback.from_exception(exc)), so there is no way to add a safe guard around their to_repr().

Falling back to a simple formatter in dev would be better than a crashing I guess?

hynek avatar Jan 05 '25 12:01 hynek

Tried to write a regression test for this and realized that rich already has a safe guard for erroneous __repr__() implementations.

And the actual error occurs when rich traverses the objects graph. In this case, it tries to iterate over the fields of a dataclass: https://github.com/Textualize/rich/blob/v13.9.4/rich/pretty.py#L788

For some reasons (that I didn't look up), the MemoryObjectItemReceiver instance has no attribute item even though the fields() function returned a field with that name.

So it is definitely a rich issue and we can't really to anything about it.

sscherfke avatar Jan 05 '25 16:01 sscherfke