structlog-gcp

LogFieldSanitizer for BigQuery

Open • petemounce opened this issue 10 months ago • 1 comment

I can't recall whether I mentioned this, but at some point I ran into errors when I logged a field whose value was a list of lists. The following processor works around it in my particular setup:

"""Sanitize log-fields for backends"""

import structlog
from structlog.typing import EventDict, Processor


class LogFieldSanitizer:
    """
    Google Logging can back onto Log Sinks, which in turn are stored in BigQuery.
    BigQuery has at least one limitation; it cannot store lists of lists.
    Reference: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#array_type

    This structlog processor adjusts for that if a log-field is a list of lists.
    """

    def setup(self) -> list[Processor]:
        return [self]

    def __call__(self, logger: structlog.typing.WrappedLogger, method_name: str, event_dict: EventDict) -> EventDict:
        del logger, method_name  # unused

        for key in event_dict:
            if isinstance(event_dict[key], list):
                orig = event_dict[key]
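                if any(isinstance(x, list) for x in orig):
                    # Flatten one level; items that are not lists are kept as they are.
                    event_dict[key] = [x for xs in orig for x in (xs if isinstance(xs, list) else [xs])]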

        return event_dict
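
For anyone hitting deeper nesting, here is a rough sketch of a recursive variant. It is not part of structlog or structlog-gcp; the names `flatten_lists` and `sanitize_list_fields` are hypothetical, and it assumes only top-level fields need sanitizing (lists inside nested dicts are left alone). It is written as a plain callable with the processor signature so it can be tried without any logging setup.

```python
from typing import Any, MutableMapping


def flatten_lists(value: list) -> list:
    """Recursively flatten nested lists; items that are not lists are kept as-is."""
    flat: list = []
    for item in value:
        if isinstance(item, list):
            flat.extend(flatten_lists(item))
        else:
            flat.append(item)
    return flat


def sanitize_list_fields(
    logger: Any, method_name: str, event_dict: MutableMapping[str, Any]
) -> MutableMapping[str, Any]:
    """Hypothetical processor-shaped callable: flatten every top-level list field."""
    del logger, method_name  # unused, present only to match the processor signature
    for key, value in event_dict.items():
        if isinstance(value, list):
            # Reassigning an existing key does not change the dict's size,
            # so it is safe to do while iterating.
            event_dict[key] = flatten_lists(value)
    return event_dict


# Call it directly with a plain dict, the same way a processor chain would.
event = {"event": "matrix", "rows": [[1, 2], [3, [4, 5]]], "tags": ["a", "b"], "count": 2}
result = sanitize_list_fields(None, "info", event)
print(result)
# {'event': 'matrix', 'rows': [1, 2, 3, 4, 5], 'tags': ['a', 'b'], 'count': 2}
```

In a real setup this would sit in the `processors` list passed to `structlog.configure`, before the final renderer. Note that flattening is lossy: the row boundaries are gone afterwards, so serializing the inner lists to strings may suit some cases better.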

petemounce • Jan 28 '25 11:01

@petemounce thanks for the report! Would you be interested in contributing a PR to add this filter to the library?

multani • Jan 28 '25 14:01