structlog-gcp icon indicating copy to clipboard operation
structlog-gcp copied to clipboard

httpRequest processor

Open petemounce opened this issue 1 year ago • 2 comments

We have a logging middleware that stamps each request with a canonical log-line. That attaches http request metadata as well as other things. That in turn causes GCP Logging to render those bits of metadata as lozenge-blobs in the UI, making them stand out a bit and also making the canonical log lines themselves stand out from the regular noise.

It was surprisingly fiddly to get this to work.

I'm not sure if you'd like to add some level of support? Here's something that resembles what I ended up doing:

class CanonicalLogLine(BaseHTTPMiddleware):
    def __init__(self, app: ASGIApp):
        super().__init__(app, dispatch=self.dispatch)
        self.logger = structlog.get_logger(__name__)

    async def dispatch(self, request: Request, call_next):
        start_time = time.perf_counter_ns()
        url = get_path_with_query_string(request.scope)
        client_host = request.client.host
        client_port = request.client.port
        http_method = request.method
        http_version = request.scope['http_version']
        response = await call_next(request)
        status_code = response.status_code
        request_duration = time.perf_counter_ns() - start_time
        self.logger.info(
            f"""{client_host}:{client_port} - "{http_method} {url} HTTP/{http_version}" {status_code}""",
            httpRequest={
                'latency': str(request_duration / 10**9) + 's',
                'protocol': f'HTTP/{http_version}',
                'remoteIp': f'{client_host}',
                'requestMethod': http_method,
                'requestUrl': str(request.url),
                'status': status_code,
                'userAgent': request.headers.get('user-agent', 'no-user-agent'),
            }
        )

        return response

... and then we have a structlog processor:

from structlog.types import EventDict, Processor, WrappedLogger


class AddGCPHTTP:
    def __init__(self, cloud_logging_key: str) -> None:
        self.cloud_logging_key = cloud_logging_key

    def setup(self) -> list[Processor]:
        return [self]

    def __call__(self, logger: WrappedLogger, method_name: str, event_dict: EventDict):
        if 'httpRequest' not in event_dict.keys():
            return event_dict

        event_dict[self.cloud_logging_key]['httpRequest'] = event_dict.pop('httpRequest')

        return event_dict

This was some time ago, and I don't remember the specifics but I do remember it being particularly papercutty. Perhaps if a field was not serialized how GCP expects (I vaguely recall things that are clearly numbers -- except status-code -- being necessary to pass as strings?) the whole thing didn't appear? Stuff like that.

petemounce avatar Jun 21 '24 12:06 petemounce

That would be a great addition to add, yes!

The canonical documentation for this would be:

  • https://cloud.google.com/logging/docs/structured-logging#structured_logging_special_fields for the httpRequest field
  • https://cloud.google.com/logging/docs/reference/v2/rest/v2/LogEntry#httprequest for the fields it supports, and their types

This was some time ago, and I don't remember the specifics but I do remember it being particularly papercutty. Perhaps if a field was not serialized how GCP expects (I vaguely recall things that are clearly numbers -- except status-code -- being necessary to pass as strings?) the whole thing didn't appear?

I wouldn't be surprised that if the httpRequest fields types don't match those linked above, it won't show up correctly in Cloud Logging, yeah.

I'm totally in to add support for this. My only concern is that GCP specifies very specific fields to be logged, and as you discovered, they need to have the right types.

Maybe we could do it in 2 steps:

First, we add a processor like the AddGCPHTTP one you pasted above and say "please put exactly what is needed, how it's needed, in the httpRequest field".

Then, we can check how we could provide a better "user experience" to log this field correctly. I would need to think a bit more about it, but I would probably see a dedicated dataclass HTTPRequest:

  • containing all the fields documented in the doc, with their correct types | None
  • and providing a way to serialize this as a dict without the unset keys

For instance:


@dataclass
class HTTPRequest:
    method: str | None
    url: str | None
    size: int | None

    def format(self):
        """Format as https://cloud.google.com/logging/docs/reference/v2/rest/v2/LogEntry#httprequest"""

        # TODO: filter out all None values
        return {
            "requestMethod": self.method,
            "requestUrl": self.url,
            "requestSize": str(self.size),
        }

In your middleware, you would do something like:

self.logger.info(
            f"""{client_host}:{client_port} - "{http_method} {url} HTTP/{http_version}" {status_code}""",
            httpRequest=HTTPRequest(method=http_method, url=url))

This way:

  • The library would offer a way to log the right fields that could be understood by GCP and would enforce them
  • The typing would provide hints as to what the fields really should be to be correctly understood by GCP
  • Maybe some niceties, like formatting the latency field correctly, etc.

What do you think?

multani avatar Jun 21 '24 19:06 multani

I've started to draft a solution in https://github.com/multani/structlog-gcp/pull/46

multani avatar Sep 21 '24 20:09 multani