structlog-gcp
structlog-gcp copied to clipboard
httpRequest processor
We have a logging middleware that stamps each request with a canonical log-line. That attaches http request metadata as well as other things. That in turn causes GCP Logging to render those bits of metadata as lozenge-blobs in the UI, making them stand out a bit and also making the canonical log lines themselves stand out from the regular noise.
It was surprisingly fiddly to get this to work.
I'm not sure if you'd like to add some level of support? Here's something that resembles what I ended up doing:
class CanonicalLogLine(BaseHTTPMiddleware):
def __init__(self, app: ASGIApp):
super().__init__(app, dispatch=self.dispatch)
self.logger = structlog.get_logger(__name__)
async def dispatch(self, request: Request, call_next):
start_time = time.perf_counter_ns()
url = get_path_with_query_string(request.scope)
client_host = request.client.host
client_port = request.client.port
http_method = request.method
http_version = request.scope['http_version']
response = await call_next(request)
status_code = response.status_code
request_duration = time.perf_counter_ns() - start_time
self.logger.info(
f"""{client_host}:{client_port} - "{http_method} {url} HTTP/{http_version}" {status_code}""",
httpRequest={
'latency': str(request_duration / 10**9) + 's',
'protocol': f'HTTP/{http_version}',
'remoteIp': f'{client_host}',
'requestMethod': http_method,
'requestUrl': str(request.url),
'status': status_code,
'userAgent': request.headers.get('user-agent', 'no-user-agent'),
}
)
return response
... and then we have a structlog processor:
from structlog.types import EventDict, Processor, WrappedLogger
class AddGCPHTTP:
def __init__(self, cloud_logging_key: str) -> None:
self.cloud_logging_key = cloud_logging_key
def setup(self) -> list[Processor]:
return [self]
def __call__(self, logger: WrappedLogger, method_name: str, event_dict: EventDict):
if 'httpRequest' not in event_dict.keys():
return event_dict
event_dict[self.cloud_logging_key]['httpRequest'] = event_dict.pop('httpRequest')
return event_dict
This was some time ago, and I don't remember the specifics but I do remember it being particularly papercutty. Perhaps if a field was not serialized how GCP expects (I vaguely recall things that are clearly numbers -- except status-code -- being necessary to pass as strings?) the whole thing didn't appear? Stuff like that.
That would be a great addition to add, yes!
The canonical documentation for this would be:
- https://cloud.google.com/logging/docs/structured-logging#structured_logging_special_fields for the
httpRequestfield - https://cloud.google.com/logging/docs/reference/v2/rest/v2/LogEntry#httprequest for the fields it supports, and their types
This was some time ago, and I don't remember the specifics but I do remember it being particularly papercutty. Perhaps if a field was not serialized how GCP expects (I vaguely recall things that are clearly numbers -- except status-code -- being necessary to pass as strings?) the whole thing didn't appear?
I wouldn't be surprised that if the httpRequest fields types don't match those linked above, it won't show up correctly in Cloud Logging, yeah.
I'm totally in to add support for this. My only concern is that GCP specifies very specific fields to be logged, and as you discovered, they need to have the right types.
Maybe we could do it in 2 steps:
First, we add a processor like the AddGCPHTTP one you pasted above and say "please put exactly what is needed, how it's needed, in the httpRequest field".
Then, we can check how we could provide a better "user experience" to log this field correctly. I would need to think a bit more about it, but I would probably see a dedicated dataclass HTTPRequest:
- containing all the fields documented in the doc, with their correct types
| None - and providing a way to serialize this as a dict without the unset keys
For instance:
@dataclass
class HTTPRequest:
method: str | None
url: str | None
size: int | None
def format(self):
"""Format as https://cloud.google.com/logging/docs/reference/v2/rest/v2/LogEntry#httprequest"""
# TODO: filter out all None values
return {
"requestMethod": self.method,
"requestUrl": self.url,
"requestSize": str(self.size),
}
In your middleware, you would do something like:
self.logger.info(
f"""{client_host}:{client_port} - "{http_method} {url} HTTP/{http_version}" {status_code}""",
httpRequest=HTTPRequest(method=http_method, url=url))
This way:
- The library would offer a way to log the right fields that could be understood by GCP and would enforce them
- The typing would provide hints as to what the fields really should be to be correctly understood by GCP
- Maybe some niceties, like formatting the
latencyfield correctly, etc.
What do you think?
I've started to draft a solution in https://github.com/multani/structlog-gcp/pull/46