gcp-ingestion
Consider some message scrubbing before parsing JSON payloads
Currently, messages are scrubbed only after the payload has been parsed. While looking into mozdata.monitoring.payload_bytes_error_structured, I noticed JSON parse exceptions for some pings that are ignored.
Some parts of the scrubbing process do not require the message to be parsed (e.g. the document namespace and type are available beforehand). We could split those parts out and run them before attempting to parse, which would avoid some unnecessary work.
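
To illustrate the proposed ordering, here is a minimal Python sketch. It is not the project's actual code: the function names, the exception class, the ignore-list entry, and the payload field are all hypothetical, and it assumes the namespace and type arrive as `document_namespace` and `document_type` attributes alongside the raw payload.

```python
import json

# Hypothetical ignore list, keyed only on attributes known before parsing.
IGNORED_DOCUMENTS = {("example-namespace", "example-doctype")}


class MessageShouldBeDropped(Exception):
    """Raised when a message matches a scrubbing rule (hypothetical)."""


def scrub_by_attributes(attributes):
    """Apply the scrubbing rules that need only namespace and type."""
    key = (attributes.get("document_namespace"), attributes.get("document_type"))
    if key in IGNORED_DOCUMENTS:
        raise MessageShouldBeDropped(f"ignored document: {key[0]}/{key[1]}")


def scrub_by_payload(attributes, payload):
    """Apply the scrubbing rules that inspect parsed payload fields."""
    # Placeholder rule; "example_field" is an invented field name.
    if payload.get("example_field") == "unwanted":
        raise MessageShouldBeDropped("payload matched a scrubbing rule")


def process(attributes, raw_payload):
    # 1. Attribute-only scrubbing runs first, so ignored pings are dropped
    #    before any parse attempt and cannot raise a JSON parse exception.
    scrub_by_attributes(attributes)
    # 2. Only the remaining messages pay the cost of parsing.
    payload = json.loads(raw_payload)
    # 3. Rules that depend on payload contents run last.
    scrub_by_payload(attributes, payload)
    return payload
```

With this ordering, an ignored ping with a malformed payload is dropped by the scrubber instead of surfacing as a parse error, while a malformed payload for a non-ignored document type still fails at the parsing step as before.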