
Wrap `UnicodeDecodeError` in `msgspec.DecodeError`?

Open clumsy9 opened this issue 1 year ago • 0 comments

Question

We maintain an application called logprep that collects, processes and forwards log messages in JSON format from various data sources.

Serialization and deserialization of JSON documents are performed using the `Encoder` and `Decoder` classes of `msgspec.json`. Once a raw message has been fetched and returned by `_get_raw_event`, we try to decode it using `msgspec.json.Decoder` (the `_decoder` attribute in the following code snippet):

```python
raw_event = self._get_raw_event(timeout)
if raw_event is None:
    return None, None
try:
    event_dict = self._decoder.decode(raw_event)
except msgspec.DecodeError as error:
    raise CriticalInputParsingError(
        self, "Input record value is not a valid json string", raw_event
    ) from error
return event_dict, raw_event
```

Unfortunately, we have noticed that `msgspec.DecodeError` does not wrap `UnicodeDecodeError` exceptions. For example, decoding the following `raw_event`, which contains a Latin-1 encoded character (`\xfc`, i.e. `ü`), raises a `UnicodeDecodeError` that escapes the `except` clause above:

```python
b'{"@timestamp":"2024-07-22T12:58:21+02:00", "fromhost-ip":"192.168.178.2", "hostname":"fancy_host", "message":"Driver HP Universal Printing PCL 6 (v7.0.1) required for printer fancy_printer (Color K\xfcche) is unknown. Contact the administrator to install the driver before you log in again."'
```
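To illustrate why this payload is rejected: `\xfc` is how Latin-1 encodes `ü`, whereas UTF-8 (which JSON decoders expect) encodes that character as two bytes, and `0xFC` can never start a UTF-8 sequence. A minimal standalone check using only the Python standard library, not msgspec itself:

```python
# "ü" (U+00FC) is the single byte 0xFC in Latin-1, but two bytes in UTF-8.
assert "Küche".encode("latin-1") == b"K\xfcche"
assert "Küche".encode("utf-8") == b"K\xc3\xbcche"

# The same bytes are fine when interpreted as Latin-1 ...
assert b"K\xfcche".decode("latin-1") == "Küche"

# ... but a strict UTF-8 decode fails at the 0xFC byte.
try:
    b"K\xfcche".decode("utf-8")
except UnicodeDecodeError as error:
    print(error.reason)  # invalid start byte
    print(error.start)   # 1
```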

As far as I can see, the documentation does not specify which kinds of decoding errors are wrapped or chained by `DecodeError`. I am aware that not all kinds of decoding errors can or need to be wrapped, but I am not sure whether this is the expected behaviour. If so, is there any documentation on which decoding errors are wrapped or chained by `DecodeError`?
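In the meantime, one possible workaround is to catch `UnicodeDecodeError` alongside the decoder's own error type and chain both into a single application error. This is a sketch under stated assumptions, not logprep's or msgspec's code: `parse_raw_event` and `InputParsingError` are hypothetical names (the latter standing in for `CriticalInputParsingError`), and the decoder is passed in as a callable so the example runs with the standard library's `json.loads`, which likewise raises `UnicodeDecodeError` rather than `json.JSONDecodeError` for bytes that are not valid UTF-8. With msgspec, one would pass `msgspec.json.Decoder().decode` and `msgspec.DecodeError` instead.

```python
import json
from typing import Any, Callable


class InputParsingError(Exception):
    """Hypothetical stand-in for logprep's CriticalInputParsingError."""

    def __init__(self, message: str, raw_event: bytes) -> None:
        super().__init__(message)
        self.raw_event = raw_event


def parse_raw_event(
    raw_event: bytes,
    decode: Callable[[bytes], Any],
    decode_error: type[Exception],
) -> Any:
    """Decode raw_event, mapping both the decoder's own error type and
    UnicodeDecodeError (input that is not valid UTF-8) to one error."""
    try:
        return decode(raw_event)
    except (decode_error, UnicodeDecodeError) as error:
        raise InputParsingError(
            "Input record value is not a valid json string", raw_event
        ) from error


# json.loads is only a stand-in decoder here; it raises UnicodeDecodeError
# (not json.JSONDecodeError) for this Latin-1 encoded payload.
latin1_event = b'{"message": "Color K\xfcche"}'

try:
    parse_raw_event(latin1_event, json.loads, json.JSONDecodeError)
except InputParsingError as exc:
    print(type(exc.__cause__).__name__)  # UnicodeDecodeError
```

Catching both types keeps a single code path for unparseable input; the cost is that every caller has to remember the extra exception type, which is exactly what wrapping it in `DecodeError` would avoid.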

Thank you for your replies!

clumsy9, Aug 02 '24 10:08