
Unexpected error in `convert` for unset values

Open · tijmenr opened this issue 1 year ago · 1 comment

Description

The `convert` function does not handle UNSET values as I expected (or I am missing some detail about how optionality and UNSET interact). If a field has a union type that allows `msgspec.UnsetType`, `convert` seems to ignore that part of the union during conversion:

>>> import msgspec
>>> msgspec.__version__
'0.18.6'

>>> class A(msgspec.Struct):
...     x: str
...     y: str | msgspec.UnsetType = msgspec.field(default=msgspec.UNSET)
...
>>> class B(msgspec.Struct):    # Basically same as A, but might differ in encoded field names
...     x: str
...     y: str | msgspec.UnsetType = msgspec.field(default=msgspec.UNSET)
...
>>> a = A(x='x')   # a=A(x='x', y=UNSET)

The value `a` fits struct `B` (which allows an unset `y` field), so I would expect converting `a` to an instance of `B` to succeed, with `y` set either to UNSET (as in `a`) or to the default defined in `B` (which here also happens to be UNSET; which of the two applies would need documenting). However, conversion fails on the unset value of the `y` field:

>>> msgspec.convert(a, B, from_attributes=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
msgspec.ValidationError: Expected `str`, got `msgspec.UnsetType` - at `$.y`

(Converting `a` into a new instance of struct `A` does not raise an error, probably because some shortcut logic is at play.)

The same error also occurs when trying to convert a dict with an explicit unset value:

>>> msgspec.convert({'x': 'x', 'y': msgspec.UNSET}, B, from_attributes=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
msgspec.ValidationError: Expected `str`, got `msgspec.UnsetType` - at `$.y`

My use case: I need to translate the field names of a multi-level JSON structure from kebab-case to PascalCase. My idea was to create two nearly identical structs, K and P (using the handy `rename` option so that each gets the desired encoded field names), decode the source document with K, convert the result to P, and encode P for output. This issue with unset fields blocks that approach. As a workaround I'm going to try changing the default value from UNSET to None, but a side effect is that I can no longer distinguish between a field that is absent from the source document and one that is present with a null value.

On a side note (not sure whether this is a separate bug or a feature request), using a dict as an intermediate step between K and P does not help either: `msgspec.structs.asdict` does not honor the struct's `omit_defaults` option, so it includes the unset fields in the dict, and `convert` then fails on that dict as well (unless I first rebuild the dict myself to remove all unset fields).

tijmenr · Oct 20 '24 23:10