Coerce a `None` value to the default

Open hoopes opened this issue 1 year ago • 7 comments

Question

Hi, I was hoping for a flag or some other way to interpret a null value in JSON as the field's default value, so that the decoded value still conforms to the type of the field (and mypy stops complaining about my field being optional).

For example:

import msgspec

class ForceList(msgspec.Struct):
    data: list[int] = msgspec.field(default_factory=list)

x = msgspec.json.Decoder(ForceList).decode('{"data":null}')

results in

msgspec.ValidationError: Expected `array`, got `null` - at `$.data`

Which is true! But it would be cool to detect that null value, and force it to be an empty list. The alternative is to do something like:

class ForceList(msgspec.Struct):
    data: list[int] | None = msgspec.field(default_factory=list)

    def __post_init__(self):
        if self.data is None:
            self.data = []

But then the type of the data field is an optional list, and mypy tells me that I need to sprinkle assertions everywhere to be sure that it's actually a list before I read it. It would also be great to then not serialize that back to JSON, but that would be a bonus.

Thanks so much for the library! You would not believe how much time it is saving me for very large json files (upwards of 200MB).

hoopes avatar May 01 '24 03:05 hoopes

One thing you could do now is to have a "private" field with optional list and a property that isn't optional.

mishamsk avatar May 04 '24 18:05 mishamsk

That's actually the direction I went, but the setters and append support were all a little bit beyond the level of hackery I was willing to commit to. If it was a read-only field, the private optional field and non-optional @property would work great. I was going to try to write some sort of helper class that would act like a list and support append, but eh. A bridge too far. Thanks for taking the time to reply though, I appreciate it!

hoopes avatar May 04 '24 21:05 hoopes

no prob. I am a big "immutables" fan, so wasn't thinking about the need for mutable fields.

another idea for you then - you can have phantom types, just for deserialization, which will have optional fields and post_init logic. Then you'll just convert from them to the final type to be used in the code. This can be easily automated, either via code generation (explicit approach), or you could even play around with generating the phantom types from the "true" types on the fly

on the topic, I think all the implicit logic, like treating nulls as missing values, is more headache than gain in the long run. pydantic was (or maybe still is) so confusing with None/Optional handling. As PEP 20 says - Explicit is better than implicit.

mishamsk avatar May 04 '24 23:05 mishamsk

I don't disagree - big fan of immutable data structures, and the explicit > implicit is true for sure. However - that's the json i got! msgspec is so much faster than pydantic that i gotta ask, at least.

hoopes avatar May 05 '24 03:05 hoopes

One thing you could do now is to have a "private" field with optional list and a property that isn't optional.

Something like this right?

from msgspec import Struct

class Example(Struct, frozen=True):
    raw: str | None = ""

    @property
    def final(self) -> str:
        # Fall back to the default when the decoded value was null.
        return self.raw or ""

Yeah I guess users of this class would find it surprising to set raw, but use final

jack-mcivor avatar Sep 04 '24 19:09 jack-mcivor

another idea for you then - you can have phantom types, just for deserialization, which will have optional fields and post_init logic. Then you'll just convert from them to the final type to be used in the code.

@mishamsk do you mean like this? I might have misunderstood

from msgspec import Struct

class Example(Struct, frozen=True):
    x: str

class DecodableExample(Struct, frozen=True):
    x: str | None = ""
    
    def finalize(self) -> Example:
        return Example(x=self.x or "")

I think this is OK, but the ergonomics aren't great when this type is nested in a deep structure. I think you would end up copying every class above it, like this:

class Parent(Struct, frozen=True):
    ex: tuple[Example, ...]

class DecodableParent(Struct, frozen=True):
    ex: tuple[DecodableExample, ...]
    
    def finalize(self) -> Parent:
        return Parent(ex=tuple(e.finalize() for e in self.ex))

import msgspec
msgspec.json.decode(b'{"ex":[{"x":null}]}', type=DecodableParent).finalize()
# Parent(ex=(Example(x=''),))

I often want to coerce a None to the default value in decoding - I prefer not to use union types with None as I find it leads to a proliferation of None checking code.

jack-mcivor avatar Sep 04 '24 19:09 jack-mcivor

@jack-mcivor yes, that's roughly what I meant. Agree that it is not ideal, and for deeply nested models it almost defeats the purpose of msgspec...

mishamsk avatar Sep 05 '24 13:09 mishamsk