msgspec
msgspec copied to clipboard
Support omitting Struct attributes from encoding/decoding
Since Struct types force users to declare all Struct attributes, there's currently no way to hold extra state on a Struct without having it also be encoded/decoded.
Two ways to handle this:
- Add
omit=TruetoMetaannotations. This could also be packaged up as a parametrized type, so users could write
class Point(Struct):
x: int
y: int
_private: Omit[some_type]
- Excluded attributes with a leading underscore by default? I'm not sure if we should support this one, it may be confusing behavior. As a reference point, golang's
jsondoesn't encode/decode private struct attributes by default.
Yes please! I would like to cache some (expensive) computed values in my objects, but not serialize them.
Thanks for the motivating use case (and apologies for the delayed response)! I hope to tackle this issue in the next few weeks before issuing a new release.
Since you commented here - as a user, would you want omitted attributes to also be excluded from __init__? Giving a concrete example:
class Point(Struct):
x: int
y: int
_private_1: Omit[int]
_private_2: Omit[int] = 0
Should __init__ include _private_1 and _private_2 arguments? Or should they be excluded from __init__, and only default values filled in? If they're excluded, that would mean that _private_1 would raise an AttributeError until it's explicitly set, and _private_2 would have an initial value of 0 after __init__.
Honestly I haven't been using the constructors much (decoding from a web request, encoding to a disk cache, decoding from the disk cache) so I don't have a strong opinion. I guess given that the object is valid without those values, the constructor shouldn't require them, but it would be nice if you could include them.
Also I would lean towards excluding attributes with leading underscores by default (without the Omit type), but I could see it being more annoying to have to include them if some API you're using has names like that.
Ok, we finally have the internals setup to make this fairly straightforward to implement.
First, I plan to call these "private" attributes, since that's effectively what they are. A private attribute won't be encoded or decoded, but will have space allocated on its respective struct.
There are some open questions though.
How should this be spelled?
There are two places that field configuration like this can go:
In the type annotation
class Example(Struct):
x: int
y: Private[int]
z: Private[float] = 0.0
An option in msgspec.field
class Example(Struct):
x: int
y: int = msgspec.field(private=True)
z: int = msgspec.field(default=0.0, private=True)
The former I find a bit easier to read, but it adds some complications:
- The type annotations would have to be parsed at struct definition time, same as they would for
ClassVar. This is doable, but to remain efficient can't handle all cases in the case of postponed annotations. - A
Privatetype annotation by its nature must be applied to a field - a typelist[Private[int]]is invalid, but python would happily let the user spell such a type. Using a default value withfieldties this annotation to struct fields alone, avoiding this issue. - It's much harder to get this approach to play well with
dataclass_transform, treatingPrivatefields differently from public ones.
Right now I'm leaning towards the msgspec.field approach. Many users will already be familiar with dataclasses.field, which includes support for omitting fields from the generated __init__ via dataclasses.field(init=False). This is similar enough to private=True that it shouldn't be that wild of a concept.
Should fields with leading underscores be private by default?
Since we're providing a way to set whether a field is public/private explicitly, users can always override any default behavior chosen here. Since fields with leading underscores are usually treated as private in python by convention, I'm tempted to do so here as well. So the following would be equivalent:
class Example1(Struct):
_field: int = 0 # private, since the leading underscore
class Example2(Struct):
_field: int = field(default=0, private=True) # explicitly private
On the other hand, I'd expect most users to never need to define private fields, so maybe inferring private/public status from the field name itself is overreaching.
Are private attributes excluded from __init__?
Right now I'm planning on excluding private attributes from basically everything in msgspec. They'll have space allocated in the __slots__ of the generated Struct type (and corresponding attributes defined), but will be excluded from:
__init____copy____reduce__(pickling)__repr____eq__and other comparison methods__hash____match_args__encodedecode
This makes them most useful for including extra state during runtime that is initialized after the Struct instance is already created. cached_property usage is related here (#204), since you'd need a slot allocated to store the cached value.
If you have a use case for including private attributes in __init__, please let comment below. If needed, worst case we can add another init parameter for including/excluding a field from __init__. private=True will imply init=False by default, but you can manually re-include it as field(private=True, init=True).
Should private fields require a default value/factory?
Right now all msgspec.Struct types require all fields be initialized as part of the generated __init__. The only way to get an AttributeError for a field on a struct is if you manually del the attribute off an existing struct. I think this makes sense. mypy/pyright type annotations are designed to handle types (or unions of types), but not the presence/absence of an attribute.
Keeping with this design (and assuming private fields are excluded from __init__), it may make sense to enforce that all private fields have a default value (since there's no other way a value could be set).
class Example(Struct):
private1: int | None = field(private=True, default=None) # this is ok, there's a default value
private2: int | None = field(private=True) # this is invalid, there's no default value
Summary
In summary, my current leanings on these questions are:
- Use
msgspec.fieldfor configuring if a field is private. - Make fields with a leading underscore private by default.
- Exclude private fields from
__init__. We can add a way to override this later if needed. - Require all private fields set a default value.
I don't plan on adding this before the next release (hopefully out next week), but after that whatever choices made here should (hopefully) be fairly straightforward to implement :crossed_fingers: .
This would be of great use to my current project --- like IcyMidnight, we've got a cache that would be really pointless to encode. The current plan is to clear the cache before encoding, which will work as long as we remember to do it.
- Use msgspec.field for configuring if a field is private.
- Make fields with a leading underscore private by default.
What happens if you do _something = field(private=False)? I think this should be allowed: a private field (in the Python convention sense) that's not private (in the msgspec sense).
Additionally, automatically treating fields with a leading underscore as private would break backwards compatibility. I found a few examples of Structs with _private fields which makes me think that making private fields an explicit, opt-in choice is a better idea.
Really looking forward to this feature, I hope you have time to work on it soon.
Adding a private=True parameter to field() seems fine to me.
Private omitted fields are the only way to have self-referential Structs.
e.g. a nested and occasionally recursive data structure with access to the _root node.
We can also use it to compute and store the path in that data structure because Structs aren't aware of their parent nodes.
Hi. Any update on this? I'm confused whether the library has this attribute.
Right now I'm leaning towards the msgspec.field approach. Many users will already be familiar with dataclasses.field, which includes support for omitting fields from the generated init via dataclasses.field(init=False). This is similar enough to private=True that it shouldn't be that wild of a concept.
If this is still up for debate, I concur that this is the right approach.
If you have a use case for including private attributes in init, please let comment below. If needed, worst case we can add another init parameter for including/excluding a field from init. private=True will imply init=False by default, but you can manually re-include it as field(private=True, init=True).
class Primitive(Struct):
iid: int = field(private=True) # internal ID, think auto-incrementing sequence meant to be private
eid: str = field(private=False) # external ID, some sort of public-exposed string identifier
In this use case, the object when fetched from the DB would have both IDs so the application layer has them available. But, when serialized as an HTTP response, we strip out the private iid.
Use msgspec.field for configuring if a field is private.
Agreed.
Make fields with a leading underscore private by default.
This is a nice API :sunglasses:
Exclude private fields from init. We can add a way to override this later if needed.
See my comment above. I am not sold on this one. The two paramater approach makes more sense to me as they are different use cases. I am fine if private=True implies init=False. But, I want to have the flexibility for private=True, init=True.
Require all private fields set a default value.
I guess I am okay with this, as long as I can still require it as an __init__. But once that is allowed, it seems to me this requirement falls off.
Would private fields also be excluded from __struct_fields__?
I came here looking for this functionality so I can store some extra metadata on a Struct that I don't want to be serialized or part of the __init__.
I'm envisaging an api where I can have an alternative constructer which can take optional (metadata) kwargs and store them on the struct without having them be included in the serialized representation - e.g.
class MyStruct(msgspec.Struct, tag=True):
metadata: dict = field(default={}, private=True)
@classmethod
def create(cls, **kwargs) -> MyStruct:
instance = MyStruct()
instance.metadata.update(**kwargs)
return instance
>>> obj = MyStruct.create(key='value')
>>> obj
MyStruct(metadata={'key': 'value'})
>>> msgspec.json.encode(obj)
b'{"type":"MyStruct"}'
>>> msgspec.json.decode(msgspec.json.encode(obj), type=MyStruct)
MyStruct(metadata={})
My actual use-case is a bit more complex but it boils down to wanting metadata associated with my objects but only for their lifetime in the current interpreter.
Another example, of something I'd like to do is to attach an fsspec.AbstractFileSystem to the class which would then be used in a save method to serialize itself and write to disk - e.g.
class MyStruct(msgspec.Struct, tag=True):
metadata: dict = field(default={})
fs: fsspec.AbstractFileSystem = field(private=True)
@classmethod
def create(cls, fs: fsspec.AbstractFileSystem, **kwargs) -> MyStruct:
instance = MyStruct()
instance.metadata.update(**kwargs)
instance.fs = fs
return instance
def save(self, filepath: str) -> None:
self.fs.write_bytes(
path=filepath,
value=msgspec.msgpack.encode(self),
)
msgspec doesn't know how to serialize an AbstractFileSystem, and I wouldn't want it to be serialized along with my actual data, however it would be convenient to be able to store it on the instances.
Any updates on this?