msgspec
Struct From a Subset of Another Struct
Description
When using msgspec for an OpenAPI (JSON Schema) API, I observed something: the ability to create dynamic subsets/partials of other Structs for encoding/decoding purposes may be useful.
For example, a (contrived) Person Struct:
from datetime import date, datetime
import msgspec
class Person(msgspec.Struct):
    # Primary Key
    id: int
    # Data
    first_name: str
    last_name: str
    birthday: date
    height: int
    weight: int
    # Bookkeeping
    revision: int
    entered: datetime
    updated: datetime
Now for a traditional API I have the usual CRUD-y problems:
- Create: A Person won't have an ID yet
- Read: A Person, but only part of it, to keep things off the wire
- Update: A Person with updates to a few attributes -- a "patch" -- and only return some bookkeeping info
- Delete: A Person may return some bookkeeping info from the removed struct
What that ends up with, after all cases are accounted for, is something like this:
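from typing import Optional
from msgspec import UNSET, UnsetType
# Original Person (usually for "Read")
# kw_only=True is needed because required fields (revision, ...) follow fields with defaults
class Person(msgspec.Struct, kw_only=True):
    # Primary Key
    id: int
    # Data
    first_name: str
    last_name: str
    birthday: Optional[date] = None
    height: Optional[int] = None
    weight: Optional[int] = None
    # Bookkeeping
    revision: int
    entered: datetime
    updated: datetime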
# Same as Person, but API consumer only needed `id`, `first_name` and `last_name` so the payload is
# reduced on the wire
class PersonEfficientRead(msgspec.Struct):
    # Primary Key
    id: int
    # Data
    first_name: str
    last_name: str
# Same as Person, but no ID and no bookkeeping
class PersonCreate(msgspec.Struct):
    # Data
    first_name: str
    last_name: str
    birthday: Optional[date] = None
    height: Optional[int] = None
    weight: Optional[int] = None
# Same as Person, but no ID, no bookkeeping, and a lot of mandatory columns in the Create case are
# converted into optional types in the Update case -- perhaps only the `weight` is being updated, and
# the rest are ignored during validation. Example: `PATCH /person/1` `{"weight": 50}`
class PersonUpdate(msgspec.Struct):
    # Data
    first_name: str | UnsetType = UNSET
    last_name: str | UnsetType = UNSET
    birthday: Optional[date] | UnsetType = UNSET
    height: Optional[int] | UnsetType = UNSET
    weight: Optional[int] | UnsetType = UNSET
# Used for Update/Delete cases as a return value to notify the caller what the new bookkeeping info
# is (Update), or what has just been removed (Delete)
class PersonChange(msgspec.Struct):
    # Primary Key
    id: int
    # Bookkeeping
    revision: int
    entered: datetime
    updated: datetime
Add a large enough app and that's... quite a bit of duplication across many data types if one wants to eventually get all of this out of, or into, JSON -- and ideally use JSON Schema (output by something like ReDoc).
Naively, this could all be reduced to something like msgspec.defstruct(...):
msgspec.derivestruct(name: str, struct: Struct, changes: List, ...)
where changes is a list of str, 2-tuples, or 3-tuples in the style of msgspec.defstruct, but with a field name as the first element of each tuple:
# Original Person
class Person(msgspec.Struct):
    # Primary Key
    id: int
    # Data
    first_name: str
    last_name: str
    birthday: date
    height: int
    weight: int
    # Bookkeeping
    revision: int
    entered: datetime
    updated: datetime
# Same as Person, but API consumer only needed `id`, `first_name` and `last_name` so the payload is
# reduced on the wire
PersonEfficientRead = msgspec.derivestruct('PersonEfficientRead', Person, ["id", "first_name", "last_name"])
# Same as Person, but creation does not require the ID
PersonCreate = msgspec.derivestruct(
    'PersonCreate',
    Person,
    ["first_name", "last_name", "birthday", "height", "weight"]
)
# Same as Person, but no ID, no bookkeeping, and a lot of mandatory columns in the Create case are
# converted into optional types in the Update case -- perhaps only the `weight` is being updated, and
# the rest are ignored during validation. Example: `PATCH /person/1` `{"weight": 50}`
PersonUpdate = msgspec.derivestruct(
    'PersonUpdate',
    Person,
    [
        ("first_name", str | UnsetType, UNSET),
        ("last_name", str | UnsetType, UNSET),
        ("birthday", Optional[date] | UnsetType, UNSET),
        ("height", Optional[int] | UnsetType, UNSET),
        ("weight", Optional[int] | UnsetType, UNSET)
    ]
)
# Same as Person, but only bookkeeping info
PersonChange = msgspec.derivestruct('PersonChange', Person, ["id", "revision", "entered", "updated"])
# Demo of all changes types
PersonCustom = msgspec.derivestruct(
    'PersonCustom',
    Person,
    [
        "first_name",  # str -- No type change, just field name
        ("birthday", Optional[date] | UnsetType),  # 2-tuple -- change type, not default
        ("height", Optional[int] | UnsetType, UNSET),  # 3-tuple -- change type and default
    ]
)
So questions:
- Is this approach just bonkers, in general, and have I missed something fundamental about how to approach this problem for an API?
- If the approach to the problem makes sense -- and all the similar-ish Structs are needed -- is there a feature here where msgspec can reduce some of the boilerplate and potential type-mismatch errors by adding a function that can dynamically create a Struct based on another Struct?
- Or is it that the approach targets the wrong layer, and something in the encode/decode can better handle this problem?
Thanks in advance for any consideration/guidance!
I'm also looking for a solution for when I want to update a record in the database.
Now I can create a RecordModify struct, with all fields optional, to solve this problem, but as you said, it becomes a lot of mess when the app grows larger.
I've implemented a proof of concept here: https://gist.github.com/dmckeone/d1fbc9910aee302804b444d6d20a8df9
I also added a flourish that allows a placeholder SourceType to simplify the Patch case:
PersonUpdate = msgspec.derivestruct(
    "PersonUpdate",
    Person,
    [
        ("first_name", SourceType | msgspec.UnsetType, msgspec.UNSET),
        ("last_name", SourceType | msgspec.UnsetType, msgspec.UNSET),
        ("birthday", SourceType | msgspec.UnsetType, msgspec.UNSET),
        ("height", SourceType | msgspec.UnsetType, msgspec.UNSET),
        ("weight", SourceType | msgspec.UnsetType, msgspec.UNSET),
    ],
)
It's fairly fragile because it basically only works with unions, and not deeply nested. It sure is handy for patch though.
Litestar's DTOs (tutorial, docs) are doing something akin to this.
We take some annotated type that is used for domain modelling (e.g., dataclass, sqlalchemy model, struct, pydantic model etc) and generate a msgspec struct to represent transfer of data for that type for a route handler, or set of route handlers based on some configuration. We then use that generated struct to validate the data from the wire before injecting it into a handler, and to serialize data to return over the wire.
This is the config object:
@dataclass(frozen=True)
class DTOConfig:
    """Control the generated DTO."""
    exclude: AbstractSet[str] = field(default_factory=set)
    """Explicitly exclude fields from the generated DTO.
    If exclude is specified, all fields not specified in exclude will be included by default.
    Notes:
        - The field names are dot-separated paths to nested fields, e.g. ``"address.street"`` will
          exclude the ``"street"`` field from a nested ``"address"`` model.
        - 'exclude' mutually exclusive with 'include' - specifying both values will raise an
          ``ImproperlyConfiguredException``.
    """
    include: AbstractSet[str] = field(default_factory=set)
    """Explicitly include fields in the generated DTO.
    If include is specified, all fields not specified in include will be excluded by default.
    Notes:
        - The field names are dot-separated paths to nested fields, e.g. ``"address.street"`` will
          include the ``"street"`` field from a nested ``"address"`` model.
        - 'include' mutually exclusive with 'exclude' - specifying both values will raise an
          ``ImproperlyConfiguredException``.
    """
    rename_fields: dict[str, str] = field(default_factory=dict)
    """Mapping of field names, to new name."""
    rename_strategy: RenameStrategy | None = None
    """Rename all fields using a pre-defined strategy or a custom strategy.
    The pre-defined strategies are: `upper`, `lower`, `camel`, `pascal`.
    A custom strategy is any callable that accepts a string as an argument and
    return a string.
    Fields defined in ``rename_fields`` are ignored."""
    max_nested_depth: int = 1
    """The maximum depth of nested items allowed for data transfer."""
    partial: bool = False
    """Allow transfer of partial data."""
    underscore_fields_private: bool = True
    """Fields starting with an underscore are considered private and excluded from data transfer."""
    def __post_init__(self) -> None:
        if self.include and self.exclude:
            raise ImproperlyConfiguredException(
                "'include' and 'exclude' are mutually exclusive options, please use one of them"
            )
The DTOs are tightly bound to the Litestar project at the moment, but we are working on splitting out certain functionalities into jolt-org for things like this, that might be more generally useful beyond litestar, and I believe DTOs are one of those things.
Hi! Maybe? :)
class Item(Struct):
    id: int
    x: str
    y: str
    @view(exclude={"id"})
    class Create(Struct):
        x: Annotated[str, Meta(min_length=1)]  # override type for additional validation
    @view
    class Update(Struct):
        x: Annotated[str, Meta(min_length=1)] | UnsetType = UNSET
        y: str | UnsetType = UNSET
def get(id) -> Item:
    return db.get_by_id(id)
def create(item: Item.Create) -> Item:
    return db.create(item)  # Id is autogenerated
def update(item: Item.Update) -> Item:
    db_item = db.get_by_id(item.id)
    for k, v in asdict(item).items():
        if v != UNSET:
            setattr(db_item, k, v)
    return db.save(db_item)
item = Item(id=1, x='x', y='y')
assert asdict(item.Create()) == {'x': 'x', 'y': 'y'}
assert issubclass(Item.Create, Item)  # not really subclass, only __subclasscheck__ modification
assert isinstance(item.Create(), Item)  # not really subinstance, only __instancecheck__ modification
assert Item.Create.__class__.__name__ == 'ItemCreate'
And nested (recursive):
class A(Struct):
    b: ForwardRef('B')
    @view
    class View(Struct):
        pass
class B(Struct):
    x: int
    y: int
    @view(exclude={'y'})
    class View(Struct):
        pass
# A and B both have view `View`
assert issubclass(A.View.__annotations__['b'], B.View)
a = A(b=B(x=0, y=1))
assert asdict(a.View()) == {'b': {'x': 0}}
Just curious, so I thought I'd check in again. Is the derive_struct here interesting to be part of the main library, or is that out of scope for what you want to do? https://gist.github.com/dmckeone/d1fbc9910aee302804b444d6d20a8df9