
Struct From a Subset of Another Struct

Open dmckeone opened this issue 1 year ago • 5 comments

Description

While using msgspec for an OpenAPI (JSON Schema) API, I found that the ability to create dynamic subsets/partials of other Structs for encoding/decoding purposes would be useful.

For example, a (contrived) Person Struct:

from datetime import date, datetime

import msgspec

class Person(msgspec.Struct):
    # Primary Key
    id: int

    # Data
    first_name: str
    last_name: str
    birthday: date
    height: int
    weight: int
    
    # Bookkeeping
    revision: int
    entered: datetime
    updated: datetime

Now for a traditional API I have the usual CRUD-y problems:

  • Create: A Person won't have an ID yet
  • Read: A Person, but only part of it, to keep things off the wire
  • Update: A Person with updates to a few attributes -- a "patch" -- and only return some bookkeeping info
  • Delete: A Person may return some bookkeeping info from the removed struct

Once all of those cases are accounted for, I end up with something like this:

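from typing import Optional

from msgspec import UNSET, UnsetType

# Original Person (usually for "Read")
# kw_only=True is needed because required fields (revision, entered, updated)
# follow fields with defaults, which msgspec otherwise rejects
class Person(msgspec.Struct, kw_only=True):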
    # Primary Key
    id: int

    # Data
    first_name: str
    last_name: str
    birthday: Optional[date] = None
    height: Optional[int] = None
    weight: Optional[int] = None
    
    # Bookkeeping
    revision: int
    entered: datetime
    updated: datetime


# Same as Person, but API consumer only needed `id`, `first_name` and `last_name` so the payload is 
# reduced on the wire
class PersonEfficientRead(msgspec.Struct):
    # Primary Key
    id: int

    # Data
    first_name: str
    last_name: str


# Same as Person, but no ID and no bookkeeping
class PersonCreate(msgspec.Struct):
    # Data
    first_name: str
    last_name: str
    birthday: Optional[date] = None
    height: Optional[int] = None
    weight: Optional[int] = None


# Same as Person, but no ID, no bookkeeping, and a lot of mandatory columns in the Create case are 
# converted into optional types in the Update case -- perhaps only the `weight` is being updated, and 
# the rest are ignored during validation.  Example: `PATCH /person/1` `{"weight": 50}`
class PersonUpdate(msgspec.Struct):
    # Data
    first_name: str | UnsetType = UNSET
    last_name: str | UnsetType = UNSET
    birthday: Optional[date] | UnsetType = UNSET
    height: Optional[int] | UnsetType = UNSET
    weight: Optional[int] | UnsetType = UNSET


# Used for Update/Delete cases as a return value to notify the caller what the new bookkeeping info 
# is (Update), or what has just been removed (Delete)
class PersonChange(msgspec.Struct):
    # Primary Key
    id: int

    # Bookkeeping
    revision: int
    entered: datetime
    updated: datetime

Add a large enough app and that's... quite a bit of duplication across many data types if one wants to eventually get all of this out of, or into, JSON -- and ideally use JSON Schema (output by something like ReDoc).

Naively, this could all be reduced to something like msgspec.defstruct(...):

msgspec.derivestruct(name: str, struct: Struct, changes: List, ...)

where changes is a list of str, 2-tuples, or 3-tuples in the style of msgspec.defstruct, but with a field name as the first index in the tuples:

# Original Person 
class Person(msgspec.Struct):
    # Primary Key
    id: int

    # Data
    first_name: str
    last_name: str
    birthday: date
    height: int
    weight: int
    
    # Bookkeeping
    revision: int
    entered: datetime
    updated: datetime


# Same as Person, but API consumer only needed `id`, `first_name` and `last_name` so the payload is 
# reduced on the wire
PersonEfficientRead = msgspec.derivestruct('PersonEfficientRead', Person, ["id", "first_name", "last_name"])

# Same as Person, but creation does not require the ID
PersonCreate = msgspec.derivestruct(
    'PersonCreate', 
    Person, 
    ["first_name", "last_name", "birthday", "height", "weight"]
)

# Same as Person, but no ID, no bookkeeping, and a lot of mandatory columns in the Create case are 
# converted into optional types in the Update case -- perhaps only the `weight` is being updated, and 
# the rest are ignored during validation.  Example: `PATCH /person/1` `{"weight": 50}`
PersonUpdate = msgspec.derivestruct(
    'PersonUpdate', 
    Person, 
    [
        ("first_name", str | UnsetType, UNSET),        
        ("last_name", str | UnsetType, UNSET),
        ("birthday", Optional[date] | UnsetType, UNSET),  
        ("height", Optional[int] | UnsetType, UNSET),  
        ("weight", Optional[int] | UnsetType, UNSET)
    ]
)

# Same as Person, but only bookkeeping info
PersonChange = msgspec.derivestruct('PersonChange', Person, ["id", "revision", "entered", "updated"])


# Demo of all changes types
PersonCustom = msgspec.derivestruct(
    'PersonCustom', 
    Person, 
    [
        "first_name",  # str -- No type change, just field name
        ("birthday", Optional[date] | UnsetType),  # 2-tuple -- change type, not default
        ("height", Optional[int] | UnsetType, UNSET),  # 3-tuple -- change type and default
    ]
)

So questions:

  1. Is this approach just bonkers, in general, and I've missed something fundamental on how to approach this problem for an API?
  2. If the approach to the problem makes sense -- and all the similar-ish Structs are needed -- is there a feature here where msgspec can reduce some of the boilerplate and potential type-mismatch errors by adding a function that can dynamically create a Struct based on another Struct?
  3. Or is it that the approach targets the wrong layer, and something in the encode/decode can better handle this problem?

Thanks in advance for any consideration/guidance!

dmckeone avatar Sep 18 '23 20:09 dmckeone

I'm also looking for a solution for when I want to update a record in the database.

For now I can create a RecordModify struct, with all fields optional, to solve this problem, but as you said, it becomes a lot of mess when the app grows larger.

FHU-yezi avatar Sep 18 '23 23:09 FHU-yezi

I've implemented a proof of concept here: https://gist.github.com/dmckeone/d1fbc9910aee302804b444d6d20a8df9

I also added a flourish that allows a placeholder SourceType to simplify the Patch case:

PersonUpdate = msgspec.derivestruct(
        "PersonUpdate",
        Person,
        [
            ("first_name", SourceType | msgspec.UnsetType, msgspec.UNSET),
            ("last_name", SourceType | msgspec.UnsetType, msgspec.UNSET),
            ("birthday", SourceType | msgspec.UnsetType, msgspec.UNSET),
            ("height", SourceType | msgspec.UnsetType, msgspec.UNSET),
            ("weight", SourceType | msgspec.UnsetType, msgspec.UNSET),
        ],
    )

It's fairly fragile because it basically only works with top-level unions, not with deeply nested types. It sure is handy for patch though.

dmckeone avatar Sep 19 '23 20:09 dmckeone

Litestar's DTOs (tutorial, docs) are doing something akin to this.

We take some annotated type that is used for domain modelling (e.g., dataclass, sqlalchemy model, struct, pydantic model etc) and generate a msgspec struct to represent transfer of data for that type for a route handler, or set of route handlers based on some configuration. We then use that generated struct to validate the data from the wire before injecting it into a handler, and to serialize data to return over the wire.

This is the config object:

@dataclass(frozen=True)
class DTOConfig:
    """Control the generated DTO."""

    exclude: AbstractSet[str] = field(default_factory=set)
    """Explicitly exclude fields from the generated DTO.

    If exclude is specified, all fields not specified in exclude will be included by default.

    Notes:
        - The field names are dot-separated paths to nested fields, e.g. ``"address.street"`` will
            exclude the ``"street"`` field from a nested ``"address"`` model.
        - 'exclude' mutually exclusive with 'include' - specifying both values will raise an
            ``ImproperlyConfiguredException``.
    """
    include: AbstractSet[str] = field(default_factory=set)
    """Explicitly include fields in the generated DTO.

    If include is specified, all fields not specified in include will be excluded by default.

    Notes:
        - The field names are dot-separated paths to nested fields, e.g. ``"address.street"`` will
            include the ``"street"`` field from a nested ``"address"`` model.
        - 'include' mutually exclusive with 'exclude' - specifying both values will raise an
            ``ImproperlyConfiguredException``.
    """
    rename_fields: dict[str, str] = field(default_factory=dict)
    """Mapping of field names, to new name."""
    rename_strategy: RenameStrategy | None = None
    """Rename all fields using a pre-defined strategy or a custom strategy.

    The pre-defined strategies are: `upper`, `lower`, `camel`, `pascal`.

    A custom strategy is any callable that accepts a string as an argument and
    return a string.

    Fields defined in ``rename_fields`` are ignored."""
    max_nested_depth: int = 1
    """The maximum depth of nested items allowed for data transfer."""
    partial: bool = False
    """Allow transfer of partial data."""
    underscore_fields_private: bool = True
    """Fields starting with an underscore are considered private and excluded from data transfer."""

    def __post_init__(self) -> None:
        if self.include and self.exclude:
            raise ImproperlyConfiguredException(
                "'include' and 'exclude' are mutually exclusive options, please use one of them"
            )

The DTOs are tightly bound to the Litestar project at the moment, but we are working on splitting out certain functionalities into jolt-org for things like this, that might be more generally useful beyond litestar, and I believe DTOs are one of those things.

peterschutt avatar Sep 20 '23 07:09 peterschutt

Hi! Maybe? :)

class Item(Struct):
    id: int
    x: str
    y: str
    
    @view(exclude={"id"})
    class Create(Struct):
        x: Annotated[str, Meta(min_length=1)]  # override type for additional validation

    @view
    class Update(Struct):
        x: Annotated[str, Meta(min_length=1)] | UnsetType = UNSET
        y: str | UnsetType = UNSET

def get(id) -> Item:
    return db.get_by_id(id)

def create(item: Item.Create) -> Item:
    return db.create(item)  # Id is autogenerated

def update(item: Item.Update) -> Item:
    db_item = db.get_by_id(item.id)
    for k, v in asdict(item).items():
        if v is not UNSET:
            setattr(db_item, k, v)
    return db.save(db_item)

item = Item(id=1, x='x', y='y')
assert asdict(item.Create()) == {'x': 'x', 'y': 'y'}

assert issubclass(Item.Create, Item)  # not really subclass, only __subclasscheck__ modification
assert isinstance(item.Create(), Item)  # not really subinstance, only __instancecheck__ modification
assert Item.Create.__name__ == 'ItemCreate'

And nested (recursive):

class A(Struct):
    b: ForwardRef('B')

    @view
    class View(Struct):
        pass

class B(Struct):
    x: int
    y: int

    @view(exclude={'y'})
    class View(Struct):
        pass

# A and B both have view `View`

assert  issubclass(A.View.__annotations__['b'], B.View)
a = A(b=B(x=0, y=1))
assert asdict(a.View()) == {'b': {'x': 0}}

levsh avatar Sep 27 '23 14:09 levsh

Just curious, so I thought I'd check in again. Is the derive_struct here of interest as part of the main library, or is that out of scope for what you want to do? https://gist.github.com/dmckeone/d1fbc9910aee302804b444d6d20a8df9

dmckeone avatar Mar 09 '24 00:03 dmckeone