Is there an easier way to pre-process incoming values for specific fields?

Open amogus07 opened this issue 10 months ago • 4 comments
Question

I want to convert an incoming comma-separated string to a set of enum members. This is my current solution:

# to avoid extra whitespace
SPLIT_PATTERN: re.Pattern[str] = re.compile(r"\s*,\s*")


class AlbumArtist(TaggedBase, kw_only=True):
    _categories: str = msgspec.field(name="categories")
    _effective_roles: str = msgspec.field(name="effectiveRoles")
    is_support: bool
    name: str
    roles: str
    artist: Optional[Artist] = None
    categories: set[ArtistCategories] = msgspec.field(
        default_factory=set, name="dummy1"
    )
    effective_roles: set[ArtistRoles] = msgspec.field(
        default_factory=set, name="dummy2"
    )

    def __post_init__(self) -> None:
        self.categories = {
            ArtistCategories(c) for c in SPLIT_PATTERN.split(self._categories) if c
        }
        self.effective_roles = {
            ArtistRoles(r) for r in SPLIT_PATTERN.split(self._effective_roles) if r
        }

It's a snippet from https://github.com/prTopi/beets-vocadb/blob/2a2b3cca83449b26717ffff2a7bb085b26381d26/beetsplug/vocadb/requests_handler/models.py

Is there a more efficient way that doesn't involve additional attributes?

amogus07 avatar Jan 08 '25 22:01 amogus07

OK, I came up with this in the meantime:

E = TypeVar("E", bound=StrEnum)


class AlbumArtist(TaggedBase, dict=True, kw_only=True):
    _categories: str = msgspec.field(name="categories")
    _effective_roles: str = msgspec.field(name="effectiveRoles")
    is_support: bool
    name: str
    roles: str
    artist: Optional[Artist] = None

    _SPLIT_PATTERN: ClassVar[re.Pattern[str]] = re.compile(r"\s*,\s*")

    @cached_property
    def categories(self) -> set[ArtistCategories]:
        return self._parse_enum_set(self._categories, ArtistCategories)

    @cached_property
    def effective_roles(self) -> set[ArtistRoles]:
        return self._parse_enum_set(self._effective_roles, ArtistRoles)

    @classmethod
    def _parse_enum_set(cls, value: str, enum_class: type[E]) -> set[E]:
        """Helper method to parse comma-separated string into set of enum values"""
        return {
            enum_class(item) for item in cls._SPLIT_PATTERN.split(value) if item
        }

Now I have another problem: I use httpx for API requests in my project and need to pass specific keys and values to the params parameter of httpx.get. Currently, each params Struct has its own asdict property that puts all of its attributes into a dict suitable for httpx, which is basically the opposite of the above: https://github.com/prTopi/beets-vocadb/blob/3e9033c1354a1d003498c6100d39a59a105018ac/beetsplug/vocadb/requests_handler/init.py

That doesn't seem like a good solution to me, though. Does anyone know a better way of doing this?
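One possible direction, as a rough sketch only: keep the "flatten to query values" logic in a single shared helper instead of one asdict property per Struct. The names to_query_value and to_params, the ArtistRoles members, and the sample keys are all hypothetical. The mapping passed in is assumed to come from the Struct's fields (for example via msgspec.structs.asdict), and renaming attributes to API keys is not handled here.

```python
from enum import Enum
from typing import Any, Dict, Mapping


class ArtistRoles(str, Enum):  # hypothetical stand-in for the real enum
    VOCALIST = "Vocalist"
    COMPOSER = "Composer"


def to_query_value(value: Any) -> str:
    """Render a single field value as a query-string value."""
    if isinstance(value, Enum):
        return str(value.value)
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (set, frozenset, list, tuple)):
        # Sets have no stable order, so sort for a deterministic result.
        return ",".join(sorted(to_query_value(v) for v in value))
    return str(value)


def to_params(fields: Mapping[str, Any]) -> Dict[str, str]:
    """Drop unset (None) fields and flatten everything else to strings."""
    return {k: to_query_value(v) for k, v in fields.items() if v is not None}


params = to_params(
    {
        "roles": {ArtistRoles.VOCALIST, ArtistRoles.COMPOSER},
        "maxResults": 5,
        "query": None,
    }
)
```

The resulting dict contains only string values, so it could be passed as params to httpx.get. Whether "true"/"false" is the right spelling for booleans depends on the target API and is an assumption here.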

amogus07 avatar Jan 11 '25 22:01 amogus07

@amogus07, yes, there are a few ways, but I don't think you can avoid implementing custom logic in any of them. You can explore: https://jcristharif.com/msgspec/extending.html

Here's a way to achieve your goal that differs from your solution:

from typing import Any
import msgspec

# data object

class CsvSet:
    def __init__(self, raw_value: str):
        self._values = set(raw_value.split(','))
    def __eq__[T: set](self, other: T) -> bool:
        return self._values == other
    def __str__(self):
        return ','.join(self._values)

class MyStruct(msgspec.Struct):
    param: CsvSet

# custom hooks

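def enc_hook(obj: Any) -> Any:
    if isinstance(obj, CsvSet):
        return str(obj)
    # signal unsupported types instead of silently returning None
    raise NotImplementedError(f"Objects of type {type(obj)} are not supported")

def dec_hook(tp: type, obj: Any) -> Any:
    if issubclass(tp, CsvSet):
        return CsvSet(obj)
    # signal unsupported types instead of silently returning None
    raise NotImplementedError(f"Objects of type {tp} are not supported")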

# tests

data = msgspec.convert({"param": "foo,bar"}, MyStruct, dec_hook=dec_hook)
assert data.param == {'foo', 'bar'}

serialized = msgspec.to_builtins(data, enc_hook=enc_hook)
assert serialized['param'] in ('foo,bar', 'bar,foo')

In short, this uses:
  • A new type with your custom logic for handling CSV
  • Custom encoding and decoding hooks

Note that you can't (easily) inherit from set; see: Supported Types

P.S. Adding generics support shouldn't be too hard. For now, you can use a GenericAlias as the field's type and raise your validation errors from dec_hook.
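A minimal sketch of that generics idea, with the hook called directly rather than through msgspec. EnumCsvSet and the ArtistRoles members are hypothetical, and it is an assumption that msgspec hands the parametrized alias (for example EnumCsvSet[ArtistRoles]) to dec_hook unchanged, so verify that against the msgspec version in use.

```python
from enum import Enum
from typing import Any, Generic, Set, TypeVar, get_args, get_origin

E = TypeVar("E", bound=Enum)


class ArtistRoles(str, Enum):  # hypothetical stand-in for the real enum
    VOCALIST = "Vocalist"
    COMPOSER = "Composer"


class EnumCsvSet(Generic[E]):
    """Hypothetical wrapper: enum members parsed from a comma-separated string."""

    def __init__(self, values: Set[E]) -> None:
        self.values = values


def dec_hook(tp: Any, obj: Any) -> Any:
    # tp is expected to be a parametrized alias such as EnumCsvSet[ArtistRoles].
    if get_origin(tp) is EnumCsvSet:
        (enum_cls,) = get_args(tp)
        if not isinstance(obj, str):
            raise TypeError(f"Expected str, got {type(obj).__name__}")
        items = (item.strip() for item in obj.split(","))
        # enum_cls(item) raises ValueError for unknown members,
        # which serves as the validation error.
        return EnumCsvSet({enum_cls(item) for item in items if item})
    raise NotImplementedError(f"Objects of type {tp} are not supported")


roles = dec_hook(EnumCsvSet[ArtistRoles], "Vocalist, Composer")
```

Because the enum class is read from the alias with get_args, one hook can serve every enum-typed field instead of needing one branch per enum.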

uwinx avatar Jan 17 '25 17:01 uwinx

I should've probably mentioned that Python 3.9 needs to be supported, which doesn't support function type parameter syntax. Also, is it possible to get this to work without using Any?

amogus07 avatar Jan 20 '25 17:01 amogus07

@amogus07, I think you can get around the new type parameter syntax by introducing an old-style TypeVar bound to set.

For T in fn[T: set](): ..., that would be T = TypeVar('T', bound=set)
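Applied to the CsvSet class from the earlier comment, that could look roughly like the sketch below, which avoids any syntax newer than Python 3.9. Sorting in __str__ is an addition made here for deterministic output, and the type: ignore comment is an assumption about what a type checker may want, since object.__eq__ conventionally takes other: object.

```python
from typing import TypeVar

# Module-level TypeVar replaces the 3.12-only `[T: set]` parameter syntax.
T = TypeVar("T", bound=set)


class CsvSet:
    def __init__(self, raw_value: str) -> None:
        self._values = set(raw_value.split(","))

    def __eq__(self, other: T) -> bool:  # type: ignore[override]
        return self._values == other

    def __str__(self) -> str:
        # Sorted so the serialized form does not depend on set ordering.
        return ",".join(sorted(self._values))


parsed = CsvSet("foo,bar")
```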

However, I'm not sure if you can avoid using Any in dec/enc hooks. Check stubs for what's expected: https://github.com/jcrist/msgspec/blob/dd965dce22e5278d4935bea923441ecde31b5325/msgspec/init.pyi#L154

uwinx avatar Jan 28 '25 07:01 uwinx