msgspec
Is there an easier way to pre-process incoming values for specific fields?
Question
I want to convert an incoming comma-separated string to a set of enum members. This is my current solution:
# to avoid extra whitespace
SPLIT_PATTERN: re.Pattern[str] = re.compile(r"\s*,\s*")


class AlbumArtist(TaggedBase, kw_only=True):
    _categories: str = msgspec.field(name="categories")
    _effective_roles: str = msgspec.field(name="effectiveRoles")
    is_support: bool
    name: str
    roles: str
    artist: Optional[Artist] = None
    categories: set[ArtistCategories] = msgspec.field(
        default_factory=set, name="dummy1"
    )
    effective_roles: set[ArtistRoles] = msgspec.field(
        default_factory=set, name="dummy2"
    )

    def __post_init__(self) -> None:
        self.categories = {
            ArtistCategories(c) for c in SPLIT_PATTERN.split(self._categories) if c
        }
        self.effective_roles = {
            ArtistRoles(r) for r in SPLIT_PATTERN.split(self._effective_roles) if r
        }
it's a snippet from https://github.com/prTopi/beets-vocadb/blob/2a2b3cca83449b26717ffff2a7bb085b26381d26/beetsplug/vocadb/requests_handler/models.py
Is there a more efficient way that doesn't involve additional attributes?
ok, came up with this in the meantime:
E = TypeVar("E", bound=StrEnum)


class AlbumArtist(TaggedBase, dict=True, kw_only=True):
    _categories: str = msgspec.field(name="categories")
    _effective_roles: str = msgspec.field(name="effectiveRoles")
    is_support: bool
    name: str
    roles: str
    artist: Optional[Artist] = None
    _SPLIT_PATTERN: ClassVar[re.Pattern[str]] = re.compile(r"\s*,\s*")

    @cached_property
    def categories(self) -> set[ArtistCategories]:
        return self._parse_enum_set(self._categories, ArtistCategories)

    @cached_property
    def effective_roles(self) -> set[ArtistRoles]:
        return self._parse_enum_set(self._effective_roles, ArtistRoles)

    @classmethod
    def _parse_enum_set(cls, value: str, enum_class: type[E]) -> set[E]:
        """Helper method to parse comma-separated string into set of enum values"""
        return {
            enum_class(item) for item in cls._SPLIT_PATTERN.split(value) if item
        }
Now I have another problem: I use httpx in my project for API requests, and I need to pass specific keys and values to the params parameter of httpx.get. Currently, each params Struct has its own asdict property that puts all of its attributes into a dict suitable for httpx, which is basically the opposite of the above: https://github.com/prTopi/beets-vocadb/blob/3e9033c1354a1d003498c6100d39a59a105018ac/beetsplug/vocadb/requests_handler/__init__.py
But that doesn't seem like a good solution to me. Does anyone know a better way of doing this?
@amogus07, yes there are some ways, but I'm sure you won't escape implementing custom logic in any of them. You can explore: https://jcristharif.com/msgspec/extending.html
Here's one way to achieve your goal that differs from your solution:
from typing import Any

import msgspec


# data object
class CsvSet:
    def __init__(self, raw_value: str):
        self._values = set(raw_value.split(','))

    def __eq__[T: set](self, other: T) -> bool:
        return self._values == other

    def __str__(self):
        return ','.join(self._values)


class MyStruct(msgspec.Struct):
    param: CsvSet


# custom hooks
def enc_hook(obj: Any) -> Any:
    if isinstance(obj, CsvSet):
        return str(obj)
    # signal unsupported types instead of silently returning None
    raise NotImplementedError(f"Objects of type {type(obj)} are not supported")


def dec_hook(tp: type, obj: Any) -> Any:
    # identity check, since issubclass() raises if tp is not a plain class
    if tp is CsvSet:
        return CsvSet(obj)
    raise NotImplementedError(f"Objects of type {tp} are not supported")


# tests
data = msgspec.convert({"param": "foo,bar"}, MyStruct, dec_hook=dec_hook)
assert data.param == {'foo', 'bar'}
serialized = msgspec.to_builtins(data, enc_hook=enc_hook)
assert serialized['param'] in ('foo,bar', 'bar,foo')
- New type with your custom logic for handling CSV
- Custom encoding and decoding hooks
Note that you can't (easily) inherit from set; see Supported Types in the msgspec docs.
P.S.: adding generics support should not be too hard. For now, you can annotate your field with a GenericAlias and raise your validation errors from dec_hook.
I should've probably mentioned that Python 3.9 needs to be supported, which doesn't support function type parameter syntax. Also, is it possible to get this to work without using Any?
@amogus07, I think you can get around the new type parameter syntax by introducing an old-style TypeVar bound to set, i.e. for the T in fn:
def fn[T: set](): ... → T = TypeVar('T', bound=set)
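Applied to the CsvSet class from earlier in the thread, a Python 3.9 compatible version might look like the sketch below. It only swaps the type parameter syntax for a module-level TypeVar; the rest is unchanged. Type checkers may still prefer other: object for __eq__, so treat the annotation as illustrative.

```python
from typing import TypeVar

# Old-style replacement for the `[T: set]` type parameter (works on 3.9).
T = TypeVar("T", bound=set)


class CsvSet:
    def __init__(self, raw_value: str) -> None:
        self._values = set(raw_value.split(","))

    def __eq__(self, other: T) -> bool:
        # Compare the wrapped set against a plain set.
        return self._values == other

    def __str__(self) -> str:
        # Set iteration order is arbitrary, so the output order may vary.
        return ",".join(self._values)


csv = CsvSet("foo,bar")
```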
However, I'm not sure if you can avoid using Any in dec/enc hooks. Check stubs for what's expected:
https://github.com/jcrist/msgspec/blob/dd965dce22e5278d4935bea923441ecde31b5325/msgspec/__init__.pyi#L154