What other examples should we add?

Open jcrist opened this issue 2 years ago • 7 comments

In #147 we added a new Examples section to the docs, along with an example of using msgspec to write a concise (and performant) GeoJSON implementation.

What other examples should we add? This might be other common schemas (JSON-RPC perhaps?) or integrations with other tools (ASGI/Starlette, requests, sqlite, ...)?

jcrist avatar Jul 18 '22 14:07 jcrist

Depending on what kind of examples you're looking for, it might make sense to compare to common serialization frameworks (like Protobuf, Cap'n Proto, Flatbuffers, etc.). This could be useful for users weighing msgspec against some of the other options, or for users porting to msgspec from something else. Just a thought 🙂

jakirkham avatar Jul 18 '22 20:07 jakirkham

That might make sense? msgspec is mostly a nice way of working with JSON/msgpack; it's not really a format itself. The benchmarks already compare against the most performant of those for Python (pyrobuf). The protobuf/flatbuffers/capnproto implementations for Python are all fairly slow, even if the formats can be fast in other languages.

jcrist avatar Jul 18 '22 20:07 jcrist

i have this example which uses multiple features https://gist.github.com/banteg/cee923b6616a8f70db1e18a61b91e39c

banteg avatar Jul 29 '22 20:07 banteg

Sorry, I should have clarified above: I wasn't talking about a performance comparison so much as a comparison of the code you write. In other words, showing an example implemented in protobuf and then in msgspec. The idea is to show users familiar with one message-passing library how they can do similar things in msgspec.

jakirkham avatar Jul 30 '22 00:07 jakirkham

More examples of dec_hook would be wonderful. My colleague and I were deciding between Pydantic and this, and from existing documentation, custom validators weren't clear.

If you'd accept it, I would be happy to volunteer a documentation PR.

The canonical example that was brought up was email validation:

import re
from typing import Any, Type
from msgspec import Struct
from msgspec.json import Decoder

regex_email = re.compile(r'([A-Za-z0-9]+[._-])*[A-Za-z0-9]+@[A-Za-z0-9-]+(\.[A-Za-z]{2,})+')
regex_phone = re.compile(r'[0-9]{3}-[0-9]{3}-[0-9]{4}')

class phone_number(str):
    @staticmethod
    def validate(obj):
        if not regex_phone.fullmatch(obj):
            raise ValueError()
        return phone_number(obj)

class email(str):
    @staticmethod
    def validate(obj):
        if not regex_email.fullmatch(obj):
            raise ValueError()
        return email(obj)

class TestEmail(Struct):
    phone_number: phone_number
    email: email

def dec_hook(type: Type, obj: Any) -> Any:
    if type is email:
        if not regex_email.fullmatch(obj):
            raise ValueError()
        return type(obj)
    if type is phone_number:
        if not regex_phone.fullmatch(obj):
            raise ValueError()
        return type(obj)
    return type(obj)

def dec_hook2(type: Type, obj: Any) -> Any:
    return type.validate(obj)

t = '{"email": "[email protected]", "phone_number": "999-999-9999"}'

response_decoder = Decoder(TestEmail, dec_hook=dec_hook).decode(t)
print(response_decoder)

response_decoder2 = Decoder(TestEmail, dec_hook=dec_hook2).decode(t)
print(response_decoder2)

akotlar avatar Jun 06 '23 03:06 akotlar

I think this is a useful example to add, and I'd be happy to accept a PR if you want to get things going. I like the idea of establishing a convention of using validate methods on the custom types to handle the parsing and validation, then dispatching directly to those in the dec_hook (as you do with dec_hook2 for response_decoder2). I'd probably make those classmethod instead of staticmethod so you can reference cls in the method itself, but that's just a stylistic preference.

Note that in this specific case, since all your validators are regex-based, you can make use of the existing constraints functionality and avoid using dec_hook altogether. This is also nice because the runtime type of phone_number and email is just str; only the extra validators are attached.

from typing import Annotated

import msgspec


Email = Annotated[
    str,
    msgspec.Meta(
        pattern=r"^([A-Za-z0-9]+[._-])*[A-Za-z0-9]+@[A-Za-z0-9-]+(\.[A-Za-z]{2,})+$"
    ),
]

Phone = Annotated[str, msgspec.Meta(pattern=r"^[0-9]{3}-[0-9]{3}-[0-9]{4}$")]


class Example(msgspec.Struct):
    phone_number: Phone
    email: Email


t = '{"email": "[email protected]", "phone_number": "999-999-9999"}'

msg = msgspec.json.decode(t, type=Example)
print(msg)
#> Example(phone_number='999-999-9999', email='[email protected]')

jcrist avatar Jun 06 '23 04:06 jcrist

Very nice, thanks! The Meta api is something I hadn’t explored yet, this looks super clean.

akotlar avatar Jun 06 '23 04:06 akotlar