msgspec icon indicating copy to clipboard operation
msgspec copied to clipboard

Integrate with the attrs ecosystem

Open Tinche opened this issue 2 years ago • 12 comments

Description

Hello! I think this is a cool library, and I think you should consider integrating with the attrs ecosystem.

A small disclaimer first: I'm a major contributor to attrs (behind @hynek), the author of cattrs, and at this point I'm slowly getting comfortable starting to say I'm one of the mypy attrs plugin maintainers.

I haven't had the chance to use msgspec for real yet. You've done some very impressive things efficiency-wise, but I feel there's a lot of overlap between some work here and some work in attrs/cattrs. So now I'm thinking of ways to integrate.

Strategically speaking, I think msgspec is cool but limited in a number of ways. From what I understand, you don't support generics and PyPy (maybe other constraints I'm not aware of, not sure if you support dict classes?), which would limit my personal use of the library. Also I'm attached to the way cattrs does customization of de/serialization.

However, here's the thing: if we integrate the way I'm thinking, there would be no need for you to actually do these things (you could if you wanted to, or you could focus on other aspects). We could integrate msgspec classes (er, structs) into the attrs system and msgspec de/serialization into cattrs.

A simple idea would be having msgspec as an attrs backend that would be opt-in - so if your class isn't generic, and you don't use PyPy, you'd choose to use msgspec, otherwise you'd use the default attrs backend and have an attrs class. The resulting class would contain __attrs_attrs__, and so would essentially be an attrs class.

What I mean by the attrs ecosystem: for example, Rich supports attrs classes, although I know you support Rich in msgspec. Another example is an OpenAPI generator I'm working on for attrs classes, which would just work for msgspec classes too. A third example is work I'm doing for better typing of attrs.fields for use in ORMs/ODMs; this would also just work.

The first and most important question, however, is if you are even open to doing this? If yes, we can bounce ideas.

Tinche avatar Feb 13 '23 14:02 Tinche

Hi! Thanks for opening this. A few thoughts inline:

From what I understand, you don't support generics and PyPy (maybe other constraints I'm not aware of, not sure if you support dict classes?), which would limit my personal use of the library.

Generic structs (#193) are currently unsupported, but are at the top of my list to implement. This is a solvable problem, and is something I'd expect to be resolved in the next 3 months (hopefully sooner).

By "dict classes" I assume you mean Structs with a __dict__ defined? Those are unplanned and unlikely to happen without a strong use case. Do you have a use case for those over relying on __slots__?

PyPy support is also not planned. It's doable, but it's a lot of work and isn't something I think is worth it.

Also I'm attached to the way cattrs does customization of de/serialization.

I need to write up an issue for this, but before 1.0 I plan to redo our extension mechanism. A single hook for encoding and decoding (enc_hook/dec_hook) doesn't provide enough granularity, and is limiting support for fancier types. This won't exactly match how cattrs does things, but will be closer than what we have now. Would you be open to giving some feedback on the design there once I write things up?

if your class isn't generic, and you don't use PyPy, you'd choose to use msgspec, otherwise you'd use the default attrs backend and have an attrs class. The resulting class would contain __attrs_attrs__, and so would essentially be an attrs class.

IIUC you're saying that you'd use msgspec to accelerate cattrs (or attrs?) for cases where msgspec would provide drop-in functionality? I think this may be doable, but I'm not sure what the benefit would be here? If you're not using msgspec to encode the entire message (which you wouldn't if you want to use cattrs throughout) then the benefit of msgspec would be less noticeable. Faster __init__/__eq__ for ordered types, but that may be it. That said, depending on what's needed here we may be able to make something work. I'd be against adding __attrs_attrs__ to Struct types internally, but if we can provide the proper hooks that you can plumb things together then sure!

Another approach would be adding native support to msgspec for encoding/decoding attrs types, similar to what we have for dataclasses. I was able to get serialization working in about 10 minutes of hacking tonight (deserialization is always harder), but this would also be doable.

In [1]: import msgspec

In [2]: import attrs

In [3]: @attrs.define
   ...: class Example:
   ...:     x: int
   ...:     y: list[str]
   ...:     z: dict[str, int]
   ...: 

In [4]: msg = Example(1, ["two", "three"], {"four": 5})

In [5]: msgspec.json.encode(msg)
Out[5]: b'{"x":1,"y":["two","three"],"z":{"four":5}}'

If we went this route I'd definitely want advice from you on how this should be handled - you are after all the expert on serializing attrs things :).

Anyway, I guess in summary:

  • Yes, I think msgspec and attrs could integrate in some way
  • I don't want to make msgspec.Struct types look like attrs types within msgspec, but if we can make this possible for you to do then I'd be happy help
  • I do think we could add support for encoding/decoding attrs types, similar to how we support dataclasses now

jcrist avatar Feb 14 '23 03:02 jcrist

My initial thoughts were not focused on serialization, although that's a part of it. Even if serialization wasn't part of the story, I also wouldn't discount the benefits of faster __init__/__eq__ so quickly ;) (Also I assume the __repr__ can also be accelerated? For example, we repr() for our logs on every request.)

Let's talk vision first.

Let me describe my personal work situation to give you a better idea of where I'm coming from. Our stack is heavily based on attrs:

  • we have a MongoDB ODM based on attrs (which I want to open source)
  • we have a simple DynamoDB ODM based on attrs
  • it's inevitable a Redis, Postgres and an ElasticSearch/OpenSearch ODM will get written one day, especially if I manage to land the Mypy changes that I want
  • we're using the Flatbuffers serialization format, and I've written an attrs layer on top of it (which I'm open sourcing within a week or two)
  • we're using attrs for interservice communication (msgpack) and for the web interface (json)
  • I plan on writing a validation framework for attrs classes that's able to be typed checked, and decoupled from the process of un/structuring

Due to the nature of attrs, all of these work together (a class can be shared between the databases and all the interfaces).

Just to be clear, this isn't about solving my problems at work. (If it was about that, I'd just write a msgspec-attrs adapter internally and use it. Also I would absolutely love open sourcing all of this stuff, so that's one of my long-term goals.) This is about joining efforts for the benefit of both open-source ecosystems.

For some of these use cases, I'm doing work in the broader open source community to take it a step further. For example, my next step for the Mypy plugin will be to implement support for attrs.fields(cl) to allow statically-checked database projections (don't want to segue too much here). Once certain other features land in Mypy, I'll pick up work on uapi (think FastAPI but cleaner and for attrs). Other people are building their own stuff on attrs.

So what I'd like is this: we create a situation where I can drop msgspec into this stack and it just works, and is faster where that's possible. It's definitely not just about cattrs. It's a shame if we make users pick between super fast JSON serialization and using the model in a database too.

If we figure out a way to integrate:

  • we on the attrs side get an option of faster classes and faster serialization
  • you on the msgspec side get to reap the benefits of all of this work without having to do it twice, and you increase your audience
  • the ecosystem as a whole gets stronger, which helps everyone

To answer some of your comments:

By "dict classes" I assume you mean Structs with a dict defined?

Yeah, there are some use cases, for example @cached_property and Exceptions (it's loosely-assumed arbitrary attributes can be attached to Exceptions in certain context, I remember this coming up when I was adding ExceptionGroups to cattrs and the __note__ PEP was being discussed).

Would you be open to giving some feedback on the design there once I write things up?

Sure, of course.

Another approach would be adding native support to msgspec for encoding/decoding attrs types

This would be sweet. It'd let me integrate msgspec into cattrs fairly easily, for example.

To finish up this overly long post, some thoughts on integration. This section is brainstormy, just me spitballing ideas.

You wouldn't need to change how msgspec works by default.

It would be awesome if it was possible to just do:

from msgspec import define, Factory

@define
class Test:
    a: int = 5
    b: list[int] = Factory(list)

Because it would just be a drop-in replacement for vanilla attrs. This might not be possible because msgspec.field is already something different than attrs.field so it might confuse typecheckers, I don't know, maybe this stuff could live in msgspec.attrs instead. This interface doesn't have to support all of the attrs features, although I think you have most of them anyway. (The Mypy plugin would have to be changed to take this into effect.) Also you wouldn't need to depend on attrs, these could be conditionally created if attrs is already present.

These classes, being both attrs and msgspec classes, would work with both cattrs and msgspec serialization. The step after that would be me integrating msgspec into cattrs by implementing msgspec converters for json and msgpack.

Tinche avatar Feb 16 '23 00:02 Tinche

This would be sweet. It'd let me integrate msgspec into cattrs fairly easily, for example.

I spent some time hacking together full attrs support tonight. If you could take a look at/try out #323 that would be much appreciated.

I promise I'll do a longer writeup response to your comment above sometime tomorrow, it's already late here :(.

jcrist avatar Feb 16 '23 05:02 jcrist

Also I assume the repr can also be accelerated? For example, we repr() for our logs on every request.)

That's true, a quick benchmark shows ~2x faster for a simple message (from external conversations it sounds like you also might like #322, which should also be faster).

Repr benchmark Code
In [1]: import attrs, msgspec                                                                                                                                                         [0/300]

In [2]: @attrs.define                          
   ...: class Test:                            
   ...:     field_one: int
   ...:     field_two: int
   ...:     field_three: int
   ...:     field_four: int
   ...:                                        

In [3]: attrs_obj = Test(1, 2, 3, 4)

In [4]: class Test(msgspec.Struct):
   ...:     field_one: int
   ...:     field_two: int
   ...:     field_three: int
   ...:     field_four: int
   ...:                                        

In [5]: msgspec_obj = Test(1, 2, 3, 4)

In [6]: %timeit repr(attrs_obj)
740 ns ± 2.34 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [7]: %timeit repr(msgspec_obj)
331 ns ± 0.586 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

Yeah, there are some use cases, for example @cached_property and Exceptions (it's loosely-assumed arbitrary attributes can be attached to Exceptions in certain context, I remember this coming up when I was adding ExceptionGroups to cattrs and the __note__ PEP was being discussed).

Wait, are you saying you use attrs when defining custom exception subclasses? That's a use case I'd never thought of.

In case there's any confusion, let me be clear on what msgspec supports here:

  • msgspec.Struct types are always __slots__ only. I have little intention of changing this for storing annotated attributes. There's an open issue for supporting cached_property (#204), which won't work on classes without a __dict__. We might solve this by adding an opt-in lazily created __dict__ (dict=True or something). This would only be used for adding extra hidden state on an instance, and not for storing any annotated attributes.
  • When serializing other types (including dataclasses or attrs as of #323) attributes may be stored in either __dict__ or __slots__.

It would be awesome if it was possible to just do: ...

I don't know, maybe this stuff could live in msgspec.attrs instead. This interface doesn't have to support all of the attrs features, although I think you have most of them anyway.

I was picturing attrs (or cattrs) optionally using msgspec behind the scenes here, not msgspec providing an attrs-like interface. We could do that, but I'd worry that any inconsistencies in implementation would lead to subtle issues on the user's end. What requirements are there for the output of define being an attrs object? Just matching the define kwargs and behaviors (for things we support)? Or do you also need the generated __attrs_attrs__ attribute? And if so, how often does that interface change?

Also note that msgspec.Struct types work by subclassing; any magic define decorator made here would have to inject the Struct base class into the class hierarchy to work properly.

The Mypy plugin would have to be changed to take this into effect

I've been hoping that we'll never have to write a mypy plugin (#19) with the new typing.dataclass_transform work, but mypy support for this is still lagging. If you're ever feeling motivated to write a new mypy plugin, I'd love some help here :).


Stepping back to look places where attrs might interact with msgspec:

  • Accelerated encoding/decoding of attrs objects. This is handled by #323.

  • Accelerated generated methods for attrs objects (__init__/__eq__/__copy__/__repr__, ...). I'm not 100% convinced this matters for real world code, but could be doable with either a msgspec.attrs wrapper (or attrs optionally relying on msgspec). msgspec.Struct types have more restrictions that attrs types though, so many features of attrs wouldn't be supportable out-of-the-box. I'm unlikely to spend effort loosening these restrictions without a motivating use case.

Are there other interaction points missing here?

jcrist avatar Feb 17 '23 16:02 jcrist

Hi, I caught a cold so my replies here will be somewhat delayed, sorry ;)

Tinche avatar Feb 22 '23 22:02 Tinche

No worries, hope you feel better soon!

I'm hoping to cut another release in the next week or two - do you think you'll have time to review #323 before then? We can always extend our attrs support, but changing existing behavior after a release is harder.

jcrist avatar Feb 24 '23 20:02 jcrist

Hope you're feeling better @Tinche - any chance you could provide your thoughts on #323 sometime in the coming week? If possible I'd like to include a first pass at attrs support in the next release.

jcrist avatar Mar 02 '23 21:03 jcrist

Yessir, putting this on my shortlist

Tinche avatar Mar 02 '23 22:03 Tinche

@Tinche, do you think you'll have time to review #323 soon (as well as respond to my comment above)? If you don't have time to continue with this issue in the next week I'm going to close this as stale (we can always pick it up again later).

jcrist avatar Mar 15 '23 15:03 jcrist

Hello,

I've kinda had to declare open source bankruptcy recently since I'm way behind on everything, hence all the delays.

So let me try answering some of your questions.

I think it's perfectly ok if the hypothetical msgspec.define isn't 100% compatible with attrs.define since it's an opt-in thing, users will know what they're getting into. So slot classes only perfectly fine.

We'd need the __attrs_attrs__ attribute, since tooling requires it. I'd say the attribute is half of the point here. __attrs_attrs__ is an instance of a tuple subclass (it's not a NamedTuple, since those have restrictions on their attribute naming that are too severe for this use case), with properties that return attrs.Attribute objects. attrs.Attribute changes infrequently but it does change sometimes. We've never had other users of it though. I'm guessing we set up a communication channel so we let you know of upcoming changes (maybe I can contribute PRs when it happens?).

Also note that msgspec.Struct types work by subclassing; any magic define decorator made here would have to inject the Struct base class into the class hierarchy to work properly.

I don't think this is a problem, attrs.define already does crimes to add __slots__ to classes that don't have them. So you'd recreate the class with msgspec.Struct in the MRO too. There are some tricky that need to be done with rewriting closure cells in methods so things like bare super() works that we can help you with (dataclasses have the same problem, but we try harder than them to solve it).

I think your summary hits the nail on the head. But like I mentioned, I think speed (in __init__, __eq__, __repr__ is very valuable for some use cases (my use cases). If I can get significantly faster class construction by switching my classes to msgspec.attrs.define while the rest of my tooling continues just working, I'm going to do it, and I won't be the only one probably. I think you'd just need to provide a good subset, not the entire subset.

Tinche avatar Mar 16 '23 01:03 Tinche

I've kinda had to declare open source bankruptcy recently since I'm way behind on everything, hence all the delays.

No worries, I've definitely been there. Thanks for the reply!

I think I have enough info to hack something together given your answers above. No promises on when I might get around to this, but an attrs-compatible shim seems potentially useful, and at worst would be a fun thing to hack on.

jcrist avatar Mar 16 '23 01:03 jcrist

Are there any conclusions out of this issue?

nrbnlulu avatar Mar 27 '23 08:03 nrbnlulu