msgspec icon indicating copy to clipboard operation
msgspec copied to clipboard

msgspec.Struct cannot be used together with abc

Open 777michael777 opened this issue 10 months ago • 12 comments
trafficstars

Description

I am trying to create a Struct that is also an abstract class and run into the metaclass conflict. Here's an example:

import msgspec
import abc


class Base(msgspec.Struct, metaclass=abc.ABCMeta):
    member: int

    @abc.abstractmethod
    def method(self) -> None:
        pass


class Subclass(Base):
    pass


a = Subclass(7)

When executing above code, I get

Traceback (most recent call last):
  File "x.py", line 5, in <module>
    class Base(msgspec.Struct, metaclass=abc.ABCMeta):
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases

Also tried with not using the metaclass, but inherit Base from abc.ABC directly, with the same result.

777michael777 avatar Jan 23 '25 09:01 777michael777

That makes sense, since msgspec.Struct brings its own metaclass. No abstract classes for us.

Funny, I tried to do the same thing just days ago, but I didn't know how to pass both metaclass parameters and two base classes. Didn't occur to me ABC brings a metaclass too.

rafalkrupinski avatar Jan 23 '25 09:01 rafalkrupinski

Did anyone find a solution to this? Any plans to make it compatible, somehow (e.g. introducing an ABCStructMeta we can use, or something like it)?

flisboac avatar Feb 18 '25 14:02 flisboac

You can't get rid of msgspec metaclass, you can only replace ab.ABC with another solution. What's your use case for it?

I'm not even sure what using abc.ABC does specifically, just guessing it's __new__ or __init__ that raises unless class is inherited.

Notice that msgspec doesn't directly support abstract classes - can't do

msgspec.convert(..., AbstractClass)

# instead you need

msgspec.convert(..., ConcreteClass1 | ConcreteClass2)

so in no scenario msgspec will try to instantiate your abstract class.

rafalkrupinski avatar Feb 18 '25 16:02 rafalkrupinski

Allow me to chime in again- with regard to the use case. My goal was to define an abstract base class - obviously - where I can make sure that all subclasses adhere to the interface defined in the base class. The definition of the subclasses - or rather, the values for the members of the subclasses - comes from a YAML file in my case. So if someone develops a new subclass, they will know pretty soon - as soon as they use that subclass in the YAML file and msgspec therefore tries to instantiate it when parsing the file - if they did not implement all abstract methods yet (because the subclass cannot be instantiated). At the moment, you only get a runtime error if you call the not-abstract method and the base class raises a NotImplementedError (as I do at the moment, but am not satisfied with that solution!)

Please take another look at the initial example: If msgspec supported abstract base classes, a runtime error would be raised at exactly the same point in time (just be a different one than currently), meaning when the class is supposed to be instantiated. As it is now, we would need to get rid of the abstractmethod-decorator (as msgspec does not support abstract base classes) and just raise a NotImplementedError in method, which would only be raised during runtime, if and when someone actually calls method - which is bad in my eyes

777michael777 avatar Mar 04 '25 08:03 777michael777

The type-checker would raise an error if your user tries to implement your class without implementing all the methods. Sounds like a non-issue.

winstxnhdw avatar Mar 17 '25 11:03 winstxnhdw

@winstxnhdw: It is an issue... Your statement was true, if I could use abc as a meta-class (exactly what I asked for!). So here's a code showing exactly what I do in my productive code:

from __future__ import annotations

from msgspec import Struct
import msgspec


class Elements(Struct):
    elements: dict[str, Child1 | Child2] = {}


class BaseClass(Struct):
    name: str

    def some_func(self) -> None:
        msg = "Needs to be implemented in subclasses"
        raise NotImplementedError(msg)


class Child1(BaseClass, tag=True):
    pass


class Child2(BaseClass, tag=True):
    def some_func(self) -> None:
        print("Really implemented")


if __name__ == "__main__":
    my_yaml = b"""elements:
    child1:
        name: "Me"
        type: "Child1"
    child2:
        name: "You"
        type: "Child2"
"""
    elements: Elements = msgspec.yaml.decode(my_yaml, type=Elements)
    print(elements.elements)
    elements.elements["child2"].some_func()
    elements.elements["child1"].some_func()

The problem is, that an exception only appears once some_func() of Child1 instance is called - so in the last line of the code - and not when any code checker is run (because without abc, the code checker doesn't see that children need to implement the function and instances cannot be created) This is bad because the class definitions and implementations are in a library maintained by one developer, while the code in __main__ is the application code and developed by someone else, who doesn't know if all functions have been implemented in the library or not (and should not get an exception when a "production ready" library is used)...

777michael777 avatar Mar 18 '25 15:03 777michael777

Use a Protocol and install a type checker and your IDE/editor will warn you of unimplemented methods.

winstxnhdw avatar Mar 18 '25 15:03 winstxnhdw

@winstxnhdw: Thanks for the hint regarding Protocols. Unfortunately, I can't come up with code that would actually work as I would like it to.

Here's what I have currently:

from __future__ import annotations

import msgspec
from typing import Protocol


class Elements(msgspec.Struct):
    elements: dict[str, Child1 | Child2] = {}


class BaseClass(msgspec.Struct, Protocol):
    name: str

    def some_func(self) -> None:
        ...


class Child1(BaseClass, tag=True):
    pass


class Child2(BaseClass, tag=True):
    def some_func(self) -> None:
        print("Really implemented")


if __name__ == "__main__":
    my_yaml = b"""elements:
    child1:
        name: "Me"
        type: "Child1"
    child2:
        name: "You"
        type: "Child2"
"""
    elements: Elements = msgspec.yaml.decode(my_yaml, type=Elements)
    print(elements.elements)
    elements.elements["child2"].some_func()
    elements.elements["child1"].some_func()

When running mypy, I actually get the warning in line 11 (class BaseClass(msgspec.Struct, Protocol):) that All bases of a protocol must be protocols. During runtime, I get TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases (same line). Could you point me into the right direction?

777michael777 avatar Mar 19 '25 12:03 777michael777

Hey @777michael777, so I tried your example and I am all the more confused why you need any of this in the first place. Even without using a Protocol or an ABC, your type checker will scream at you just as it is.

from __future__ import annotations

from msgspec import Struct
from msgspec.yaml import decode


class Elements(Struct):
    elements: dict[str, Child1 | Child2] = {}


class Child1(Struct, tag=True):
    pass


class Child2(Struct, tag=True):
    def some_func(self) -> None:
        print("Really implemented")


if __name__ == "__main__":
    my_yaml = b"""elements:
    child1:
        name: "Me"
        type: "Child1"
    child2:
        name: "You"
        type: "Child2"
"""

    elements: Elements = decode(my_yaml, type=Elements)
    print(elements.elements)
    elements.elements["child2"].some_func()
    elements.elements["child1"].some_func()

Image

And it makes sense because the type checker can clearly see that you have not implemented some_func in Child1 based on the expected union type you've given elements.

winstxnhdw avatar Mar 24 '25 19:03 winstxnhdw

Hi @winstxnhdw,

Ok, let me try to explain what I had in mind: My BaseClass (that you left out in your example) in reality is called "SpectrumAnalyzer", and is supposed to be a base class for a certain type of measurement equipment (like spectrum analyzers). Child1 and Child2 would be concrete implementations of spectrum analyzers - from different vendors, different models, you name it. The class definitions - so Elements, ChildX (and BaseClass in my example) - are part of library for measurement equipment, while the code in the main section is part of the application, with the YAML content in a configuration file where the user specifies which kind of spectrum analyzer they have on their table next to them.

Apart from the configuration file, the users don't really care about the concrete type of spectrum analyzer, they just need to know which functions they can call for any of the spectrum analyzers - like "init", "configure_measurement_type1", etc. The implementation of these functions might be different for different kinds of spectrum analyzers, or they might be the same. The user doesn't need to know or care, as the library will handle this correctly.

So my thinking was "I need an abstract class SpectrumAnalyzer that implements common functionality and defines a set of functions that the concrete spectrum analyzers need to implement in order to be usable". Hence, I came up with the idea of using ABC for the BaseClass. If you have a better idea on how to solve this, I would very much appreciate the input! An important point for me - as the developer of the library - would be, that I should get errors from a type checker - or any kind of other tool - without user code! So when implementing a new spectrum analyzer, I know that I am not finished until I implemented all functions of the abstract class without having to implement user code.

777michael777 avatar Mar 27 '25 10:03 777michael777

This isn't a huge issue, but I will add my 2c - the reason I wanted to use an abc (or a Protocol) is because I have some library code that requires certain methods to be implemented on objects that get passed in. You do indeed get warnings from the linter when the methods aren't present, but the warnings appear in the library code rather than the client code. Ideally the person writing the client could would get the warning when they try to pass in an object that doesn't implement the interface. Lots of ways to get the desired result (meaning broken code doesn't get to prod), this was just the most obvious thing for me to try.

EDIT: and just to add how I got the message I wanted (simplified version):

class OutputSchema(typing.Protocol):
    @classmethod
    def some_method_i_need(cls) -> str:
        raise NotImplementedError

class MyBrokenOutputSchema(msgspec.Struct):
    # this class doesn't conform to the protocol
    # @classmethod
    # some_method_i_need(cls) -> str:
    #    return 'zzz'

def do_something[T: OutputSchema](obj: T):
    print(obj)

do_something(MyBrokenOutputSchema())   # <- type warning appears here

danmur avatar Apr 14 '25 05:04 danmur

Hey @777michael777, I hadn't had the time to take another look at your example, but the example @danmur gave is one of the ways I'd try to approach the problem.

Rather than inheriting from Struct itself, in most cases, using a protocol for structural typing is sufficient.

winstxnhdw avatar Apr 14 '25 07:04 winstxnhdw

Resolved by https://github.com/jcrist/msgspec/pull/890

>>> from msgspec import Struct, StructMeta
>>> from abc import ABCMeta
>>> class ABCStructMeta(StructMeta, ABCMeta): ...
...
>>> class StructABC(Struct, metaclass=ABCStructMeta): ...
...
>>>

ofek avatar Oct 25 '25 16:10 ofek