spaCy icon indicating copy to clipboard operation
spaCy copied to clipboard

Language.factory cannot be a subclass without nlp, name args

Open BramVanroy opened this issue 2 years ago • 2 comments

How to reproduce the behaviour

Try to import/loading SubClass in this example.

from dataclasses import dataclass

from spacy import Language


@dataclass
class SuperClass:
    nlp: Language
    name: str


@Language.factory("dummy")
class SubClass(SuperClass):
    def __init__(self, *args, sub_arg: bool = True):
        super().__init__(*args)
        self.sub_arg = sub_arg

This will lead to the following error

File "dummy.py", line 13, in <module>
    class SubClass(SuperClass):
  File "spacy\language.py", line 490, in add_factory
    raise ValueError(Errors.E964.format(name=name))
ValueError: [E964] The pipeline component factory for 'dummy' needs to have the following named arguments, which are passed in by spaCy:
- nlp: receives the current nlp object and lets you access the vocab
- name: the name of the component instance, can be used to identify the component, output losses etc.

However, from the surface it would seem that this should just work: ultimately, SubClass does require nlp and name through its super SuperClass. The reason that this errors, to me, is get_arg_names which only retrieves the arguments for the current class - not its tree.

We could solve this by collecting all args by traversing the class's __mro__. This, for instance, does work:

param_names = []
for cls in SubClass.__mro__:
    param_names += inspect.signature(cls).parameters.keys()
# ['args', 'sub_arg', 'nlp', 'name']

Your Environment

Info about spaCy

  • spaCy version: 3.2.4
  • Platform: Windows-10-10.0.19041-SP0
  • Python version: 3.8.8

BramVanroy avatar Apr 03 '22 13:04 BramVanroy

It's useful to have this example in the issue tracker in case anyone runs into the same problem, but I don't think we want to add more complicated handling when it's relatively straight-forward to have nlp and name arguments in the subclass.

adrianeboyd avatar Apr 04 '22 12:04 adrianeboyd

I'll just use @polm's recommendation to use a separate constructor function for my factory, so this problem won't occur.

That being said, simply repeating the arguments of the super class of course defeats the point of subclassing in the first place (DRY). I see that get_arg_names is only used in this case, so the only change that would need to be made to support this behavior is changing that function to:

def get_arg_names(func: Callable) -> List[str]:
    """Get a list of all named arguments of a function (regular,
    keyword-only).
    func (Callable): The function
    RETURNS (List[str]): The argument names.
    """
    try:
        return [key for cls in func.__mro__ for key in signature(cls).parameters.keys()]
    except AttributeError:
        return list(signature(func).parameters.keys())

which does not break any existing behaviour, does not change its usage, and does extend usability to extended object hierarchies.

Minimal example

from inspect import signature
from typing import Callable, List
from dataclasses import dataclass

from spacy import Language


@dataclass
class SuperClass:
    nlp: Language
    name: str


class SubClass(SuperClass):
    def __init__(self, *args, sub_arg: bool = True):
        super().__init__(*args)
        self.sub_arg = sub_arg


def get_arg_names(func: Callable) -> List[str]:
    """Get a list of all named arguments of a function (regular,
    keyword-only).
    func (Callable): The function
    RETURNS (List[str]): The argument names.
    """
    try:
        return [key for cls in func.__mro__ for key in signature(cls).parameters.keys()]
    except AttributeError:
        return list(signature(func).parameters.keys())


def my_func(arg1, arg2):
    pass


if __name__ == '__main__':
    print(get_arg_names(my_func))
    print(get_arg_names(SuperClass))
    print(get_arg_names(SubClass))

BramVanroy avatar Apr 04 '22 12:04 BramVanroy