spaCy
spaCy copied to clipboard
Language.factory cannot be a subclass without nlp, name args
How to reproduce the behaviour
Try to import/loading SubClass
in this example.
from dataclasses import dataclass
from spacy import Language
@dataclass
class SuperClass:
nlp: Language
name: str
@Language.factory("dummy")
class SubClass(SuperClass):
def __init__(self, *args, sub_arg: bool = True):
super().__init__(*args)
self.sub_arg = sub_arg
This will lead to the following error
File "dummy.py", line 13, in <module>
class SubClass(SuperClass):
File "spacy\language.py", line 490, in add_factory
raise ValueError(Errors.E964.format(name=name))
ValueError: [E964] The pipeline component factory for 'dummy' needs to have the following named arguments, which are passed in by spaCy:
- nlp: receives the current nlp object and lets you access the vocab
- name: the name of the component instance, can be used to identify the component, output losses etc.
However, from the surface it would seem that this should just work: ultimately, SubClass does require nlp
and name
through its super SuperClass
. The reason that this errors, to me, is get_arg_names
which only retrieves the arguments for the current class - not its tree.
We could solve this by collecting all args by traversing the class's __mro__
. This, for instance, does work:
param_names = []
for cls in SubClass.__mro__:
param_names += inspect.signature(cls).parameters.keys()
# ['args', 'sub_arg', 'nlp', 'name']
Your Environment
Info about spaCy
- spaCy version: 3.2.4
- Platform: Windows-10-10.0.19041-SP0
- Python version: 3.8.8
It's useful to have this example in the issue tracker in case anyone runs into the same problem, but I don't think we want to add more complicated handling when it's relatively straight-forward to have nlp
and name
arguments in the subclass.
I'll just use @polm's recommendation to use a separate constructor function for my factory, so this problem won't occur.
That being said, simply repeating the arguments of the super class of course defeats the point of subclassing in the first place (DRY). I see that get_arg_names
is only used in this case, so the only change that would need to be made to support this behavior is changing that function to:
def get_arg_names(func: Callable) -> List[str]:
"""Get a list of all named arguments of a function (regular,
keyword-only).
func (Callable): The function
RETURNS (List[str]): The argument names.
"""
try:
return [key for cls in func.__mro__ for key in signature(cls).parameters.keys()]
except AttributeError:
return list(signature(func).parameters.keys())
which does not break any existing behaviour, does not change its usage, and does extend usability to extended object hierarchies.
Minimal example
from inspect import signature
from typing import Callable, List
from dataclasses import dataclass
from spacy import Language
@dataclass
class SuperClass:
nlp: Language
name: str
class SubClass(SuperClass):
def __init__(self, *args, sub_arg: bool = True):
super().__init__(*args)
self.sub_arg = sub_arg
def get_arg_names(func: Callable) -> List[str]:
"""Get a list of all named arguments of a function (regular,
keyword-only).
func (Callable): The function
RETURNS (List[str]): The argument names.
"""
try:
return [key for cls in func.__mro__ for key in signature(cls).parameters.keys()]
except AttributeError:
return list(signature(func).parameters.keys())
def my_func(arg1, arg2):
pass
if __name__ == '__main__':
print(get_arg_names(my_func))
print(get_arg_names(SuperClass))
print(get_arg_names(SubClass))