edsnlp
Feature request: better API for adding pipes to a pipeline
Feature type
Adding a pipe to a pipeline has quite a few limitations at the moment:
import edsnlp
nlp = edsnlp.blank('eds')
nlp.add_pipe('eds.matcher', config={"terms": {"key": ["expr 1", "expr 2"]}})
...
- there is no easy way of knowing which pipes are available from the notebook / IDE, and there is no autocompletion
- all pipe parameters are nested in a configuration dict, which is cumbersome
- there is no autocompletion of these parameters, since they are passed via a configuration dict
We could deviate from spaCy's iconic API and think of something better, along these lines:
import edsnlp
import edsnlp.pipes as eds
nlp = edsnlp.blank('eds')
nlp.add_pipe(eds.matcher(terms={"key": ["expr 1", "expr 2"]}))
The problem is that some pipes (like eds.matcher) require an nlp object at init time, which is normally supplied by add_pipe. We could ask the user to provide the nlp argument explicitly, as in nlp.add_pipe(eds.matcher(nlp=nlp, terms={"key": ["expr 1", "expr 2"]})), but this feels redundant.
Another option is to have promise = eds.matcher(terms={"key": ["expr 1", "expr 2"]}) return a "promise" (or "curried") component when the required nlp argument is missing. The promise would only be turned into a real component when it is added to the pipeline (via promise.instantiate(nlp=self)). This feels like an anti-pattern, so it should be extensively documented, and it should produce warnings whenever a user tries to use a non-initialized pipe outside a pipeline.
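
To make the "promise" idea more concrete, here is a rough, self-contained sketch of how it could behave. It does not use edsnlp at all: Pipeline, Matcher, PipePromise and the matcher factory are hypothetical stand-ins written for this issue, and only the instantiate(nlp=...) name comes from the proposal above. The real implementation would likely differ.

```python
import warnings
from typing import Any, Callable, Dict, List, Optional


class Matcher:
    """Toy stand-in for a pipe that needs `nlp` at init time (not the real eds.matcher)."""

    def __init__(self, nlp: "Pipeline", terms: Dict[str, List[str]]):
        self.nlp = nlp
        self.terms = terms


class PipePromise:
    """Hypothetical placeholder: stores a factory and its kwargs until `nlp` is known."""

    def __init__(self, factory: Callable[..., Any], **kwargs: Any):
        self.factory = factory
        self.kwargs = kwargs

    def instantiate(self, nlp: "Pipeline") -> Any:
        # The pipeline injects itself as the missing `nlp` argument.
        return self.factory(nlp=nlp, **self.kwargs)

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Calling a promise as if it were a pipe is almost certainly a mistake:
        # emit a warning, then fail explicitly.
        warnings.warn(
            "This pipe has not been added to a pipeline and is not initialized.",
            UserWarning,
            stacklevel=2,
        )
        raise TypeError("Cannot call a non-instantiated pipe")


def matcher(nlp: Optional["Pipeline"] = None, **kwargs: Any) -> Any:
    """Return a real pipe if `nlp` is given, otherwise a promise."""
    if nlp is None:
        return PipePromise(Matcher, **kwargs)
    return Matcher(nlp=nlp, **kwargs)


class Pipeline:
    """Toy stand-in for the object returned by edsnlp.blank (assumption)."""

    def __init__(self) -> None:
        self.pipes: List[Any] = []

    def add_pipe(self, pipe: Any) -> Any:
        # Promises are resolved here, so the user never has to pass `nlp`.
        if isinstance(pipe, PipePromise):
            pipe = pipe.instantiate(nlp=self)
        self.pipes.append(pipe)
        return pipe


nlp = Pipeline()
pipe = nlp.add_pipe(matcher(terms={"key": ["expr 1", "expr 2"]}))
```

With this shape, both call styles keep working: matcher(nlp=nlp, ...) builds the pipe immediately, while matcher(...) without nlp defers construction until add_pipe is called. Because the factory takes plain keyword arguments, an IDE can also offer completion for them, which is what the configuration dict prevents today.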