spacy-clausie
Improvements in _get_verb_matches
There are a few things in _get_verb_matches that I find strange:
- Why are you using several individual rules instead of a single one such as:
pattern = [{'POS': 'VERB', 'OP': '?'},
           {'POS': 'ADV', 'OP': '*'},
           {'POS': 'AUX', 'OP': '*'},
           {'POS': 'PART', 'OP': '*'},
           {'POS': 'VERB', 'OP': '+'}]
- The current patterns miss negations attached to the verb. A negation may not be considered part of the verb per se, but when the clause is rebuilt with the current code, the negation is lost.
- Shouldn't the matcher be an attribute of the class (so it doesn't need to be created every time the pipeline gets called)?
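To make the three points above concrete, here is a minimal sketch of what I have in mind, assuming spaCy 3.x. The class name `VerbChunker` and the match key `VERB_PHRASE` are hypothetical and are not taken from the spacy-clausie code. The `Doc` is built by hand with POS tags, so no trained model is needed to try it. The matcher is created once in `__init__` and reused on every call, and `greedy="LONGEST"` keeps only the longest of the overlapping matches.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc

# The single pattern proposed above: optional verb, adverbs, auxiliaries,
# particles (this is where "not" / "n't" usually lands), then one or more verbs.
VERB_PATTERN = [
    {"POS": "VERB", "OP": "?"},
    {"POS": "ADV", "OP": "*"},
    {"POS": "AUX", "OP": "*"},
    {"POS": "PART", "OP": "*"},
    {"POS": "VERB", "OP": "+"},
]


class VerbChunker:
    """Hypothetical helper, not part of spacy-clausie."""

    def __init__(self, vocab):
        # Build the matcher once and keep it as an attribute.
        self.matcher = Matcher(vocab)
        # greedy="LONGEST" drops shorter matches that overlap a longer one.
        self.matcher.add("VERB_PHRASE", [VERB_PATTERN], greedy="LONGEST")

    def __call__(self, doc):
        # Return the matched verb phrases as Span objects.
        return [doc[start:end] for _, start, end in self.matcher(doc)]


nlp = spacy.blank("en")
# Hand-annotated POS tags (an assumption standing in for a tagger's output).
doc = Doc(
    nlp.vocab,
    words=["She", "did", "not", "leave"],
    pos=["PRON", "AUX", "PART", "VERB"],
)
chunker = VerbChunker(nlp.vocab)
spans = chunker(doc)
print([span.text for span in spans])
```

With this pattern the negation stays inside the matched span whenever it is tagged `PART`, so it should survive when the clause is reassembled. Whether that is the desired behaviour for the rest of the pipeline is a separate question.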
Hi
It's been so long since I built it that I don't really remember why I did it like that. You might be right that it's better that way. You can try making a PR with your thoughts.
> Shouldn't the matcher be an attribute of the class (so it doesn't need to be created every time the pipeline gets called)?
I think this is left over from when I was working with spaCy 2.0. I wasn't quite sure how to work with the pattern matcher, so I did it in a way that was convenient and did what I wanted. Again, if you have a better way, I would be happy to consider a PR on that!
Closing this as inactive.