rasa-nlu-examples
SparseSpacyFeaturizer
If you look at the attributes that spaCy generates for its tokens, you can imagine that some of them could be useful features for machine learning pipelines. To name a few:
- is_oov: is the token out of vocabulary (does it lack a word vector)?
- is_stop: is the token a stopword?
- lemma_: what is the lemma of the token?
- pos/tag: coarse/fine-grained part of speech information
- morphological features
- grammatical dependency
These can all have a discrete representation and could be added in general to a Rasa pipeline.
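To make "discrete representation" concrete, here is a minimal sketch of one-hot encoding categorical token attributes into column indices. It is not the featurizer itself: `FakeToken`, `build_vocab` and `one_hot_rows` are hypothetical names used for illustration, and the stand-in class only mimics the `pos_` and `dep_` attribute names so that the example runs without a spaCy model installed.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class FakeToken:
    """Hypothetical stand-in that mimics a few spaCy token attribute names."""
    text: str
    pos_: str
    dep_: str


def build_vocab(tokens: Sequence[FakeToken], attrs: Sequence[str]) -> Dict[str, int]:
    """Assign one column index to every distinct "attr=value" pair seen."""
    vocab: Dict[str, int] = {}
    for token in tokens:
        for attr in attrs:
            key = f"{attr}={getattr(token, attr)}"
            vocab.setdefault(key, len(vocab))
    return vocab


def one_hot_rows(
    tokens: Sequence[FakeToken], attrs: Sequence[str], vocab: Dict[str, int]
) -> List[List[int]]:
    """Return, per token, the sorted column indices that are switched on."""
    rows = []
    for token in tokens:
        keys = [f"{attr}={getattr(token, attr)}" for attr in attrs]
        # Values never seen while building the vocabulary are simply skipped.
        rows.append(sorted(vocab[k] for k in keys if k in vocab))
    return rows


tokens = [
    FakeToken("I", pos_="PRON", dep_="nsubj"),
    FakeToken("want", pos_="VERB", dep_="ROOT"),
    FakeToken("pizza", pos_="NOUN", dep_="dobj"),
]
attrs = ("pos_", "dep_")
vocab = build_vocab(tokens, attrs)
rows = one_hot_rows(tokens, attrs, vocab)
```

Each row lists the active columns for one token, which is the shape a sparse matrix constructor would expect. How this would plug into a Rasa component is left open here.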
It's probably best to wait until spaCy 3.0 before adding this one.
We might also just start with is_oov, is_stop and is_numeric.
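A rough sketch of what that smaller starting point could look like, using boolean flags only. Two assumptions are baked in: `like_num` is used as a stand-in for the "is_numeric" idea above, and the tokens are plain `SimpleNamespace` objects rather than real spaCy tokens. `flag_features` is a hypothetical helper name, not an existing API.

```python
from types import SimpleNamespace
from typing import List, Sequence

# Assumption: `like_num` stands in for the "is_numeric" idea.
FLAGS = ("is_oov", "is_stop", "like_num")


def flag_features(token, flags: Sequence[str] = FLAGS) -> List[int]:
    """Map each boolean attribute to 1 or 0; a missing attribute counts as 0."""
    return [int(bool(getattr(token, name, False))) for name in flags]


# Hypothetical stand-ins; real tokens would come from a spaCy pipeline.
example_tokens = [
    SimpleNamespace(text="the", is_oov=False, is_stop=True, like_num=False),
    SimpleNamespace(text="42", is_oov=False, is_stop=False, like_num=True),
    SimpleNamespace(text="zxqv", is_oov=True, is_stop=False, like_num=False),
]
features = [flag_features(t) for t in example_tokens]
```

Because the function only relies on attribute names, any object exposing those attributes should work with it unchanged.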