rasa-nlu-examples SparseSpacyFeaturizer

SparseSpacyFeaturizer

Open koaning opened this issue 5 years ago • 2 comments

If you look at all the attributes that spaCy generates for its tokens, you can imagine that some of them would be useful features in machine learning pipelines. To name a few:

is_oov: is the token out of vocabulary, i.e. does it lack a word vector?
is_stop: is the token a stopword?
lemma_: what is the lemma of the token?
pos/tag: coarse/fine-grained part-of-speech information
morphological features
grammatical dependency

All of these have a discrete representation and could be added as sparse features to a Rasa pipeline.

Sep 02 '20 07:09 koaning
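
For illustration, the discrete attributes listed in the opening comment could be one-hot encoded along these lines. This is only a sketch, not the component proposed in this issue: the function name `one_hot_coordinates`, the choice of `pos_`, `tag_` and `dep_`, and the coordinate-list output are assumptions. It works on any objects exposing those attributes, which spaCy tokens do.

```python
from typing import Dict, Iterable, List, Tuple

# Real spaCy Token attribute names; which ones to encode is an assumption.
CATEGORICAL_ATTRS = ("pos_", "tag_", "dep_")


def one_hot_coordinates(
    tokens: Iterable[object], vocab: Dict[str, int]
) -> List[Tuple[int, int]]:
    """Return (row, column) pairs for the non-zero entries of a one-hot matrix.

    Each row is a token. `vocab` maps "attribute=value" strings to column
    indices and grows whenever an unseen value appears.
    """
    coords = []
    for row, token in enumerate(tokens):
        for attr in CATEGORICAL_ATTRS:
            key = f"{attr}={getattr(token, attr)}"
            # Reuse the column for a known value, otherwise append a new one.
            col = vocab.setdefault(key, len(vocab))
            coords.append((row, col))
    return coords
```

The coordinate pairs are the form a sparse matrix constructor such as `scipy.sparse.coo_matrix` accepts, which is presumably what a sparse featurizer would hand to the rest of the pipeline.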

It's probably best to wait until spaCy 3.0 before adding this one.

Oct 21 '20 14:10 koaning

We might also just start with is_oov, is_stop and is_numeric.

Jan 21 '21 09:01 koaning
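
A rough sketch of the smaller starting set suggested in the last comment, assuming that `is_numeric` corresponds to spaCy's `like_num` token attribute (that mapping is a guess, since `is_numeric` is not a spaCy attribute name). The names `FLAG_ATTRS`, `flag_rows` and `demo` are made up for this example.

```python
from typing import Iterable, List

# `like_num` is an assumed stand-in for the `is_numeric` mentioned above.
FLAG_ATTRS = ("is_oov", "is_stop", "like_num")


def flag_rows(tokens: Iterable[object]) -> List[List[int]]:
    """Return one row of 0/1 flags per token, in FLAG_ATTRS order."""
    return [
        [int(bool(getattr(token, attr))) for attr in FLAG_ATTRS]
        for token in tokens
    ]


def demo() -> List[List[int]]:
    """Run the flags over a spaCy Doc (requires spaCy to be installed)."""
    import spacy  # imported lazily so flag_rows itself has no dependency

    nlp = spacy.blank("en")  # tokenizer-only English pipeline, no vectors
    return flag_rows(nlp("I bought three of the apples"))
```

Since a blank pipeline ships without word vectors, `is_oov` is only informative when a model with vectors is loaded.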
