Wordbatch
Does the WordSeq extractor support ngrams?
E.g. if I want sequences of integers, with ngrams appended to the end?
Not currently, though it could easily be added as a feature. For now, you can add something like this as the last line of your text normalization function: text += " " + bigrams(text). This does almost the same thing, but since it's applied as text normalization, some things won't be available, such as spelling correction and pruning tokens by frequency. You can get the exact behavior with a more complicated setup, but this should be added as a feature to make it easy to use.
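To make the workaround concrete, here is a minimal sketch of such a normalization function. The bigrams helper is not defined in this thread, so the version below is a hypothetical stand-in, as are the name normalize_text, the "_" separator, and the lowercasing step. It assumes the downstream tokenizer splits on whitespace and keeps "_" inside a token.

```python
import re


def bigrams(text, sep="_"):
    """Hypothetical helper: join each pair of adjacent tokens into one token."""
    tokens = text.split()
    return " ".join(a + sep + b for a, b in zip(tokens, tokens[1:]))


def normalize_text(text):
    """Illustrative normalization: clean the text, then append its bigrams."""
    # Lowercase and collapse every run of non-alphanumerics to a single space.
    text = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    # Last step, as suggested above: append the bigrams as extra tokens.
    extra = bigrams(text)
    if extra:
        text += " " + extra
    return text


print(normalize_text("The quick brown fox"))
```

Because the bigrams are ordinary tokens by the time the extractor sees them, each one would get its own integer id in the sequence, after the ids of the unigrams.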