Does the WordSeq extractor support ngrams?

Open zachmayer opened this issue 7 years ago • 1 comments

E.g. if I want sequences of integers, with ngrams appended to the end?

zachmayer avatar Feb 02 '18 19:02 zachmayer

Not currently, but this could easily be added as a feature. You can add something like this as the last line of your text normalization function: text += " " + bigrams(text). This does almost the same thing, but since it's applied as text normalization, some things won't be available, such as spelling correction and pruning tokens by frequency. You can get the exact behavior with a more complicated setup, but this should be added as a feature to make it easy to use.

anttttti avatar Feb 03 '18 06:02 anttttti
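
For reference, the workaround suggested above could look roughly like the sketch below. The thread does not define bigrams(), so the helper here is a hypothetical stand-in, and the underscore joiner and the lowercasing step are assumptions. The sketch is plain Python and does not call any Wordbatch API; it only shows the normalization function you would pass in.

```python
def bigrams(text, sep="_"):
    """Hypothetical helper (not part of Wordbatch): join adjacent
    whitespace-separated tokens into bigram tokens."""
    tokens = text.split()
    return " ".join(a + sep + b for a, b in zip(tokens, tokens[1:]))


def normalize_text(text):
    """Example normalization function; lowercasing is only a placeholder
    for whatever cleaning you already do."""
    text = text.lower()
    grams = bigrams(text)
    # Last step, as suggested in the reply: append the bigrams to the text.
    if grams:
        text += " " + grams
    return text


print(normalize_text("The quick brown fox"))
# prints: the quick brown fox the_quick quick_brown brown_fox
```

Each bigram is joined with an underscore so that a whitespace tokenizer would treat it as a single token. If your tokenizer splits on underscores, a different joiner would be needed. As noted in the reply, the appended bigrams are built from the raw normalized text, so they will not reflect spelling correction or frequency-based pruning applied later.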