natural
NGrams doesn't support English words containing a hyphen or slash
Some English words contain a hyphen or a slash.
Examples:
- image-based
- text-based
- links/CTA
It would be great if Natural could keep such words intact as single tokens.
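const natural = require("natural");

const text = "links text-based opposed image-based links/CTA’s";
const NGrams = natural.NGrams;
// Use the aggressive tokenizer for n-gram generation
const tokenizer = new natural.AggressiveTokenizer();
NGrams.setTokenizer(tokenizer);
// Unigrams: hyphenated and slashed words are split apart
console.log(NGrams.ngrams(text, 1));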
Output: [["links"], ["text"], ["based"], ["opposed"], ["image"], ["based"], ["links"], ["CTA"], ["s"]]
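For illustration, here is a minimal sketch of the behaviour I would like to see. It does not use Natural at all: `CompoundWordTokenizer` and `ngrams` are hypothetical names made up for this example, and the regular expression is only one possible definition of a "word" (letters and digits, optionally joined by a hyphen, a slash, or an apostrophe).

```javascript
// Hypothetical tokenizer, not part of Natural: keeps runs of letters/digits
// that are joined by "-", "/", or an apostrophe (straight or curly) together.
class CompoundWordTokenizer {
  tokenize(text) {
    return text.match(/[A-Za-z0-9]+(?:[-\/'’][A-Za-z0-9]+)*/g) || [];
  }
}

// Plain n-gram helper (illustrative only): sliding window of size n over tokens.
function ngrams(tokens, n) {
  const result = [];
  for (let i = 0; i + n <= tokens.length; i++) {
    result.push(tokens.slice(i, i + n));
  }
  return result;
}

const text = "links text-based opposed image-based links/CTA’s";
const tokens = new CompoundWordTokenizer().tokenize(text);
console.log(ngrams(tokens, 1));
// [["links"], ["text-based"], ["opposed"], ["image-based"], ["links/CTA’s"]]
```

If `NGrams.setTokenizer` only requires an object with a `tokenize(text)` method (an assumption I have not verified), an instance of a tokenizer like this might serve as a workaround in the meantime.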