
NGrams doesn't support words with a hyphen or slash in English

Open sam68740 opened this issue 3 years ago • 0 comments

There are a few words in English that contain a hyphen or a slash.

Examples:

  • image-based
  • text-based
  • links/CTA

It would be great if Natural could handle these cases and keep each such word as a single token.

const natural = require("natural");

// The AggressiveTokenizer splits on hyphens, slashes and the apostrophe.
const text = "links text-based opposed image-based links/CTA’s";
const NGrams = natural.NGrams;
const tokenizer = new natural.AggressiveTokenizer();
NGrams.setTokenizer(tokenizer);
console.log(NGrams.ngrams(text, 1));

Output: [["links"], ["text"], ["based"], ["opposed"], ["image"], ["based"], ["links"], ["CTA"], ["s"]]
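For illustration, here is a minimal sketch of the behaviour being requested, written without the library so it runs on its own. `HyphenSlashTokenizer` and the standalone `ngrams` function are hypothetical names, not part of Natural; the regular expression is just one possible rule for "letters or digits joined by a hyphen or slash". Whether an object like this can simply be handed to `NGrams.setTokenizer` is an assumption that would need checking against the library.

```javascript
// Hypothetical tokenizer (not part of natural): keeps "-" and "/" inside a word.
class HyphenSlashTokenizer {
  tokenize(text) {
    // Runs of letters/digits, optionally joined by a single "-" or "/".
    return text.match(/[A-Za-z0-9]+(?:[-\/][A-Za-z0-9]+)*/g) || [];
  }
}

// Plain sliding-window n-grams over a token list (stand-in for NGrams.ngrams).
function ngrams(tokens, n) {
  const out = [];
  for (let i = 0; i + n <= tokens.length; i++) {
    out.push(tokens.slice(i, i + n));
  }
  return out;
}

const tokenizer = new HyphenSlashTokenizer();
const text = "links text-based opposed image-based links/CTA’s";
const unigrams = ngrams(tokenizer.tokenize(text), 1);
console.log(unigrams);
// [["links"], ["text-based"], ["opposed"], ["image-based"], ["links/CTA"], ["s"]]
```

The trailing `["s"]` is still there because the curly apostrophe in "CTA’s" is not covered by this pattern; handling possessives would be a separate rule.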

sam68740 · Oct 23 '22 13:10