A note on using Intl.Segmenter

Open andrew--r opened this issue 2 years ago • 2 comments

Hi! Intl.Segmenter is a native API for locale-aware text segmentation. While it is not supported everywhere yet, it’d be nice to mention it in the README and maybe compare the library not only to regexps, but also to the native API.

andrew--r avatar May 02 '23 10:05 andrew--r

Here's the sample code to use that if it helps:

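function countWordsViaIntl(text) {
  // Passing `undefined` as the locale selects the runtime's default locale.
  const segmenter = new Intl.Segmenter(undefined, { granularity: "word" });
  let count = 0;
  // isWordLike is false for whitespace and punctuation segments.
  for (const segment of segmenter.segment(text)) {
    if (segment.isWordLike) count++;
  }
  return count;
}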

Note that it may not return the same result as the countWords function provided in this repo, as there could be edge cases around emojis.
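
Since the API is not available everywhere yet, a guarded variant might be useful. This is only a rough sketch: `countWordsWithFallback` is a hypothetical name (not something this repo exports), and the whitespace-splitting fallback is an assumption chosen for simplicity, so the two branches can disagree on some inputs.

```javascript
// Hypothetical helper, not part of this repo: uses Intl.Segmenter when the
// runtime provides it and falls back to naive whitespace splitting otherwise.
function countWordsWithFallback(text, locale) {
  if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
    const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
    let count = 0;
    for (const segment of segmenter.segment(text)) {
      // Skip whitespace and punctuation segments.
      if (segment.isWordLike) count++;
    }
    return count;
  }
  // Assumed fallback: split on runs of whitespace. This is not locale-aware
  // and will miscount languages that do not separate words with spaces.
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

console.log(countWordsWithFallback("Hello, world!"));
```

The feature check runs on every call here to keep the sketch short; in real code it could be done once up front.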

hyrious avatar May 06 '23 08:05 hyrious

@hyrious @andrew--r I tried adding a benchmark for Intl.Segmenter, but unfortunately it consistently errors out with a JavaScript out-of-memory error. The issue is either with Tinybench or with the implementation of Intl.Segmenter.
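
To help narrow down which of the two it is, one option might be a plain timing loop that does not involve Tinybench at all. The sketch below is an assumption-laden starting point, not a diagnosis: `timeWordCount` is a made-up name, it relies on a global `performance` object, and it reuses a single segmenter instance across runs instead of constructing one per call.

```javascript
// Hypothetical micro-benchmark, independent of Tinybench.
// Assumes Intl.Segmenter and a global `performance` object are available.
function timeWordCount(text, iterations) {
  // Create the segmenter once and reuse it for every run.
  const segmenter = new Intl.Segmenter(undefined, { granularity: "word" });
  let words = 0;
  const start = performance.now();
  for (let run = 0; run < iterations; run++) {
    words = 0;
    for (const segment of segmenter.segment(text)) {
      if (segment.isWordLike) words++;
    }
  }
  const elapsedMs = performance.now() - start;
  // Report the word count of the last run and the average time per run.
  return { words, msPerRun: elapsedMs / iterations };
}

console.log(timeWordCount("The quick brown fox jumps over the lazy dog", 1000));
```

If this loop runs without exhausting memory on the same input, that would point toward the benchmark setup; if it also fails, the segmenter itself becomes the more likely suspect.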

thecodrr avatar May 06 '23 11:05 thecodrr