Segmentation in browser possible for specialized languages with Intl.Segmenter
Here, at the end, the docs mention that for specialized languages like Chinese or Japanese, Pagefind is not able to segment text into words in the browser.
With the JS API Intl.Segmenter, this is possible in all major browsers (see MDN).
So the example from the Pagefind docs works with it:
const segmenterZh = new Intl.Segmenter("zh", { granularity: "word" });
const string1 = "每個月都";
const iterator1 = segmenterZh.segment(string1)[Symbol.iterator]();
console.log(iterator1.next().value.segment);
// output: '每個'
console.log(iterator1.next().value.segment);
// output: '月'
console.log(iterator1.next().value.segment);
// output: '都'
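To illustrate how this could apply to a search query, here is a minimal sketch. The helper name `segmentQuery` and the whitespace fallback are assumptions made for illustration; they are not part of Pagefind. It keeps only the segments that `Intl.Segmenter` marks as word-like, which drops spaces and punctuation.

```javascript
// Hypothetical helper (not a Pagefind API): split a query into word-like segments.
function segmentQuery(query, locale) {
  // Use Intl.Segmenter when the runtime provides it.
  if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
    const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
    return Array.from(segmenter.segment(query))
      .filter((part) => part.isWordLike) // skip whitespace and punctuation
      .map((part) => part.segment);
  }
  // Assumed fallback for older runtimes: split on whitespace only.
  return query.split(/\s+/).filter(Boolean);
}

console.log(segmentQuery("hello world", "en"));
console.log(segmentQuery("每個月都", "zh"));
```

The exact segments returned for the Chinese string depend on the segmentation data shipped with the browser, so they may not match the segmentation used when the index was built.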
Hello! 👋
Yes, I've been following this one (with excitement!), but haven't found the time to get it all plugged in yet. Thanks for opening an issue for it!