grapheme-splitter
A JavaScript library that breaks strings into their individual user-perceived characters.
The sequence "\u200D\u2764\uFE0F\u200D" seems to be processed incorrectly. I can chain any number of copies of it and the result always counts as one grapheme, until the chain is interrupted...
## Using emojis like 👩‍🦰👩‍👩‍👦‍👦🏳️‍🌈

```
// Load the library as shown in its README
var GraphemeSplitter = require('grapheme-splitter');

var splitter = new GraphemeSplitter();
var graphemeCount = splitter.countGraphemes('👩‍🦰👩‍👩‍👦‍👦🏳️‍🌈');
console.log(graphemeCount);
```

## Result: `4`
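For comparison, the same three-emoji string (woman with red hair, family of two women and two boys, rainbow flag) can be counted with the built-in `Intl.Segmenter`. This is only a cross-check sketch: `Intl.Segmenter` is a platform API (Node.js 16+ and modern browsers are assumed here), not part of grapheme-splitter, and the helper name `countGraphemesIntl` is made up for illustration. The string is written with escapes so the code points are unambiguous.

```javascript
// Count user-perceived characters with the platform's Intl.Segmenter.
// Assumption: the runtime ships Intl.Segmenter (Node.js 16+ or a modern browser).
// countGraphemesIntl is a hypothetical helper, not a grapheme-splitter function.
function countGraphemesIntl(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  let count = 0;
  for (const _segment of segmenter.segment(text)) {
    count += 1;
  }
  return count;
}

const sample =
  '\u{1F469}\u200D\u{1F9B0}' +                                // woman + ZWJ + red hair
  '\u{1F469}\u200D\u{1F469}\u200D\u{1F466}\u200D\u{1F466}' +  // woman, woman, boy, boy joined by ZWJ
  '\u{1F3F3}\uFE0F\u200D\u{1F308}';                           // white flag + VS16 + ZWJ + rainbow

console.log(countGraphemesIntl(sample)); // 3 on runtimes whose Unicode data covers these sequences
```

A result of 3 here, against the 4 reported above, would suggest the extra grapheme comes from one of the ZWJ sequences being split.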
**Input:** '๐๐❤๐โ๐'
**Output:**
[ '๐', '๐', '❤', '๐', 'โ', '๐' ] (actual)
[ '๐', '๐', '♥', '๐', 'โ', '๐' ] (what it looks like in code)
For some reason when...
Hi there, first of all, thanks a lot for this library and the efforts you put in! I've got a scenario, where some emojis seem to be split up the...
अनुच्छेद should return the 4 strings ["अ", "नु", "च्छे", "द"] and not ["अ","नु","च्","छे","द"]. Basically, this is how the cursor behaves in the string. The cursor skips over the 4 characters or graphemes...
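The grouping requested above, where a consonant followed by a virama (U+094D) stays attached to the next consonant, can be illustrated with a deliberately simplified sketch. It is not grapheme-splitter's algorithm and not a full UAX #29 implementation: the function name `splitWithConjuncts` is hypothetical, only the Devanagari virama is handled, and any letter after a virama is assumed to join the conjunct.

```javascript
// Simplified illustration of conjunct-aware clustering for Devanagari.
// Assumptions: only U+094D (Devanagari virama) forms conjuncts, and every
// combining mark (\p{M}) attaches to the preceding cluster.
const VIRAMA = '\u094D';

function splitWithConjuncts(text) {
  const clusters = [];
  for (const ch of text) { // iterates by code point
    const last = clusters[clusters.length - 1];
    const isMark = /\p{M}/u.test(ch);
    const joinsConjunct =
      last !== undefined && last.endsWith(VIRAMA) && /\p{L}/u.test(ch);
    if (last !== undefined && (isMark || joinsConjunct)) {
      clusters[clusters.length - 1] = last + ch; // extend the current cluster
    } else {
      clusters.push(ch); // start a new cluster
    }
  }
  return clusters;
}

// U+0905 U+0928 U+0941 U+091A U+094D U+091B U+0947 U+0926
const word = '\u0905\u0928\u0941\u091A\u094D\u091B\u0947\u0926';
console.log(splitWithConjuncts(word).length); // 4
```

Dropping the `joinsConjunct` branch reproduces the five-way split, because the virama then ends its cluster and the following consonant starts a new one.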
- [x] upgrade source code to ES2017 and transpile using Babel
- [x] implement UAX 29 [Extended Grapheme Clusters Segmentation](http://www.unicode.org/reports/tr29/tr29-33.html#Grapheme_Cluster_Boundaries) on Unicode 11

The change should be a breaking change...
@orling Out of personal interest in Unicode, I would like to refactor this library. Here are some thoughts that came to me:
- [x] Transcribe the whole library...
Thanks for your lib, it is very helpful. However, I am experiencing issues with the Khmer language and the combining mark [U+17D2](https://codepoints.net/U+017D2?lang=en) (see https://r12a.github.io/scripts/khmer/block#char17D2), which is specific to the Khmer language and...