LibLangly
Word Boundary detection
Methods like Words() are supposed to split text into words, but they don't. They split on spaces, which aren't necessarily the only word boundary. Words() should also be removing non-word components, but it isn't.
Fixing this requires a proper implementation of word boundary detection. UAX #29, section 4 ("Word Boundaries") describes this.
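To illustrate the gap, here is a minimal Python sketch. It is not LibLangly's code and not a full implementation of the Unicode word boundary rules; the `words` function and its pattern are hypothetical and only approximate the idea that punctuation is not part of a word while an apostrophe between letters is.

```python
import re

# Simplified assumption: a word is a run of letters/digits, optionally joined
# by a single apostrophe between such runs. The real rules cover far more cases.
_WORD = re.compile(r"[^\W_]+(?:['\u2019][^\W_]+)*")

def words(text):
    """Return the word-like runs in text, dropping punctuation and spaces."""
    return _WORD.findall(text)

sample = "Hello, world! Don't stop."
print(sample.split())  # ['Hello,', 'world!', "Don't", 'stop.']
print(words(sample))   # ['Hello', 'world', "Don't", 'stop']
```

Splitting on spaces leaves the punctuation attached to each piece, which is the behaviour this issue is about.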
This and this describe an issue with ZWSP (zero-width space), along with the debate around it. I've settled on a solution that keeps its Cf classification instead of Zs, but also ensures that it is detected as a word boundary. So ZWSP (U+200B) absolutely must be recognized that way.
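As a rough sketch of that decision in Python (the names `EXTRA_BOUNDARIES`, `is_word_boundary`, and `split_on_boundaries` are hypothetical, not LibLangly's API): the character keeps its Cf general category, and the boundary check treats it as a boundary through a separate set.

```python
import unicodedata

ZWSP = "\u200b"

# Hypothetical: code points that count as word boundaries even though
# they are not classified as whitespace.
EXTRA_BOUNDARIES = {ZWSP}

def is_word_boundary(ch):
    """True for whitespace and for the explicitly listed extra boundaries."""
    return ch.isspace() or ch in EXTRA_BOUNDARIES

def split_on_boundaries(text):
    """Split text wherever is_word_boundary holds, dropping empty pieces."""
    pieces, current = [], []
    for ch in text:
        if is_word_boundary(ch):
            if current:
                pieces.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        pieces.append("".join(current))
    return pieces

print(unicodedata.category(ZWSP))              # Cf
print(split_on_boundaries("foo\u200bbar baz")) # ['foo', 'bar', 'baz']
```

Keeping the classification and the boundary rule separate means the category lookup stays faithful to the Unicode data while the splitter still breaks at U+200B.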
Apologies for the transfer spam. This definitely belongs here now.