ustring
UnicodeString for PHP7
If it's kept, it should pad to the "visible width", so it should take combining glyphs into account.
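To illustrate what padding to a "visible width" might mean, here is a minimal Python sketch (not this project's PHP API). The helper names `visible_width` and `pad_right` are hypothetical, and the sketch assumes that only nonspacing and enclosing marks (categories `Mn` and `Me`) take up no column; it ignores wide East Asian characters and full grapheme-cluster rules.

```python
import unicodedata

def visible_width(s: str) -> int:
    """Count code points that are assumed to occupy a column of their own.

    Assumption: nonspacing (Mn) and enclosing (Me) marks add no width.
    """
    return sum(1 for ch in s if unicodedata.category(ch) not in ("Mn", "Me"))

def pad_right(s: str, width: int, fill: str = " ") -> str:
    """Pad s on the right until its visible width reaches `width`."""
    return s + fill * max(0, width - visible_width(s))

decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT
padded = pad_right(decomposed, 3)
naive = decomposed.ljust(3)  # pads by code point count instead
```

Here `padded` ends up with two fill characters while `naive` gets only one, because `ljust` counts the combining accent as a column.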
They are redundant with substring(). There are better ways to avoid unnecessary copies (a string builder is one, interning is another), provided that we care about this kind of optimization.
ICU supports this in `UnicodeString`, but our API doesn't. We need to allow specifying the locale, because `toUpper` and `toLower` will behave differently for different locales, e.g. in Turkish, `I`...
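As a rough illustration of why the locale matters, the Python sketch below special-cases the dotted and dotless "i" used in Turkish and Azerbaijani. The function names and the `locale` parameter are hypothetical and are not part of this project's API or of ICU; a real implementation would presumably delegate to ICU's locale-aware case mapping rather than hard-code the rule.

```python
DOTTED_CAPITAL_I = "\u0130"   # İ
DOTLESS_SMALL_I = "\u0131"    # ı
TURKIC_LOCALES = ("tr", "az")

def to_upper(s: str, locale: str = "en") -> str:
    """Uppercase s; in Turkic locales "i" maps to dotted capital İ."""
    if locale in TURKIC_LOCALES:
        s = s.replace("i", DOTTED_CAPITAL_I)
    return s.upper()

def to_lower(s: str, locale: str = "en") -> str:
    """Lowercase s; in Turkic locales "I" maps to dotless ı and İ to "i"."""
    if locale in TURKIC_LOCALES:
        s = s.replace(DOTTED_CAPITAL_I, "i").replace("I", DOTLESS_SMALL_I)
    return s.lower()

default_upper = to_upper("istanbul")        # locale-independent mapping
turkish_upper = to_upper("istanbul", "tr")  # begins with U+0130
turkish_lower = to_lower("ISTANBUL", "tr")  # begins with U+0131
```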
Which characters will trim() trim? Whitespace is very subjective and differs between languages, Unicode included. I would use C#'s whitespace definition as the default (https://msdn.microsoft.com/en-us/library/t809ektx(v=vs.110).aspx) plus the null byte as...
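A possible shape for such a trim, sketched in Python. The character set is an assumption: it approximates a C#-style definition as the Unicode separator categories (`Zs`, `Zl`, `Zp`) plus a handful of control characters, and then adds the null byte. The names `is_trimmable` and `trim` are hypothetical.

```python
import unicodedata

# Control characters assumed to count as whitespace:
# tab, line feed, vertical tab, form feed, carriage return, next line (U+0085).
_CONTROL_WHITESPACE = set("\t\n\v\f\r\x85")

def is_trimmable(ch: str) -> bool:
    """True for the null byte, the control set above, or any Unicode separator."""
    return (
        ch == "\0"
        or ch in _CONTROL_WHITESPACE
        or unicodedata.category(ch) in ("Zs", "Zl", "Zp")
    )

def trim(s: str) -> str:
    """Strip trimmable code points from both ends of s."""
    start, end = 0, len(s)
    while start < end and is_trimmable(s[start]):
        start += 1
    while end > start and is_trimmable(s[end - 1]):
        end -= 1
    return s[start:end]

# NO-BREAK SPACE, null byte and space in front; EM SPACE and newline behind.
cleaned = trim("\u00a0\0 hello world\u2003\n")
```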
This already exists in the `Normalizer` class, but it would be handy to have it in this class as well. It could look like JS's normalize() (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize): `echo u($str)->normalize(UString::NFKC)->subString(0, 4);`
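For comparison, the same normalize-then-substring idea can be sketched in Python with the standard `unicodedata` module. This only illustrates the behaviour being requested; the wrapper name `normalize` is chosen to mirror the proposal and is not this project's API.

```python
import unicodedata

def normalize(s: str, form: str = "NFKC") -> str:
    """Return s in the given Unicode normalization form."""
    return unicodedata.normalize(form, s)

# U+FB01 (the "fi" ligature), then "anc", then "e" + COMBINING ACUTE ACCENT.
text = "\ufb01ance\u0301"

normalized = normalize(text)   # ligature expanded, accent composed
first_four = normalized[:4]    # substring taken after normalization
```

Taking the substring before normalizing would give a different result, since the ligature is a single code point until NFKC expands it.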
Since this doesn't operate on grapheme clusters, we should refer to dealing with codepoints, not characters. In particular, documentation comments need changing, and `charAt` should be `codepointAt`.
Currently, string reversal works on code points and doesn't care what kind they are, so it won't reverse strings containing combining characters properly. We could quite simply implement the [Missy Elliot algorithm](https://github.com/mathiasbynens/esrever)...
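A simplified Python sketch of the underlying idea: keep each base code point together with the marks that follow it, then reverse the groups. This is not the esrever implementation (which also has to deal with surrogate pairs in JavaScript), and treating every category-`M` code point as attached to the preceding base is an assumption that falls short of full grapheme-cluster segmentation. The function name is hypothetical.

```python
import unicodedata

def reverse_keeping_marks(s: str) -> str:
    """Reverse s while keeping combining marks attached to their base."""
    clusters = []
    for ch in s:
        # Assumption: any mark (category M*) belongs to the preceding cluster.
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))

word = "man\u0303ana"                  # "n" + COMBINING TILDE in the middle
naive = word[::-1]                     # tilde ends up on the preceding "a"
careful = reverse_keeping_marks(word)  # tilde stays on the "n"
```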
One important thing that needs doing is making sure `indexOf`, `split`, `startsWith` and so on work properly with empty strings, i.e. pretending there's one between each actual codepoint. Otherwise it's harder...
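The "empty string between every code point" convention can be sketched in Python as follows. The function names mirror the methods mentioned above but are hypothetical stand-ins, not this project's implementation, and the sketch assumes a non-negative `start`.

```python
def index_of(haystack: str, needle: str, start: int = 0) -> int:
    """Return the first index >= start where needle occurs, or -1.

    An empty needle matches at every position from 0 to len(haystack).
    """
    for i in range(start, len(haystack) - len(needle) + 1):
        if haystack[i:i + len(needle)] == needle:
            return i
    return -1

def starts_with(haystack: str, prefix: str) -> bool:
    """Every string starts with the empty string."""
    return haystack[:len(prefix)] == prefix

def split(s: str, separator: str) -> list:
    """Split on separator; an empty separator yields one piece per code point."""
    if separator == "":
        return list(s)
    return s.split(separator)

pieces = split("abc", "")
```

With these definitions `index_of("abc", "", 3)` returns 3 (the empty match at the very end), while a start beyond the end returns -1.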
I would like to see the API made more future-safe with respect to other encodings/charsets. - rename `[get|set][Default]Codepage` to `[get|set][Default]Encoding` - remove the `U` from `UString` (or use a different name) ```...
Hi, The indentation style of this project is starting to look a bit funky, so can we agree on a crude coding standard we should be following? I'm used to...