ustring issues

consider fixing (if needed) or removing pad()

2

if it's kept, it should pad to the "visible width", so it should take combining characters into account
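A rough sketch of what padding to the visible width could look like, written in Python for illustration rather than against the extension's own API. The names `visible_width` and `pad_right` are hypothetical. It only skips combining marks; wide East Asian characters and zero-width characters are ignored here.

```python
import unicodedata

def visible_width(text: str) -> int:
    """Count code points that are not combining marks (a rough proxy for visible width)."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)

def pad_right(text: str, width: int, fill: str = " ") -> str:
    """Pad on the right until the approximate visible width reaches `width`."""
    missing = max(0, width - visible_width(text))
    return text + fill * missing

# "café" written with a combining acute accent: 5 code points, 4 visible columns
decomposed = "cafe\u0301"
padded = pad_right(decomposed, 6)
```

Padding by code point count would add only one space here, because the combining accent is counted as a column of its own.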

remove $offset from indexOf() and lastIndexOf()

2

they are redundant with substring(). there are better ways to avoid unnecessary copies [a string builder is one, interning is another], provided that we care about this kind of optimization
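To make the redundancy concrete, here is a small Python sketch (not the ustring API) showing that a search with an offset can be expressed as a substring plus a plain search. `index_of_from` is a hypothetical helper and assumes a non-negative offset.

```python
def index_of_from(haystack: str, needle: str, offset: int) -> int:
    """Find `needle` at or after `offset` using only slicing plus a plain search."""
    found = haystack[offset:].find(needle)  # slicing copies the tail of the string
    return -1 if found == -1 else found + offset

text = "abcabc"
with_offset = text.find("b", 2)            # built-in search that takes an offset
via_substring = index_of_from(text, "b", 2)
```

The two results agree; the only difference is the copy made by the slice, which is the optimization question raised above.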

Wes0617

No locale options for toUpper/toLower

ICU supports this in `UnicodeString`, but our API doesn't. We need to allow specifying the locale, because `toUpper` and `toLower` will behave differently for different locales, e.g. in Turkish, `I`...
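A minimal Python sketch of why a locale parameter matters. The functions `to_upper` and `to_lower` and their `locale` argument are hypothetical and hand-rolled; they are not ICU and not the ustring API, and they only cover the dotted and dotless i used in Turkish and Azerbaijani.

```python
def _is_turkic(locale: str) -> bool:
    """True for locale tags such as "tr", "tr_TR" or "az-AZ"."""
    return locale.replace("-", "_").split("_")[0].lower() in ("tr", "az")

def to_upper(text: str, locale: str = "") -> str:
    if _is_turkic(locale):
        text = text.replace("i", "\u0130")  # i -> İ (capital I with dot above)
    return text.upper()

def to_lower(text: str, locale: str = "") -> str:
    if _is_turkic(locale):
        # İ -> i and I -> ı (dotless i), handled before the default mapping
        text = text.replace("\u0130", "i").replace("I", "\u0131")
    return text.lower()

default_upper = to_upper("i")        # default mapping
turkish_upper = to_upper("i", "tr")  # Turkish mapping
```

A real implementation would delegate to ICU's locale-aware case mapping instead of special-casing letters like this.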

hikari-no-yume

provide a definition of whitespace for trim()

1

which characters will trim() trim? whitespace is very subjective and differs in every language, including unicode. i would use C#'s whitespace definition as the default (https://msdn.microsoft.com/en-us/library/t809ektx(v=vs.110).aspx) plus the null byte as...
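A sketch in Python of a trim() with an explicit whitespace definition. It assumes the default is the Unicode separator categories (Zs, Zl, Zp) plus U+0009 to U+000D and U+0085, which is how I understand the .NET definition, with the null byte added as suggested above. `is_trimmable` and `trim` are hypothetical names, not the ustring API.

```python
import unicodedata

SEPARATOR_CATEGORIES = {"Zs", "Zl", "Zp"}        # space, line and paragraph separators
CONTROL_WHITESPACE = set("\t\n\x0b\x0c\r\x85")   # U+0009..U+000D and U+0085

def is_trimmable(ch: str, extra: str = "\0") -> bool:
    """Decide whether a single code point counts as trimmable whitespace."""
    return (unicodedata.category(ch) in SEPARATOR_CATEGORIES
            or ch in CONTROL_WHITESPACE
            or ch in extra)

def trim(text: str, extra: str = "\0") -> str:
    """Strip trimmable code points from both ends of `text`."""
    start, end = 0, len(text)
    while start < end and is_trimmable(text[start], extra):
        start += 1
    while end > start and is_trimmable(text[end - 1], extra):
        end -= 1
    return text[start:end]

# null byte, no-break space and space in front; em space and newline at the end
trimmed = trim("\0\u00a0 hi\u2003\n")
```

Passing `extra=""` drops the null byte from the set, so callers can opt out of that part of the default.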

Wes0617

add a normalize() method for unicode normalization forms

1

this already exists in the Normalizer class, but it would be handy to have it in this class as well. it could look like JS's normalize() https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize, for example: `echo u($str)->normalize(UString::NFKC)->subString(0, 4);`
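For comparison, the same normalize-then-substring idea in Python, using the standard `unicodedata.normalize` function. This only illustrates what the forms do; it is not the proposed ustring method.

```python
import unicodedata

# "ﬁle" starting with the U+FB01 ligature, then "e" plus a combining acute accent
raw = "\ufb01le e\u0301"

nfkc = unicodedata.normalize("NFKC", raw)  # compatibility composition
first_four = nfkc[:4]                      # analogous to subString(0, 4)
```

Without normalization the first four code points would be the ligature, "l", "e" and a space, so the substring depends on which form the input happens to be in.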

Wes0617

Codepoints, not characters

9

Since this doesn't operate on grapheme clusters, we should refer to dealing with codepoints, not characters. In particular, documentation comments need changing, and `charAt` should be `codepointAt`.
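A short Python illustration of the difference, since Python strings are also indexed by code point. `codepoint_at` is a hypothetical helper named after the rename suggested above.

```python
def codepoint_at(text: str, index: int) -> str:
    """Return the code point at `index`, formatted as U+XXXX."""
    return f"U+{ord(text[index]):04X}"

# One user-perceived character: "e" followed by COMBINING ACUTE ACCENT
text = "e\u0301"
length = len(text)  # counts code points, not characters
parts = [codepoint_at(text, i) for i in range(length)]
```

Indexing returns half of what a reader would call one character, which is why "codepoint" is the more accurate word for the API and its documentation.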

hikari-no-yume

String reversal works on code points

3

Currently, string reversal works on code points and doesn't care what kind they are, so it won't properly reverse strings containing combining characters. We could quite simply implement the [Missy Elliot algorithm](https://github.com/mathiasbynens/esrever)...
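A simplified Python sketch of a reversal that keeps combining marks attached to their base code point. This is not the linked algorithm itself and not full grapheme cluster segmentation; `reverse_keeping_marks` is a hypothetical name.

```python
import unicodedata

def reverse_keeping_marks(text: str) -> str:
    """Reverse `text`, keeping each combining mark after its base code point."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch   # attach the mark to the preceding base
        else:
            clusters.append(ch)  # start a new base + marks group
    return "".join(reversed(clusters))

word = "man\u0303ana"               # "mañana" with a combining tilde
naive = word[::-1]                  # tilde ends up on the wrong letter
careful = reverse_keeping_marks(word)
```

In the naive result the tilde follows an "a" instead of the "n", which is the bug described above.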

hikari-no-yume

Handling of empty strings

One important thing that needs doing is making sure indexOf, split, startsWith and so on work properly with empty strings, i.e. by pretending there's one between each actual codepoint. Otherwise it's harder...
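For reference, Python's built-in `str` follows the "empty string between each codepoint" model for searching, so it makes a handy comparison point. This shows Python's behavior only, not what ustring currently does.

```python
text = "abc"

first = text.find("")              # empty needle matches at the very start
from_two = text.find("", 2)        # ...and at any offset inside the string
last = text.rfind("")              # the last empty match sits after the final code point
slots = text.count("")             # one slot before, between and after each code point
starts = text.startswith("")       # every string starts with the empty string
dashed = text.replace("", "-")     # replacing fills every slot
```

Note that Python refuses to split on an empty separator (`"abc".split("")` raises `ValueError`), so split is the one operation where a decision still has to be made.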

hikari-no-yume

bug

enhancement

Feature safety for non-Unicode strings

3

I would like to see the API more feature safe to other encodings/charsets.
- rename `[get|set][Default]Codepage` to `[get|set][Default]Encoding`
- remove the `U` from `UString` (or use a different name)
...

marc-mabe

Indentation?

4

Hi, The indentation style of this project is starting to look a bit funky, so can we agree on a crude coding standard we should be following? I'm used to...

datibbaw
