twitter-text
Wrong documentation of displayRange and validRange
READMEs in the Java, Objective-C and Ruby libraries incorrectly state that displayRange and validRange are "pairs of unicode code point indices".
However, the actual implementations and the conformance test suite suggest that they are UTF-16 code unit indices.
One example can be found here:
text: "😷👾😡🔥💩"
expected:
displayRangeStart: 0
displayRangeEnd: 9
validRangeStart: 0
validRangeEnd: 9
Each emoji in the text is a single Unicode code point, so the text is 5 code points long. In UTF-16, however, each of these emoji is encoded as a surrogate pair, so the text is 10 UTF-16 code units long. The expected end index of 9 is the inclusive index of the last UTF-16 code unit (10 - 1); in code point indices it would be 4. This implies that the test case expects the parser to return UTF-16 ranges.
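The difference between the two units can be checked directly. The following is a minimal sketch, not code from twitter-text: the sample string is an assumed stand-in of five emoji outside the Basic Multilingual Plane, written with escapes so it does not depend on how this page is encoded, and all variable names are made up for illustration.

```javascript
// Five emoji outside the Basic Multilingual Plane
// (assumed stand-in for the conformance test text).
const text = "\u{1F637}\u{1F47E}\u{1F621}\u{1F525}\u{1F4A9}";

// String.length counts UTF-16 code units; each emoji here is a surrogate pair.
const utf16Length = text.length;

// String iteration yields whole code points, so spreading counts code points.
const codePointLength = [...text].length;

// Inclusive end index of a range covering the whole text, in each unit.
const utf16RangeEnd = utf16Length - 1;
const codePointRangeEnd = codePointLength - 1;

console.log({ utf16Length, codePointLength, utf16RangeEnd, codePointRangeEnd });
// prints { utf16Length: 10, codePointLength: 5, utf16RangeEnd: 9, codePointRangeEnd: 4 }
```

Only the UTF-16 interpretation yields an end index of 9, which is the value the test case expects.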
Furthermore, this JavaScript code calculates the displayRangeEnd using the String.length property, which, by the specification, counts UTF-16 code units.
I think either the documentation or the parser API should be fixed for consistency.