
Wrong documentation of displayRange and validRange

Open swen128 opened this issue 5 years ago • 0 comments

The READMEs of the Java, Objective-C and Ruby libraries incorrectly state that displayRange and validRange are "pairs of unicode code point indices". However, the actual implementations and the conformance test suite suggest that they are UTF-16 code unit indices.

One example can be found here:

text: "😷👾😡🔥💩"
expected:
    displayRangeStart: 0
    displayRangeEnd: 9
    validRangeStart: 0
    validRangeEnd: 9

Each emoji in the text is a single Unicode code point, so the text is 5 code points long. In UTF-16, however, each of these emoji is encoded as a surrogate pair, so the text is 10 UTF-16 code units long. The expected end index of 9 is the index of the last of those 10 code units, which implies that the test case expects the parser to return UTF-16 ranges.
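The two counts can be checked with a short Python sketch. It is only an illustration and is not taken from twitter-text: the `utf16_length` helper is hypothetical, and the code points are assumed to match the five emoji in the test case.

```python
# Five emoji outside the Basic Multilingual Plane
# (assumed to match the emoji in the conformance test case).
text = "".join(chr(cp) for cp in (0x1F637, 0x1F47E, 0x1F621, 0x1F525, 0x1F4A9))


def utf16_length(s: str) -> int:
    """Hypothetical helper: number of UTF-16 code units needed to encode s."""
    # Each UTF-16 code unit is 2 bytes; "utf-16-le" adds no byte order mark.
    return len(s.encode("utf-16-le")) // 2


code_points = len(text)          # Python's len() counts code points
code_units = utf16_length(text)  # the count JavaScript's String.length reports

print(code_points)     # 5
print(code_units)      # 10
print(code_units - 1)  # 9, the expected displayRangeEnd and validRangeEnd
```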

Furthermore, this JavaScript code calculates displayRangeEnd using the String.length property, which, per the ECMAScript specification, counts UTF-16 code units.

I think either the documentation or the parser API should be fixed so that the two are consistent.
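Until then, callers that need code point indices would have to convert the returned values themselves. A minimal sketch of that conversion in Python, assuming the ranges really are UTF-16 code unit indices; the function name is hypothetical and not part of any twitter-text library:

```python
def utf16_index_to_code_point_index(s: str, utf16_index: int) -> int:
    """Hypothetical helper: map a UTF-16 code unit index to the index of
    the code point that contains that code unit."""
    if utf16_index < 0:
        raise IndexError("UTF-16 index out of range")
    units = 0
    for i, ch in enumerate(s):
        # Code points above U+FFFF take two UTF-16 code units (a surrogate pair).
        units += 2 if ord(ch) > 0xFFFF else 1
        if utf16_index < units:
            return i
    raise IndexError("UTF-16 index out of range")


# Five emoji outside the Basic Multilingual Plane, as in the test case above.
text = "".join(chr(cp) for cp in (0x1F637, 0x1F47E, 0x1F621, 0x1F525, 0x1F4A9))

print(utf16_index_to_code_point_index(text, 0))  # 0
print(utf16_index_to_code_point_index(text, 9))  # 4, the last of 5 code points
```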

swen128 Jul 27 '19 09:07