Clarify None case in bstr::decode_utf8

Open · glts opened this issue · 1 comment

Thank you for this useful library.

In bstr 1.0.1, the documentation for bstr::decode_utf8 states:

When unsuccessful, None is returned along with the number of bytes that make up a maximal prefix of a valid UTF-8 code unit sequence. In this case, the number of bytes consumed is always between 0 and 3, inclusive, where 0 is only returned when slice is empty.

bstr::decode_utf8(b"\xFFabc") returns (None, 1). The byte \xFF cannot be decoded, so the result is None. But the number of bytes that make up a maximal prefix of a valid UTF-8 code unit sequence would be 0, since \xFF cannot begin any valid UTF-8 sequence.

Can you confirm this, or rephrase the wording for me?

glts · Nov 09 '22 20:11
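To make the reported behavior concrete, here is a minimal, stdlib-only sketch. It does not use bstr: `decode_first` is a hypothetical stand-in that only assumes the contract discussed in this issue, namely that on failure it returns the length of the maximal valid prefix, but never less than 1 for a non-empty slice. It leans on std's `Utf8Error::valid_up_to` and `Utf8Error::error_len` to find that length.

```rust
/// Hypothetical stand-in for a one-character UTF-8 decoder (NOT bstr's code).
/// Returns (Some(char), len) on success, (None, n) with n >= 1 on invalid
/// input, and (None, 0) only when the slice is empty.
fn decode_first(slice: &[u8]) -> (Option<char>, usize) {
    if slice.is_empty() {
        return (None, 0);
    }
    // A UTF-8 encoded scalar value is at most 4 bytes, so look no further.
    let window = &slice[..slice.len().min(4)];
    let valid = match std::str::from_utf8(window) {
        Ok(s) => s,
        // The first character is complete and valid; decode only that prefix.
        Err(e) if e.valid_up_to() > 0 => {
            std::str::from_utf8(&window[..e.valid_up_to()]).unwrap()
        }
        // Invalid at position 0. error_len() gives the length of the invalid
        // sequence (at least 1, so \xFF yields 1); None means the input ended
        // in the middle of a sequence, so the whole window is the valid prefix.
        Err(e) => return (None, e.error_len().unwrap_or(window.len())),
    };
    let ch = valid.chars().next().unwrap();
    (Some(ch), ch.len_utf8())
}
```

With this stand-in, `decode_first(b"\xFFabc")` gives `(None, 1)`, while a truncated sequence such as `b"\xE2\x82"` gives `(None, 2)`.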

Ah. 1 is indeed correct; the docs need to be updated. Returning 0 wouldn't make sense, because 0 is meant to be the terminal condition of a loop. Returning 0 in any other case would require more complex loop logic that is easy to get wrong, and getting it wrong would lead to an infinite loop in practice.

BurntSushi · Nov 09 '22 21:11
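The loop argument in the reply can be sketched as follows. This is again a hypothetical, stdlib-only illustration rather than bstr's code: `decode_first` repeats a stand-in decoder so the block compiles on its own, and `to_lossy` is an assumed helper whose only exit condition is a returned length of 0.

```rust
/// Hypothetical stand-in decoder (NOT bstr's code): (None, 0) only for an
/// empty slice, otherwise a length of at least 1.
fn decode_first(slice: &[u8]) -> (Option<char>, usize) {
    if slice.is_empty() {
        return (None, 0);
    }
    let window = &slice[..slice.len().min(4)];
    let valid = match std::str::from_utf8(window) {
        Ok(s) => s,
        Err(e) if e.valid_up_to() > 0 => {
            std::str::from_utf8(&window[..e.valid_up_to()]).unwrap()
        }
        Err(e) => return (None, e.error_len().unwrap_or(window.len())),
    };
    let ch = valid.chars().next().unwrap();
    (Some(ch), ch.len_utf8())
}

/// Hypothetical lossy decoder: replaces each invalid chunk with U+FFFD.
fn to_lossy(mut bytes: &[u8]) -> String {
    let mut out = String::new();
    loop {
        let (ch, size) = decode_first(bytes);
        // 0 is the loop's exit condition: it only happens on empty input.
        if size == 0 {
            break;
        }
        out.push(ch.unwrap_or('\u{FFFD}'));
        // size >= 1 guarantees progress on every iteration.
        bytes = &bytes[size..];
    }
    out
}
```

If the decoder reported 0 for \xFF instead, this loop would stop early at the first invalid byte, and a variant written as `while !bytes.is_empty()` would never advance past it.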