Clarify None case in bstr::decode_utf8

Open. glts opened this issue 3 years ago • 1 comment

Thank you for this useful library.

In bstr 1.0.1, the documentation for bstr::decode_utf8 states:

When unsuccessful, None is returned along with the number of bytes that make up a maximal prefix of a valid UTF-8 code unit sequence. In this case, the number of bytes consumed is always between 0 and 3, inclusive, where 0 is only returned when slice is empty.

bstr::decode_utf8(b"\xFFabc") returns (None, 1). The byte \xFF cannot be decoded, so the result is None. However, the number of bytes that make up a maximal prefix of a valid UTF-8 code unit sequence would be 0, since \xFF is not a prefix of any valid UTF-8 sequence.

Can you confirm, or can you paraphrase the wording for me?

Nov 09 '22 20:11 glts

Ah. 1 is indeed correct. The docs need to be updated. Returning 0 wouldn't make sense, because 0 is meant to be the terminal condition of a loop. Returning 0 in any other case would force callers into more complex loop logic that is easy to get wrong, and in practice a mistake there means an infinite loop.

Nov 09 '22 21:11 BurntSushi
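
To illustrate the loop argument in the reply above, here is a minimal sketch that uses only the standard library. `decode_first` is a hypothetical stand-in written for this example, not bstr's implementation; it assumes the same return shape as `bstr::decode_utf8`, `(Option<char>, usize)`, and may differ from bstr in edge cases. `decode_lossy` is likewise a made-up helper name. The point is that because a size of 0 only ever means "input is empty", the caller can use it as the sole exit condition and still make progress past an invalid byte such as \xFF.

```rust
/// Hypothetical std-only stand-in for `bstr::decode_utf8` (an assumption for
/// illustration, not the library's code). Returns the first decoded char, if
/// any, and the number of bytes consumed.
fn decode_first(slice: &[u8]) -> (Option<char>, usize) {
    if slice.is_empty() {
        // The only case in which 0 bytes are consumed.
        return (None, 0);
    }
    // One UTF-8 encoded scalar value is at most 4 bytes long.
    let head = &slice[..slice.len().min(4)];
    let valid = match std::str::from_utf8(head) {
        Ok(s) => s,
        // The first char is valid; the error lies further into `head`.
        Err(e) if e.valid_up_to() > 0 => {
            std::str::from_utf8(&head[..e.valid_up_to()]).unwrap()
        }
        // Invalid at offset 0. `error_len()` is `None` when the input ends in
        // the middle of a sequence; in that case consume what is left.
        Err(e) => return (None, e.error_len().unwrap_or(head.len())),
    };
    let ch = valid.chars().next().unwrap();
    (Some(ch), ch.len_utf8())
}

/// Decode loop whose only exit condition is `size == 0`.
/// Invalid sequences are replaced with U+FFFD.
fn decode_lossy(mut bytes: &[u8]) -> String {
    let mut out = String::new();
    loop {
        let (ch, size) = decode_first(bytes);
        if size == 0 {
            break; // input exhausted
        }
        out.push(ch.unwrap_or('\u{FFFD}'));
        // `size >= 1` here, so every iteration advances.
        bytes = &bytes[size..];
    }
    out
}
```

With this stand-in, `decode_first(b"\xFFabc")` yields `(None, 1)`, matching the result reported in the issue. If an invalid leading byte yielded a size of 0 instead, the loop above would either stop early and silently drop "abc", or need a second check to tell "empty" apart from "invalid", which is the extra complexity the reply warns about.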