utf8.h
utf8makevalid: test to identify sequence length and valid byte values is not sufficient
Hello,
In utf8makevalid, you use the following test to identify a 4-byte sequence:
"if (0xf0 == (0xf8 & *read))"
This is not correct if the input can be an arbitrary, possibly invalid, string. The test matches every lead byte from f0 to f7, but only f0..f4 are valid lead bytes for a 4-byte sequence.
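To illustrate the lead-byte problem, here is a small sketch. The helper names `passes_mask_test` and `is_valid_4byte_lead` are hypothetical and not part of utf8.h. The first reproduces the quoted check; the second applies the lead-byte range from RFC 3629.

```c
#include <stdbool.h>

/* The check quoted above: true for any byte of the form 11110xxx,
   i.e. every value from 0xF0 to 0xF7. */
bool passes_mask_test(unsigned char b) {
    return 0xf0 == (0xf8 & b);
}

/* Per RFC 3629, only 0xF0..0xF4 can start a 4-byte sequence;
   0xF5..0xF7 would encode code points above U+10FFFF. */
bool is_valid_4byte_lead(unsigned char b) {
    return b >= 0xf0 && b <= 0xf4;
}
```

For example, 0xf5 passes the mask test but is rejected by the range check.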
Moreover, even for the valid lead bytes in that range, the allowed values for the second byte differ. For example, after f0 the second byte must be in 90..bf rather than 80..bf, and after f4 it must be in 80..8f.
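A stricter check has to look at the lead byte and the second byte together. The following is only a sketch, assuming the caller passes the number of bytes remaining in the buffer; `is_valid_4byte_seq` is a hypothetical name, not an existing utf8.h function, and the ranges follow the well-formed byte sequence table in RFC 3629.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if s[0..len) begins with a well-formed 4-byte UTF-8
   sequence (RFC 3629 table). Hypothetical helper, not utf8.h API. */
bool is_valid_4byte_seq(const unsigned char *s, size_t len) {
    unsigned char lo = 0x80, hi = 0xbf; /* default second-byte range */

    if (len < 4) return false;

    switch (s[0]) {
    case 0xf0: lo = 0x90; break;          /* reject overlong encodings */
    case 0xf1: case 0xf2: case 0xf3: break;
    case 0xf4: hi = 0x8f; break;          /* reject code points > U+10FFFF */
    default: return false;                /* 0xF5..0xFF and non-4-byte leads */
    }

    if (s[1] < lo || s[1] > hi) return false;
    /* third and fourth bytes are ordinary continuation bytes 80..BF */
    return (s[2] & 0xc0) == 0x80 && (s[3] & 0xc0) == 0x80;
}
```

With this check, f0 9f 98 80 (U+1F600) is accepted, while the overlong f0 80 80 80 and the out-of-range f4 90 80 80 are rejected.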
Regards
I'd happily accept a PR that tightened this up, with supporting tests!