utf8makevalid: the test used to identify sequence length and allowed byte values is not sufficient

Open: JPDelprat opened this issue 2 years ago • 1 comment

Hello,

In utf8makevalid, you use the following test to identify a 4-byte sequence:

"if (0xf0 == (0xf8 & *read))"

This is not correct if the input parameter can be an arbitrary invalid string: the test accepts every lead byte from f0 to f7, but only f0..f4 are valid lead bytes, and f5..ff never appear in well-formed UTF-8.

Moreover, for the valid lead bytes in that range, the allowed values of the second byte are not all the same. For example, with f0 the valid range for the second byte is 90..bf instead of 80..bf.
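
To illustrate the stricter check I have in mind, here is a minimal sketch. The helper name `is_valid_utf8_4byte` and its `remaining` parameter are my own assumptions for illustration and are not part of utf8.h. The ranges follow the well-formed UTF-8 rules: lead byte f0..f4, second byte 90..bf after f0 and 80..8f after f4, otherwise 80..bf.

```c
#include <stddef.h>

/* Hypothetical helper, NOT part of utf8.h: returns 1 if the bytes at s
   start a well-formed 4-byte UTF-8 sequence, 0 otherwise.
   `remaining` is the number of readable bytes starting at s. */
static int is_valid_utf8_4byte(const unsigned char *s, size_t remaining) {
  unsigned char lo = 0x80; /* default lower bound for the second byte */
  unsigned char hi = 0xbf; /* default upper bound for the second byte */

  if (remaining < 4) {
    return 0; /* truncated sequence */
  }
  if (s[0] < 0xf0 || s[0] > 0xf4) {
    return 0; /* f5..ff are never valid lead bytes */
  }
  if (s[0] == 0xf0) {
    lo = 0x90; /* rejects overlong encodings of code points below U+10000 */
  }
  if (s[0] == 0xf4) {
    hi = 0x8f; /* rejects code points above U+10FFFF */
  }
  if (s[1] < lo || s[1] > hi) {
    return 0;
  }
  /* The third and fourth bytes are ordinary continuation bytes. */
  if (s[2] < 0x80 || s[2] > 0xbf) {
    return 0;
  }
  if (s[3] < 0x80 || s[3] > 0xbf) {
    return 0;
  }
  return 1;
}
```

With a check like this, f0 9f 98 80 (U+1F600) is accepted, while f0 80 80 80 (overlong) and f5 80 80 80 (invalid lead byte) are rejected, even though both pass the current mask test.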

Regards

Dec 17 '23 21:12 JPDelprat

I'd happily accept a PR that tightened this up with the supporting testing!

Dec 23 '23 20:12 sheredom