
Correctly fold unknown-8bit originating from encoded words.

Open bitdancer opened this issue 1 month ago • 1 comment

The unknown-8bit trick was designed to deal with unknown bytes in an ASCII message, and it works fine for that. However, I also tried to extend it to handle bytes that can't be decoded using the charset specified in an encoded word, and there it fails because there can be other non-ASCII characters that were successfully decoded. The fix is simple: do the unknown-8bit encoding using the utf-8 codec. This is especially appropriate since anyone trying to do recovery on an unknown byte string will probably attempt utf-8 first.
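The failure mode described above can be illustrated with the codecs alone. This is a minimal sketch, not the actual patch: it assumes the undecodable bytes are carried through as lone surrogates via the `surrogateescape` error handler, and the `payload` bytes and variable names are made up for illustration.

```python
# Hypothetical encoded-word payload that claims to be utf-8:
# b"\xc3\xa9" is a valid utf-8 "é", b"\xff" is not valid utf-8 at all.
payload = b"\xc3\xa9\xff"

# Decoding keeps the good character and smuggles the bad byte
# through as a lone surrogate (U+DCFF).
text = payload.decode("utf-8", "surrogateescape")

# Re-encoding with the ascii codec can restore the surrogate,
# but it has no way to encode the successfully decoded "é".
try:
    text.encode("ascii", "surrogateescape")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Re-encoding with the utf-8 codec handles both the decoded
# character and the escaped byte, giving back the original bytes.
recovered = text.encode("utf-8", "surrogateescape")
```

In this sketch the ascii attempt raises `UnicodeEncodeError`, while the utf-8 attempt round-trips `payload` unchanged, which is the property the unknown-8bit folding needs.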

bitdancer avatar Dec 10 '25 14:12 bitdancer

Does anyone want to review this, or shall I just merge it?

bitdancer avatar Dec 16 '25 19:12 bitdancer

Thanks @bitdancer for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13. 🐍🍒⛏🤖

miss-islington-app[bot] avatar Dec 24 '25 14:12 miss-islington-app[bot]

Thanks @bitdancer for the PR 🌮🎉.. I'm working now to backport this PR to: 3.14. 🐍🍒⛏🤖

miss-islington-app[bot] avatar Dec 24 '25 14:12 miss-islington-app[bot]

GH-143146 is a backport of this pull request to the 3.14 branch.

bedevere-app[bot] avatar Dec 24 '25 14:12 bedevere-app[bot]

GH-143147 is a backport of this pull request to the 3.13 branch.

bedevere-app[bot] avatar Dec 24 '25 14:12 bedevere-app[bot]