python-ftfy

example that could work: RosŽ

Open bostick opened this issue 4 years ago • 1 comments

Hello,

I ran into this on a printed label for some wine in a supermarket.

This is an example of double encoding.

Here is the Python that shows how the bad text was created:

>>> "Rosé"
'Rosé'
>>> _.encode("mac_roman").decode("cp1252")
'RosŽ'
>>> _.encode("utf_8").decode("cp1252")
'RosÅ½'

The problem is that fix_encoding only fixes 1 level:

>>> print(ftfy.fix_encoding("RosÅ½"))
RosŽ

and then it gets stuck:

>>> print(ftfy.fix_encoding("RosŽ"))
RosŽ
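For reference, both levels can be undone by hand once the encodings involved are known. The following is a minimal standard-library sketch, assuming the Mac Roman / Windows-1252 / UTF-8 chain described above; the helper names `mangle` and `unmangle` are hypothetical and are not part of ftfy.

```python
def mangle(text: str) -> str:
    """Reproduce the double encoding from this issue (hypothetical helper)."""
    # Level 1: Mac Roman bytes misread as Windows-1252.
    level1 = text.encode("mac_roman").decode("cp1252")
    # Level 2: UTF-8 bytes of that result misread as Windows-1252 again.
    return level1.encode("utf_8").decode("cp1252")


def unmangle(text: str) -> str:
    """Reverse both levels, assuming the exact chain used in mangle()."""
    # Undo level 2: recover the UTF-8 bytes and decode them properly.
    level1 = text.encode("cp1252").decode("utf_8")
    # Undo level 1: recover the original Mac Roman bytes and decode them.
    return level1.encode("cp1252").decode("mac_roman")


if __name__ == "__main__":
    broken = mangle("Rosé")
    print(repr(broken))
    print(repr(unmangle(broken)))
```

This only works because the chain of encodings is known in advance; it does not address the harder problem of detecting that the intermediate string is itself wrong.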

[Attached image: 179060476_10100737413225250_4705143928367452_n]

bostick, Jul 01 '21 14:07

This looks like a duplicate of https://github.com/rspeer/python-ftfy/issues/18

Perdjesk, Feb 06 '23 16:02

That's right -- without a much fancier heuristic, we can't tell that "RosŽ" isn't the correct string.

rspeer, Oct 09 '24 23:10
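To illustrate why this is ambiguous, here is a rough sketch (not ftfy's method) that re-reads a string's Windows-1252 bytes under a few other single-byte encodings. The function name `plausible_readings` and the choice of candidate codecs are assumptions made for illustration only.

```python
def plausible_readings(text: str, codecs=("mac_roman", "cp437")) -> dict:
    """Re-decode the cp1252 bytes of `text` under other single-byte codecs.

    Hypothetical helper: it lists alternatives, it does not rank them.
    """
    try:
        raw = text.encode("cp1252")
    except UnicodeEncodeError:
        # The text contains characters that cp1252 cannot represent.
        return {}
    readings = {}
    for codec in codecs:
        try:
            readings[codec] = raw.decode(codec)
        except UnicodeDecodeError:
            continue
    return readings


if __name__ == "__main__":
    # Both the suspicious string and the correct one yield "valid" readings.
    for sample in ("RosŽ", "Rosé"):
        print(sample, plausible_readings(sample))
```

Almost any byte sequence decodes successfully under a single-byte codec, so every string produces alternative readings, including strings that were already correct. Choosing among them requires some notion of which result looks more like real text, which is the fancier heuristic mentioned above.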