python-ftfy
example that could work: RosŽ
Hello,
I ran into this on a printed label for some wine in a supermarket.
This is an example of double encoding: the text was mis-decoded twice, each time as cp1252.
Here is the Python that shows how the bad text was created:
>>> "Rosé"
'Rosé'
>>> _.encode("mac_roman").decode("cp1252")
'RosŽ'
>>> _.encode("utf_8").decode("cp1252")
'RosÅ½'
The problem is that fix_encoding only fixes one level:
>>> print(ftfy.fix_encoding("RosÅ½"))
RosŽ
and then it gets stuck:
>>> print(ftfy.fix_encoding("RosŽ"))
RosŽ
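For what it's worth, here is a small stdlib-only sketch of what the remaining string looks like at the byte level. It only inspects the bytes and makes no claim about how ftfy decides internally. The variable names are just for illustration.

```python
# The string left over after one round of fixing: 'Ros' + U+017D (Z with caron).
stuck = "Ros\u017d"

# Undo the cp1252 decoding to get back the underlying bytes.
raw = stuck.encode("cp1252")

# Check whether those bytes could have been UTF-8, the usual mojibake source.
try:
    raw.decode("utf-8")
    is_utf8 = True
except UnicodeDecodeError:
    is_utf8 = False

print(raw)      # the single byte after 'Ros' is 0x8e
print(is_utf8)  # False: a lone 0x8e is not valid UTF-8
```

So the second layer cannot be undone as a UTF-8 mix-up; the leftover byte only makes sense if some other single-byte encoding is guessed.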

This looks like a duplicate of https://github.com/rspeer/python-ftfy/issues/18
That's right -- without a much fancier heuristic, we can't tell that "RosŽ" isn't the correct string.
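If you already know which codecs were involved, the layers can be peeled off by hand. The following is a minimal sketch under that assumption, not something ftfy does for you; `undo_layers` and `layers` are hypothetical names made up for this example.

```python
def undo_layers(text, layers):
    """Reverse each (wrong_decoder, real_encoding) pair, outermost layer first.

    Hypothetical helper: it assumes the caller already knows the codecs.
    """
    for wrong_decoder, real_encoding in layers:
        # Re-encode with the codec that was wrongly used to decode,
        # then decode with the codec the bytes were really in.
        text = text.encode(wrong_decoder).decode(real_encoding)
    return text


# Outermost first: UTF-8 bytes read as cp1252, then mac_roman bytes read as cp1252.
layers = [("cp1252", "utf_8"), ("cp1252", "mac_roman")]

# 'Ros' + U+00C5 + U+00BD is the doubly mangled string from the report.
print(undo_layers("Ros\u00c5\u00bd", layers))  # prints: Rosé
```

This only works because the codec pairs are supplied up front. Guessing them automatically is the hard part, since a string like "RosŽ" is plausible text on its own.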