python-ftfy

Feature: distinguish ISO-8859-2 from windows-1250 mojibake

Open • rspeer opened this issue 4 years ago • 1 comment

ISO-8859-2 covers many of the same characters as Windows-1250, but unfortunately assigns many of them to different byte values.
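A quick way to see the difference, using only Python's built-in codecs (`iso8859_2` and `cp1250` are Python's standard names for these encodings): the same letter encodes to different bytes, and the same byte can mean different letters. This is an illustration, not anything from ftfy itself.

```python
# The same letter sits at different byte values in the two code pages.
for ch in "šžśźą":
    latin2 = ch.encode("iso8859_2")  # ISO-8859-2
    cp1250 = ch.encode("cp1250")     # Windows-1250
    print(ch, latin2.hex(), cp1250.hex())
# š b9 9a
# ž be 9e
# ś b6 9c
# ź bc 9f
# ą b1 b9

# So one byte means different letters depending on which table you assume.
print(b"\xb9".decode("iso8859_2"), b"\xb9".decode("cp1250"))  # š ą
```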

One awkwardly ambiguous case I've found: ftfy decodes the text SchlĂźsselwĂśrter as SchlßsselwÜrter, treating it as Windows-1250 mojibake, when it is actually ISO-8859-2 mojibake of Schlüsselwörter. Distinguishing these without additional context would require recognizing the awkward capitalization and the extreme unlikeliness of the sequence "ßss".
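The ambiguity can be reproduced with Python's built-in codecs: reversing the mojibake under either encoding yields valid UTF-8, so byte-level validity alone can't decide. The `implausibility` function below is a hypothetical toy score (not ftfy's actual heuristic) that encodes the two cues mentioned above: a lowercase letter followed by an uppercase one, and the sequence "ßss".

```python
original = "Schlüsselwörter"

# UTF-8 bytes misread as ISO-8859-2 produce the mojibake from the report.
mojibake = original.encode("utf-8").decode("iso8859_2")
print(mojibake)  # SchlĂźsselwĂśrter

# Undoing it under each candidate encoding gives valid UTF-8 both times.
as_cp1250 = mojibake.encode("cp1250").decode("utf-8")
as_latin2 = mojibake.encode("iso8859_2").decode("utf-8")
print(as_cp1250)  # SchlßsselwÜrter
print(as_latin2)  # Schlüsselwörter


def implausibility(word: str) -> int:
    """Hypothetical toy score, not ftfy's: higher means less plausible."""
    # "ßss" is extremely unlikely in real German text.
    score = word.count("ßss")
    # Penalize an uppercase letter directly after a lowercase one.
    score += sum(a.islower() and b.isupper() for a, b in zip(word, word[1:]))
    return score


best = min([as_cp1250, as_latin2], key=implausibility)
print(best)  # Schlüsselwörter
```

A real detector would need many more signals than these two; the point is only that the wrong decoding is valid UTF-8 and has to be rejected on linguistic grounds.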

rspeer • Feb 11 '21 23:02

I originally made this note because I thought we weren't supporting ISO-8859-2 mojibake at all, but we are. This word decodes correctly when it appears alongside other ISO-8859-2 mojibake.
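One way surrounding text can settle the question, sketched under assumptions (the `undo_mojibake` helper and the sample sentence "Schlüsselwörter für Ärzte" are hypothetical, and this is not how ftfy is implemented): choose a single encoding for the whole text. Here, ISO-8859-2 decoding of "Ä" leaves the C1 control character U+0084, which Windows-1250 cannot encode, so the Windows-1250 reading fails for the sentence even though it succeeds for the bare word.

```python
# Hypothetical helper, not ftfy's API: undo UTF-8 mojibake by trying each
# candidate single-byte encoding on the whole text, in this order.
CANDIDATES = ("cp1250", "iso8859_2")


def undo_mojibake(text: str):
    """Return (encoding, fixed_text) for the first encoding that round-trips."""
    for encoding in CANDIDATES:
        try:
            return encoding, text.encode(encoding).decode("utf-8")
        except UnicodeError:
            continue  # this encoding can't explain the whole text
    return "none", text


word = "Schlüsselwörter".encode("utf-8").decode("iso8859_2")
sentence = "Schlüsselwörter für Ärzte".encode("utf-8").decode("iso8859_2")

print(undo_mojibake(word))      # ('cp1250', 'SchlßsselwÜrter')
print(undo_mojibake(sentence))  # ('iso8859_2', 'Schlüsselwörter für Ärzte')
```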

rspeer • Feb 11 '21 23:02