python-ftfy
Feature: distinguish ISO-8859-2 from windows-1250 mojibake
ISO-8859-2 covers many of the same characters as Windows-1250, but unfortunately assigns a number of them to different byte values.
An awkwardly ambiguous case I've found is the text "SchlĂźsselwĂśrter". ftfy decodes it as "SchlßsselwÜrter", treating it as Windows-1250 mojibake, when in fact it was ISO-8859-2 mojibake that should have said "Schlüsselwörter". Distinguishing the two without additional context would require recognizing the awkward capitalization and how unlikely the sequence "ßss" is.
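The ambiguity can be reproduced with Python's built-in codecs alone, without calling ftfy. The sketch below is only an illustration: `looks_implausible` is a hypothetical helper made up for this example, not something ftfy provides, and it simply encodes the two cues mentioned above.

```python
# The intended text, and the mojibake produced by misreading its
# UTF-8 bytes as ISO-8859-2.
intended = "Schlüsselwörter"
mojibake = intended.encode("utf-8").decode("iso-8859-2")

# Two candidate repairs: re-encode in each legacy encoding, then decode as UTF-8.
# Both succeed without error, which is what makes the case ambiguous.
as_iso_8859_2 = mojibake.encode("iso-8859-2").decode("utf-8")
as_windows_1250 = mojibake.encode("windows-1250").decode("utf-8")


def looks_implausible(word):
    """Hypothetical heuristic (an assumption for this example, not ftfy's logic).

    Flags a word that contains "ßss" or that has an uppercase letter
    immediately after a lowercase one.
    """
    odd_case = any(a.islower() and b.isupper() for a, b in zip(word, word[1:]))
    return "ßss" in word or odd_case


# Keep only the candidates that the heuristic does not flag.
plausible = [c for c in (as_windows_1250, as_iso_8859_2) if not looks_implausible(c)]
```

Here `as_windows_1250` is "SchlßsselwÜrter" and `as_iso_8859_2` is "Schlüsselwörter", and only the latter survives the check. A rule this crude would also flag legitimate camelCase words, so it is only a starting point for discussion.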
I previously made this note because I thought we weren't supporting ISO-8859-2 mojibake at all, but we are. This word decodes correctly in the context of other ISO-8859-2 mojibake.