python-ftfy

windows-1257 mojibake is not fixed

Open JonhSilver opened this issue 1 year ago • 1 comments

import ftfy
print(ftfy.fix_text('SÄ…raÅai'))

JonhSilver avatar May 12 '24 15:05 JonhSilver

This is true -- windows-1257 isn't currently on the list of encodings that gets fixed.

Would you be able to point me to somewhere that I'd find text files that were really encoded in windows-1257, or more examples of windows-1257 mojibake in the wild, so I could make heuristics and test cases out of it?

rspeer avatar Aug 05 '24 22:08 rspeer

TXT files here (i.e. files prior to 2004) are windows-1257 encoded.

http://zagarins.net/kjl/arhivs.html

NilsEnevoldsen avatar Oct 08 '24 20:10 NilsEnevoldsen

Wonderful, this looks like something I can make a heuristic out of for the next version.

rspeer avatar Oct 08 '24 23:10 rspeer

I've got a version that can fix text like "Šveices baņķieri gaida konkrētus investīciju projektus", but what was the example you originally gave supposed to become? It's got a Unicode private use character in it.

rspeer avatar Oct 09 '24 23:10 rspeer
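For readers following along, here is a minimal sketch of how this kind of mojibake arises and how it can be reversed by hand, using only Python's standard library and its `cp1257` codec. It is not ftfy's heuristic, just an illustration of the underlying byte mix-up; the helper names `make_mojibake` and `undo_mojibake` are made up for this example.

```python
# Sketch: reproduce windows-1257 mojibake with the standard library only.
# The sentence is the example quoted in the comment above.
original = "Šveices baņķieri gaida konkrētus investīciju projektus"


def make_mojibake(text: str) -> str:
    """Encode as UTF-8, then wrongly decode the bytes as windows-1257."""
    return text.encode("utf-8").decode("cp1257")


def undo_mojibake(text: str) -> str:
    """Reverse the mistake: re-encode as windows-1257, then decode as UTF-8."""
    return text.encode("cp1257").decode("utf-8")


garbled = make_mojibake(original)
restored = undo_mojibake(garbled)

print(garbled)               # the garbled form of the sentence
print(restored == original)  # True when the round trip succeeds
```

The manual round trip only works when every UTF-8 byte happens to map to a defined windows-1257 character. Some byte values have no assignment in that code page, so other inputs can raise `UnicodeDecodeError` at the decode step, which is one reason a heuristic-based fixer is more practical than a blind re-encode.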

Released in 6.3.

rspeer avatar Oct 12 '24 14:10 rspeer