
File.toString only works on UTF-8 files

Open · albertdahlin opened this issue · 0 comments

Problem

I use File.toString on text files that are not encoded in UTF-8 (in my case, ISO-8859-1). This turns all my Swedish letters into � (U+FFFD), the Unicode replacement character.


This happens when the file is converted to a string. Since all non-ASCII characters (åäöÅÄÖ in my case) get turned into the same Unicode character, there is no way to tell them apart after the file has been read. One workaround might be to read the file as Bytes and decode it to a string manually.
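A small JavaScript sketch (not Elm) illustrating why the damage can't be undone, and what a manual Latin-1 decode could look like. It decodes the ISO-8859-1 bytes for "åäö" once as UTF-8, which yields three identical U+FFFD characters, and once byte by byte, relying on the fact that every ISO-8859-1 byte value equals its Unicode code point. The helper name `decodeLatin1` is hypothetical; in Elm the equivalent would presumably go through File.toBytes and a Bytes.Decode decoder.

```javascript
// ISO-8859-1 (Latin-1) stores "åäö" as one byte per letter.
const latin1Bytes = Uint8Array.of(0xe5, 0xe4, 0xf6);

// Decoding those bytes as UTF-8 replaces each invalid byte with U+FFFD,
// so the three different letters become indistinguishable.
const asUtf8 = new TextDecoder("utf-8").decode(latin1Bytes);

// Hypothetical helper: in ISO-8859-1 every byte 0x00-0xFF maps directly
// to the code point U+0000-U+00FF, so decoding is a one-to-one mapping.
function decodeLatin1(bytes) {
  let result = "";
  for (const byte of bytes) {
    result += String.fromCharCode(byte);
  }
  return result;
}

const asLatin1 = decodeLatin1(latin1Bytes);

console.log(asUtf8);   // three U+FFFD replacement characters
console.log(asLatin1); // "åäö"
```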

Possible solution

FileReader.readAsText() accepts an optional second parameter that specifies the encoding.

I tested changing the code in the Elm.Kernel.File module to pass my encoding, and it worked. Maybe the encoding could be added as an argument to File.toString?

```js
reader.readAsText(blob, 'ISO-8859-1');
```
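A minimal sketch of what "text with an explicit encoding" could look like as a Promise-based helper. FileReader only exists in browsers, so this version approximates `reader.readAsText(blob, encoding)` with `blob.arrayBuffer()` plus `TextDecoder`, which resolves encoding labels through the same WHATWG Encoding Standard. The name `blobToText` and its `"utf-8"` default are assumptions for illustration, not Elm's API. Note that the Encoding Standard maps the label "iso-8859-1" to windows-1252, which agrees with ISO-8859-1 for letters like åäö.

```javascript
// Hypothetical helper approximating FileReader.readAsText(blob, encoding).
// Runs in browsers and in Node 18+, where Blob and TextDecoder are global.
async function blobToText(blob, encoding = "utf-8") {
  // Read the raw bytes without interpreting them as text.
  const buffer = await blob.arrayBuffer();
  // TextDecoder resolves the label like readAsText does, but it throws a
  // RangeError for an unknown label instead of falling back to UTF-8.
  return new TextDecoder(encoding).decode(buffer);
}

// "åäö" saved as ISO-8859-1: one byte per letter.
const file = new Blob([Uint8Array.of(0xe5, 0xe4, 0xf6)]);

blobToText(file).then((text) => console.log(text)); // three U+FFFD characters
blobToText(file, "ISO-8859-1").then((text) => console.log(text)); // "åäö"
```

Keeping UTF-8 as the default mirrors what File.toString does today, so only callers that pass a different label would see different results.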


Posted by albertdahlin on Oct 26 '21 at 20:10.