
Introduce BOM for Microsoft applications

Status: Open. theexiile1305 opened this issue 4 years ago (10 comments).

Hey there,

Thank you very much for this great project.

Microsoft applications, for some reason, seem to require a BOM to parse UTF-8 files correctly, even though UTF-8, unlike UTF-16 and UTF-32, has no byte order. To make the generated CSV files open correctly, I suggest adding this special BOM (for UTF-8 it is the three bytes 0xEF, 0xBB and 0xBF at the start of the file), even when the csvWriter is already configured with Charsets.UTF_8.name().
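For reference, those three bytes are just the character U+FEFF encoded as UTF-8. A minimal sketch in plain Kotlin (no kotlin-csv involved; the names UTF8_BOM and startsWithUtf8Bom are illustrative, not part of any library) that defines the BOM and checks whether a byte array starts with it:

```kotlin
// The UTF-8 "BOM" is simply U+FEFF (ZERO WIDTH NO-BREAK SPACE) encoded as UTF-8.
val UTF8_BOM: ByteArray = byteArrayOf(0xEF.toByte(), 0xBB.toByte(), 0xBF.toByte())

// Returns true if the given bytes begin with the three UTF-8 BOM bytes.
fun startsWithUtf8Bom(bytes: ByteArray): Boolean =
    bytes.size >= UTF8_BOM.size &&
        UTF8_BOM.indices.all { i -> bytes[i] == UTF8_BOM[i] }
```

For example, "\uFEFF".toByteArray(Charsets.UTF_8) is byte-for-byte equal to UTF8_BOM, while a file produced without a BOM starts directly with the first header character.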

I don't know why this is undocumented or why Excel seems to require a BOM for UTF-8; those might be good questions for the Excel team at Microsoft.

What do you think, or do you have any suggestions for solving this problem?

(theexiile1305, Apr 22 '21)

@theexiile1305 Thank you for the question. Can you elaborate on this? Is your problem something like the following? "CSV files written by kotlin-csv don't have a BOM, so it cannot be read by Excel."

(doyaaaaaken, Apr 23 '21)

@doyaaaaaken Thank you for your quick response. Yes, of course; here is an example. With the UTF-8 setting enabled, the following CSV file is created successfully:

id,name,email
0,Jane,[email protected]
1,Doe,[email protected]
2,Müller,[email protected]

If I open this file in Google Sheets or Numbers (the macOS spreadsheet application), Müller is displayed correctly. In contrast, Excel displays Müller as M√ºller. On further analysis, I noticed that none of the non-ASCII characters (e.g. öäüÄÖÜß, the special German characters) are displayed correctly in Excel.

(theexiile1305, Apr 23 '21)

@theexiile1305 The situation you described has been successfully reproduced by this code, thanks.

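        import com.github.doyaaaaaken.kotlincsv.dsl.csvWriter

        fun main() {
            // Write a small UTF-8 CSV file that contains a non-ASCII name ("Müller")
            csvWriter().open("test.csv") {
                writeRows(listOf(
                    listOf("id", "name", "email"),
                    listOf(0, "Jane", "[email protected]"),
                    listOf(1, "Doe", "[email protected]"),
                    listOf(2, "Müller", "[email protected]"),
                ))
            }
        }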
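Until the writer can emit a BOM itself, one possible stopgap is to prepend the three BOM bytes to the finished file after the open { } block has returned and the file is closed. A minimal sketch using only the JDK; prependUtf8Bom is a hypothetical helper, not part of kotlin-csv:

```kotlin
import java.io.File

// Hypothetical helper (not part of kotlin-csv): prepends the UTF-8 BOM
// (EF BB BF) to an already-written file, unless the file already starts with it.
fun prependUtf8Bom(file: File) {
    val bom = byteArrayOf(0xEF.toByte(), 0xBB.toByte(), 0xBF.toByte())
    val content = file.readBytes()
    val alreadyHasBom = content.size >= bom.size &&
        bom.indices.all { content[it] == bom[it] }
    if (!alreadyHasBom) {
        // Rewrites the whole file in memory; acceptable for modest CSV exports.
        file.writeBytes(bom + content)
    }
}
```

Usage would be prependUtf8Bom(File("test.csv")) right after the writing code. The BOM check makes the call safe to repeat.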

So I plan to introduce an includeBOM: Boolean option on CsvWriterContext, which you could use as in the snippet below. Does this look OK to you?

csvWriter{
    includeBOM = true
}.open("test.csv") {
  //do some operation
}
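How a flag like the proposed includeBOM could be honored is sketched below. This is not kotlin-csv's implementation; SketchCsvConfig and writeCsvSketch are hypothetical names, and the quoting is a simplified RFC 4180 style. The key point is that writing the character U+FEFF first through a UTF-8 writer produces exactly the bytes EF BB BF:

```kotlin
import java.io.OutputStream
import java.io.OutputStreamWriter

// Hypothetical settings object; `includeBOM` mirrors the proposed option,
// the other fields are illustrative only.
data class SketchCsvConfig(
    val includeBOM: Boolean = false,
    val delimiter: Char = ',',
    val lineTerminator: String = "\r\n"
)

// Quote a field only when it contains the delimiter, a quote, or a line break.
fun quoteIfNeeded(field: String, delimiter: Char): String =
    if (field.any { it == delimiter || it == '"' || it == '\n' || it == '\r' })
        "\"" + field.replace("\"", "\"\"") + "\""
    else field

fun writeCsvSketch(out: OutputStream, rows: List<List<Any?>>, config: SketchCsvConfig) {
    val writer = OutputStreamWriter(out, Charsets.UTF_8)
    // Writing U+FEFF through a UTF-8 writer emits exactly EF BB BF.
    if (config.includeBOM) writer.write("\uFEFF")
    for (row in rows) {
        // null fields are written as empty strings.
        val line = row.joinToString(config.delimiter.toString()) {
            quoteIfNeeded(it?.toString() ?: "", config.delimiter)
        }
        writer.write(line)
        writer.write(config.lineTerminator)
    }
    // Flush buffered characters, but leave closing `out` to the caller.
    writer.flush()
}
```

Emitting the BOM as a character at the very start keeps the option independent of the rest of the writer, and leaves output byte-for-byte unchanged when the flag is off.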

(doyaaaaaken, Apr 26 '21)

@doyaaaaaken Sorry for the late response. The snippet above looks great and works for me. Thank you!

(theexiile1305, Apr 29 '21)

@doyaaaaaken If you like, I can give this issue a try. 😄

(theexiile1305, Apr 30 '21)

@theexiile1305 Thanks! Please try it.

(doyaaaaaken, Apr 30 '21)

@theexiile1305: As a workaround, you can also import the CSV file via Data | From Text/CSV instead of just opening it. This lets you explicitly select the source file encoding in the import dialog:

[Screenshot: Excel's Text/CSV import dialog]

(StefRe, Oct 11 '21)

Hey @doyaaaaaken, has this been resolved?

(EthanDunfordAspect, Mar 9 '22)

Hi @EthanDunfordAspect, this has not been resolved yet.

(doyaaaaaken, Mar 10 '22)