kotlin-csv
Introduce BOM for Microsoft applications
Hey there,
thank you very much for this great project.
Microsoft applications, for some reason, seem to require a BOM to parse UTF-8 files correctly, even though UTF-8 has no byte order the way UTF-16 and UTF-32 do. So that a generated CSV file opens correctly, I suggest writing the UTF-8 BOM (the three bytes 0xEF, 0xBB and 0xBF) at the start of the file, even when the csvWriter is configured with Charsets.UTF_8.name().
Why this is undocumented and why Excel seems to require a BOM for UTF-8, I don't know; these might be good questions for the Excel team at Microsoft.
What do you think or do you have any suggestion to solve this problem?
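Until the library offers an option for this, one possible workaround is to write the three BOM bytes yourself before the CSV content. The following is only a minimal sketch using the Kotlin standard library: `UTF8_BOM` and `writeCsvWithBom` are hypothetical names and not part of kotlin-csv, and the fields are joined naively with commas, without any quoting or escaping.

```kotlin
import java.io.File

// The UTF-8 byte order mark: 0xEF, 0xBB, 0xBF.
val UTF8_BOM: ByteArray = byteArrayOf(0xEF.toByte(), 0xBB.toByte(), 0xBF.toByte())

// Hypothetical helper (not part of kotlin-csv): writes the BOM first,
// then the rows as UTF-8 text with CRLF line endings.
// Assumption: no field contains commas, quotes or line breaks,
// because no quoting or escaping is applied here.
fun writeCsvWithBom(target: File, rows: List<List<Any?>>) {
    target.outputStream().use { out ->
        out.write(UTF8_BOM)
        val text = rows.joinToString(separator = "") { row ->
            row.joinToString(",") + "\r\n"
        }
        out.write(text.toByteArray(Charsets.UTF_8))
    }
}
```

Calling `writeCsvWithBom(File("test.csv"), rows)` should then produce a file that starts with the BOM, which is what Excel appears to look for when deciding whether the file is UTF-8.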
@theexiile1305 Thank you for the question. Can you elaborate on this? Is your problem something like the following? "CSV files written by kotlin-csv don't have a BOM, so they cannot be read by Excel."
@doyaaaaaken Thank you for your quick response. Yes of course, I can elaborate on this with the following example. The CSV file is created successfully with the UTF-8 setting enabled:
id,name,email
0,Jane,[email protected]
1,Doe,[email protected]
2,Müller,[email protected]
If I open this file in Google Spreadsheet or Numbers (the macOS spreadsheet application), Müller is displayed correctly. In contrast, Müller is shown as M√ºller in Excel. Further analysis showed that none of the non-ASCII UTF-8 characters (e.g. öäüÄÖÜß, the special German characters) are displayed correctly in Excel.
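To confirm that a missing BOM is the difference between a file Excel displays correctly and one it does not, you can inspect the first three bytes of the file. This is a small diagnostic sketch using only the standard library; `startsWithUtf8Bom` is a hypothetical helper name, not something kotlin-csv provides.

```kotlin
import java.io.File

// Hypothetical diagnostic helper: returns true if the file begins with
// the UTF-8 byte order mark 0xEF 0xBB 0xBF.
fun startsWithUtf8Bom(file: File): Boolean {
    val head = ByteArray(3)
    // read() returns the number of bytes read, or -1 for an empty file.
    val read = file.inputStream().use { it.read(head) }
    return read == 3 &&
        head[0] == 0xEF.toByte() &&
        head[1] == 0xBB.toByte() &&
        head[2] == 0xBF.toByte()
}
```

For a file written without a BOM, such as the one described above, `startsWithUtf8Bom(File("test.csv"))` would be expected to return false.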
@theexiile1305 The situation you described has been successfully reproduced by this code, thanks.
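import com.github.doyaaaaaken.kotlincsv.dsl.csvWriter

// Writes the sample rows without a BOM, which reproduces the Excel display problem.
csvWriter().open("test.csv") {
    writeRows(listOf(
        listOf("id", "name", "email"),
        listOf(0, "Jane", "[email protected]"),
        listOf(1, "Doe", "[email protected]"),
        listOf(2, "Müller", "[email protected]"),
    ))
}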
So, I plan to introduce an includeBOM: Boolean option on CsvWriterContext.
You would use this option as in the snippet below.
Do you think this is OK?
csvWriter {
    includeBOM = true
}.open("test.csv") {
    // do some operation
}
@doyaaaaaken Sorry for the late response. The above snippet looks great and it's okay for me. Thank you!
@doyaaaaaken If you want, I can give that issue a try. 😄
@theexiile1305 Thanks! Please try it.
@theexiile1305: As a workaround, you can also import the CSV file via Data | From Text/CSV instead of just opening it. This has the advantage that you can explicitly select the source file encoding in the import dialog.

hey @doyaaaaaken, has this been resolved?
Hi @EthanDunfordAspect , this has not been resolved yet.