writexl icon indicating copy to clipboard operation
writexl copied to clipboard

Expected encoding of libxlswriter

Open jeroen opened this issue 6 years ago • 5 comments

Lookup what encoding libxlsxwriter expects for strings.

jeroen avatar Jan 09 '19 00:01 jeroen

See https://libxlsxwriter.github.io/worksheet_8h.html:

Unicode strings are supported in UTF-8 encoding. This generally requires that your source file is UTF-8 encoded or that the data has been read from a UTF-8 source: worksheet_write_string(worksheet, 0, 0, "Это фраза на русском!", NULL);

jeroen avatar Jan 09 '19 00:01 jeroen

See #30 (which was closed, perhaps in favor of this which addresses the same issue)

This file produces an unopenable xlsx file:

bad.txt

dn <- read_csv("bad.txt")
dn %>% writexl::write_xlsx("bad.xlsx")

This is presumably due to the encoding which should appear as "43 cm³". A call to iconv fixes the problem. It is probably advisable to call iconv( ... , to="UTF-8") on all character columns before creating the xlsx file.

System Info

packageVersion('writexl') [1] ‘1.1’

R.version _
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 3
minor 4.1
year 2017
month 06
day 30
svn rev 72865
language R
version.string R version 3.4.1 (2017-06-30) nickname Single Candle

ctbrown avatar Jan 09 '19 05:01 ctbrown

First, you are using a really old version of R. Please update.

I think your problem appears not in writexl, but earlier on when you read the text file. Please print() the data before writing, so that we can see if the encoding was correct before exporting to xlsx.

Try the following code to read your text as Windows encoded:

library(readr)
library(writexl)
dn <- read_csv("bad.txt", locale = locale(encoding = 'ISO-8859-1'))
print(dn)
writexl::write_xlsx(dn, "bad.xlsx")

We cannot call iconv unconditionally as you suggest, because this will actually corrupt correctly encoding text. You just need to make sure you read in your data properly.

jeroen avatar Jan 09 '19 10:01 jeroen

I don't think that it is really old.

The problem is definitely incorrectly encoded text. The data was read in correctly. And the workflow is not unusual. Adobe Forms > Export to Spreadsheet > read_csv() or read_excel(). I understand that it is probably not something that write_xlsx() should maybe handle but the limitation should be documented as Excel used in the workflow does not produce a similar error.

Chris

On Wed, Jan 9, 2019 at 2:36 AM Jeroen Ooms [email protected] wrote:

First, you are using a really old version of R. Please update https://cran.r-project.org/bin/windows/base/.

I think your problem appears not in writexl, but earlier on when you read the text file. Please print() the data before writing, so that we can see if the encoding was correct before exporting to xlsx.

Try the following code to read your text as Windows encoded:

library(readr) library(writexl)dn <- read_csv("bad.txt", locale = locale(encoding = 'ISO-8859-1')) print(dn)writexl::write_xlsx(dn, "bad.xlsx")

We cannot call iconv unconditionally as you suggest, because this will actually corrupt correctly encoding text. You just need to make sure you read in your data properly.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/ropensci/writexl/issues/31#issuecomment-452650143, or mute the thread https://github.com/notifications/unsubscribe-auth/AC5xa6SPqznbODgt5hp-VsrzLhDi85SJks5vBcYZgaJpZM4Z2hpK .

ctbrown avatar Jan 09 '19 17:01 ctbrown

There really isn't any limitation on the writexl side. The strings gets written to the spreadsheet as you have it in your data frame. If print(dn) shows anything other than what you see in excel, then there would be a bug, but this is not the case.

The problem with plain text file (including csv) is that you don't know which encoding they are in. It looks like adobe exports it as 'ISO-8859-1' on Windows, but readr::read_csv() defaults to UTF-8. Hence special characters get converted wrong if you don't pass a locale to read_csv().

jeroen avatar Jan 09 '19 22:01 jeroen