writexl
writexl copied to clipboard
Expected encoding of libxlswriter
Lookup what encoding libxlsxwriter expects for strings.
See https://libxlsxwriter.github.io/worksheet_8h.html:
Unicode strings are supported in UTF-8 encoding. This generally requires that your source file is UTF-8 encoded or that the data has been read from a UTF-8 source:
worksheet_write_string(worksheet, 0, 0, "Это фраза на русском!", NULL);
See #30 (which was closed, perhaps in favor of this which addresses the same issue)
This file produces an unopenable xlsx file:
dn <- read_csv("bad.txt")
dn %>% writexl::write_xlsx("bad.xlsx")
This is presumably due to the encoding which should appear as "43 cm³". A call to iconv
fixes the problem. It is probably advisable to call iconv( ... , to="UTF-8")
on all character columns before creating the xlsx file.
System Info
packageVersion('writexl') [1] ‘1.1’
R.version _
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 3
minor 4.1
year 2017
month 06
day 30
svn rev 72865
language R
version.string R version 3.4.1 (2017-06-30) nickname Single Candle
First, you are using a really old version of R. Please update.
I think your problem appears not in writexl, but earlier on when you read the text file. Please print()
the data before writing, so that we can see if the encoding was correct before exporting to xlsx.
Try the following code to read your text as Windows encoded:
library(readr)
library(writexl)
dn <- read_csv("bad.txt", locale = locale(encoding = 'ISO-8859-1'))
print(dn)
writexl::write_xlsx(dn, "bad.xlsx")
We cannot call iconv
unconditionally as you suggest, because this will actually corrupt correctly encoding text. You just need to make sure you read in your data properly.
I don't think that it is really old.
The problem is definitely incorrectly encoded text. The data was read in correctly. And the workflow is not unusual. Adobe Forms > Export to Spreadsheet > read_csv() or read_excel(). I understand that it is probably not something that write_xlsx() should maybe handle but the limitation should be documented as Excel used in the workflow does not produce a similar error.
Chris
On Wed, Jan 9, 2019 at 2:36 AM Jeroen Ooms [email protected] wrote:
First, you are using a really old version of R. Please update https://cran.r-project.org/bin/windows/base/.
I think your problem appears not in writexl, but earlier on when you read the text file. Please print() the data before writing, so that we can see if the encoding was correct before exporting to xlsx.
Try the following code to read your text as Windows encoded:
library(readr) library(writexl)dn <- read_csv("bad.txt", locale = locale(encoding = 'ISO-8859-1')) print(dn)writexl::write_xlsx(dn, "bad.xlsx")
We cannot call iconv unconditionally as you suggest, because this will actually corrupt correctly encoding text. You just need to make sure you read in your data properly.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/ropensci/writexl/issues/31#issuecomment-452650143, or mute the thread https://github.com/notifications/unsubscribe-auth/AC5xa6SPqznbODgt5hp-VsrzLhDi85SJks5vBcYZgaJpZM4Z2hpK .
There really isn't any limitation on the writexl
side. The strings gets written to the spreadsheet as you have it in your data frame. If print(dn)
shows anything other than what you see in excel, then there would be a bug, but this is not the case.
The problem with plain text file (including csv) is that you don't know which encoding they are in. It looks like adobe exports it as 'ISO-8859-1' on Windows, but readr::read_csv()
defaults to UTF-8. Hence special characters get converted wrong if you don't pass a locale to read_csv()
.