msgxtractr
Special character issue on Windows
read_msg() does not work for me on Windows when the file path contains special characters. The same code works as expected on Linux.
Have a look at the example below. The path of the second mail contains a special character ("ø").
On Windows:
library(magrittr)
library(msgxtractr)
system.file("extdata/unicode.msg", package="msgxtractr") %>%
file.copy(to = c("Copenhagen.msg", "København.msg"), overwrite = TRUE)
#> [1] TRUE TRUE
(mails <- list.files(pattern = "msg"))
#> [1] "Copenhagen.msg" "København.msg"
lapply(mails, read_msg)
#> [[1]]
#> Mon, 18 Nov 2013 10:26:24 +0200
#> From: Brian Zhou <[email protected]>
#> To: [email protected]
#> Subject: Test for TIF files
#> Attachments: 2
#>
#> [[2]]
#> From: [Unspecified]
#> To: [Unspecified]
#> Subject: [Unspecified]
On Linux:
library(magrittr)
library(msgxtractr)
system.file("extdata/unicode.msg", package="msgxtractr") %>%
file.copy(to = c("Copenhagen.msg", "København.msg"), overwrite = TRUE)
#> [1] TRUE TRUE
(mails <- list.files(pattern = "msg"))
#> [1] "Copenhagen.msg" "København.msg"
lapply(mails, read_msg)
#> [[1]]
#> Mon, 18 Nov 2013 10:26:24 +0200
#> From: Brian Zhou <[email protected]>
#> To: [email protected]
#> Subject: Test for TIF files
#> Attachments: 2
#>
#> [[2]]
#> Mon, 18 Nov 2013 10:26:24 +0200
#> From: Brian Zhou <[email protected]>
#> To: [email protected]
#> Subject: Test for TIF files
#> Attachments: 2
I just added a call to normalizePath() before the file read operations. I'm AFK today but will try to reproduce this on a Windows VM this week.
normalizePath() does not seem to help. Please let me know if I can provide further information.
library(magrittr)
library(msgxtractr)
system.file("extdata/unicode.msg", package="msgxtractr") %>%
file.copy(to = c("Copenhagen.msg", "København.msg"), overwrite = TRUE)
#> [1] TRUE TRUE
(mails <- list.files(pattern = "msg"))
#> [1] "Copenhagen.msg" "København.msg"
(mails2 <- normalizePath(path.expand(mails)))
#> [1] "M:\\msgxtractr\\test\\Copenhagen.msg"
#> [2] "M:\\msgxtractr\\test\\København.msg"
lapply(mails2, read_msg)
#> [[1]]
#> Mon, 18 Nov 2013 10:26:24 +0200
#> From: Brian Zhou <[email protected]>
#> To: [email protected]
#> Subject: Test for TIF files
#> Attachments: 2
#>
#> [[2]]
#> From: [Unspecified]
#> To: [Unspecified]
#> Subject: [Unspecified]
Created on 2018-10-15 by the reprex package (v0.2.1)
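To narrow down where the path gets lost, it may help to look at how R itself sees each file name before read_msg() is called. The following is only a diagnostic sketch in base R; `describe_paths()` is a hypothetical helper, not part of msgxtractr, and it does not show what happens to the path inside the package's compiled code.

```r
# Hypothetical diagnostic helper (not part of msgxtractr):
# summarise how R sees each path before it is handed to read_msg().
describe_paths <- function(paths) {
  data.frame(
    path = paths,
    # Can R's own file layer find the file?
    exists = file.exists(paths),
    # Encoding mark R attached to the string ("unknown", "UTF-8", "latin1", ...)
    declared_encoding = Encoding(paths),
    # TRUE if the path contains any code point outside the ASCII range
    non_ascii = vapply(
      paths,
      function(p) any(utf8ToInt(enc2utf8(p)) > 127L),
      logical(1),
      USE.NAMES = FALSE
    ),
    stringsAsFactors = FALSE
  )
}
```

Calling `describe_paths(mails)` on both platforms and comparing the `exists` and `declared_encoding` columns would show whether the difference already appears at the R level or only later.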
Thanks for testing. I'll get the VM fired up tomorrow and should be able to track this down pretty quickly.
Try doing:
original_ctype <- Sys.getlocale(category = "LC_CTYPE")
Sys.setlocale("LC_CTYPE", "UTF-8")
before the calls to read_msg()
then
Sys.setlocale("LC_CTYPE", original_ctype)
afterwards.
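The set-and-restore steps above can be wrapped so that the original locale comes back even if read_msg() fails. This is a minimal sketch; `with_ctype()` is a hypothetical helper name, not something msgxtractr provides, and whether the requested locale is accepted depends on the operating system.

```r
# Hypothetical helper (not part of msgxtractr): evaluate `expr` with a
# temporary LC_CTYPE and restore the previous value afterwards.
with_ctype <- function(locale, expr) {
  original_ctype <- Sys.getlocale(category = "LC_CTYPE")
  # Restore the original locale even if `expr` throws an error.
  on.exit(Sys.setlocale("LC_CTYPE", original_ctype), add = TRUE)
  # Sys.setlocale() returns "" (with a warning) when the OS refuses the request.
  applied <- suppressWarnings(Sys.setlocale("LC_CTYPE", locale))
  if (identical(applied, "")) {
    warning("LC_CTYPE could not be set to '", locale,
            "'; continuing with '", original_ctype, "'")
  }
  # `expr` is a lazy promise, so it is only evaluated here,
  # after the locale change has been attempted.
  force(expr)
}
```

Usage would look like `with_ctype("UTF-8", lapply(mails, read_msg))`.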
I think your suggestion points in the right direction; this seems to be an encoding issue.
Sys.setlocale("LC_CTYPE", "UTF-8")
#> [1] ""
#> Warning message:
#> In Sys.setlocale("LC_CTYPE", "UTF-8") :
#> OS reports request to set locale to "UTF-8" cannot be honored
Unfortunately, I cannot set the locale to UTF-8. According to this topic on Stack Overflow, Windows still doesn't support UTF-8 as a locale.
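Until the cause is found, one possible stopgap is to copy the file to a temporary name without special characters and read that copy. The reprex above shows that file.copy() handles the "ø" in the name on Windows, so the copy step itself should work. This is only a sketch under the assumption that the failure comes from how the path is passed on to the file-reading code, which has not been confirmed in this thread. `read_via_ascii_copy()` is a hypothetical helper, and the temporary directory can itself contain special characters (for example in a Windows user name), in which case this will not help.

```r
# Hypothetical workaround (not part of msgxtractr): copy the file to a
# temporary file with a plain ASCII name and pass that copy to `reader`.
read_via_ascii_copy <- function(path, reader) {
  tmp <- tempfile(pattern = "msg_", fileext = ".msg")
  # Remove the temporary copy when the function exits, even on error.
  on.exit(unlink(tmp), add = TRUE)
  if (!file.copy(path, tmp, overwrite = TRUE)) {
    stop("could not copy '", path, "' to a temporary file")
  }
  reader(tmp)
}
```

With the files from the example this would be called as `lapply(mails, read_via_ascii_copy, reader = read_msg)`.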