msgxtractr

Special character issue on Windows

Open BirgerNi opened this issue 7 years ago • 5 comments

read_msg() does not work for me on Windows when the path contains special (non-ASCII) characters. The same code works as expected on Linux.

Have a look at the example below: the path of the second mail contains a special character (ø).

On Windows:

library(magrittr)
library(msgxtractr)

system.file("extdata/unicode.msg", package="msgxtractr") %>%
  file.copy(to = c("Copenhagen.msg", "København.msg"), overwrite = TRUE)
#> [1] TRUE TRUE

(mails <- list.files(pattern = "msg"))
#> [1] "Copenhagen.msg" "København.msg"

lapply(mails, read_msg)
#> [[1]]
#> Mon, 18 Nov 2013 10:26:24 +0200
#> From: Brian Zhou <[email protected]>
#> To: [email protected]
#> Subject: Test for TIF files
#> Attachments: 2
#> 
#> [[2]]
#> From: [Unspecified]
#> To: [Unspecified]
#> Subject: [Unspecified]

On Linux:

library(magrittr)
library(msgxtractr)

system.file("extdata/unicode.msg", package="msgxtractr") %>%
   file.copy(to = c("Copenhagen.msg", "København.msg"), overwrite = TRUE) 
#> [1] TRUE TRUE

(mails <- list.files(pattern = "msg"))
#> [1] "Copenhagen.msg" "København.msg"

lapply(mails, read_msg)
#> [[1]]
#> Mon, 18 Nov 2013 10:26:24 +0200
#> From: Brian Zhou <[email protected]>
#> To: [email protected]
#> Subject: Test for TIF files
#> Attachments: 2
#>
#> [[2]]
#> Mon, 18 Nov 2013 10:26:24 +0200
#> From: Brian Zhou <[email protected]>
#> To: [email protected]
#> Subject: Test for TIF files
#> Attachments: 2

BirgerNi · Oct 15 '18 06:10

I just added a call to normalizePath() before the file-read operations. I'm AFK today but will try to reproduce this on a Windows VM as soon as possible this week.

hrbrmstr · Oct 15 '18 11:10

normalizePath() does not seem to help. Please let me know if I can provide further information.

library(magrittr)
library(msgxtractr)

system.file("extdata/unicode.msg", package="msgxtractr") %>%
  file.copy(to = c("Copenhagen.msg", "København.msg"), overwrite = TRUE)
#> [1] TRUE TRUE

(mails <- list.files(pattern = "msg"))
#> [1] "Copenhagen.msg" "København.msg"
(mails2 <- normalizePath(path.expand(mails)))
#> [1] "M:\\msgxtractr\\test\\Copenhagen.msg"
#> [2] "M:\\msgxtractr\\test\\København.msg"

lapply(mails2, read_msg)
#> [[1]]
#> Mon, 18 Nov 2013 10:26:24 +0200
#> From: Brian Zhou <[email protected]>
#> To: [email protected]
#> Subject: Test for TIF files
#> Attachments: 2
#> 
#> [[2]]
#> From: [Unspecified]
#> To: [Unspecified]
#> Subject: [Unspecified]

Created on 2018-10-15 by the reprex package (v0.2.1)

BirgerNi · Oct 15 '18 12:10

Thanks for testing. I'll get the VM fired up tomorrow and should be able to track this down pretty quickly.

hrbrmstr · Oct 15 '18 19:10

Try doing:

original_ctype <- Sys.getlocale(category = "LC_CTYPE")
Sys.setlocale("LC_CTYPE", "UTF-8")

before the calls to read_msg()

then

Sys.setlocale("LC_CTYPE", original_ctype)

afterwards.
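The save-and-restore pattern above can be wrapped so that the original LC_CTYPE is put back even if the read fails partway. This is a minimal sketch, assuming a hypothetical helper named with_ctype() that is not part of msgxtractr. It uses on.exit() so the locale is reset on both normal return and error. Note that if the OS cannot honor the requested locale, Sys.setlocale() returns "" with a warning and the expression simply runs under the unchanged locale, so the helper does not get around that limitation.

```r
# Hypothetical helper, not part of msgxtractr: evaluate `expr` with LC_CTYPE
# temporarily set to `ctype`, then restore the previous value.
with_ctype <- function(ctype, expr) {
  original_ctype <- Sys.getlocale(category = "LC_CTYPE")
  # on.exit() runs when the function exits, whether normally or via an error
  on.exit(Sys.setlocale("LC_CTYPE", original_ctype), add = TRUE)
  # If the OS cannot honor `ctype`, this returns "" and emits a warning
  Sys.setlocale("LC_CTYPE", ctype)
  # `expr` is a lazily evaluated promise, so it is only evaluated here,
  # after the locale has been switched
  force(expr)
}

# Usage with the files from this issue (requires msgxtractr and the .msg copies):
# mails <- list.files(pattern = "msg")
# with_ctype("UTF-8", lapply(mails, msgxtractr::read_msg))
```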

hrbrmstr · Oct 28 '18 20:10

I think your suggestion points in the right direction; this does seem to be an encoding issue.

> Sys.setlocale("LC_CTYPE", "UTF-8")
#> [1] ""
#> Warning message:
#> In Sys.setlocale("LC_CTYPE", "UTF-8") :
#> OS reports request to set locale to "UTF-8" cannot be honored

Unfortunately, I cannot set the locale to UTF-8. According to a topic on Stack Overflow, Windows still doesn't support UTF-8 locales.

BirgerNi · Oct 29 '18 08:10