Rcpp exception with UTF-8 strings on Windows

Open qinwf opened this issue 8 years ago • 5 comments

This Rcpp issue affects the error messages reported for invalid regular expressions: on Windows, UTF-8 text in the message comes out garbled.

re2("this (is 测试")
#> Error: missing closing ): this (is 娴嬭瘯 

A related issue has been discussed before:

[Rcpp-devel] Unicode on windows 1

[Rcpp-devel] Unicode on windows 2

The solution in the above mailing list posts cannot solve the problem with strings in exception handling.

I sent an email to the Rcpp mailing list about this issue. Here are links to the discussion:

[Rcpp-devel] Rcpp exception with UTF-8 strings on Windows 1

[Rcpp-devel] Rcpp exception with UTF-8 strings on Windows 2

It seems that Rcpp will not fix this very soon, so I suggest rewriting the existing code with the original R C API.

qinwf avatar Jun 13 '16 00:06 qinwf

Just take a look at the way I handle UTF-8 string input in stringi. It's pretty simple. I suggest you use LinkingTo: stringi, call stri_enc_toutf8 on a given SEXP object, and then work with STRING_ELT etc. on the resulting SEXP.

gagolews avatar Jun 13 '16 07:06 gagolews

Yes, I imported stringi and all of the input strings are processed by stri_enc_toutf8.

qinwf avatar Jun 13 '16 09:06 qinwf

I opened a PR in the Rcpp repo that makes this fixable with a macro, and it was merged.

qinwf avatar Jun 22 '16 00:06 qinwf

That's great that you contributed some code to Rcpp! Good job!

Now when you use the new macro this issue is fixed, right?

So now I would say we can keep the Rcpp interface, right? (we don't need to consider re-writing the re2r interface to use the standard Rinternals.h headers)

tdhock avatar Jun 24 '16 18:06 tdhock

Rcpp 0.12.6 is now on CRAN

http://dirk.eddelbuettel.com/blog/2016/07/19/#rcpp_0.12.6

gagolews avatar Jul 20 '16 07:07 gagolews