rHub Solaris different from CRAN Solaris?

Open kbenoit opened this issue 4 years ago • 13 comments

Ah Solaris, every R package maintainer's favourite vintage OS - especially if you are using lots of C++ and/or Unicode encodings.

I just updated quanteda, and the CRAN check results show v2.1.2 failing on Solaris.

On rHub, though, it's clean:

quanteda 2.1.2: NOTE

Build ID: quanteda_2.1.2.tar.gz-08835e9ba4414e1cac78283d9806dc0b

Platform: Oracle Solaris 10, x86, 32 bit, R-release

Submitted: 48 minutes 31.5 seconds ago

Build time: 48 minutes 27 seconds

NOTES:

* checking data for non-ASCII characters ... NOTE   Note: found 3 marked UTF-8 strings

See the full build log: HTML, text, artifacts.

Any ideas as to what causes the difference, and how I might get the correct check in advance?

kbenoit avatar Sep 23 '20 16:09 kbenoit

I don't think this NOTE prevents CRAN from publishing the package. I don't know why CRAN Solaris is not catching this.

gaborcsardi avatar Sep 23 '20 16:09 gaborcsardi

Oh, that's the clean check. :)

gaborcsardi avatar Sep 23 '20 16:09 gaborcsardi

Is CRAN using the GNU or the Solaris compilers for your package?

gaborcsardi avatar Sep 23 '20 16:09 gaborcsardi

Looks like Solaris:

using R version 4.0.2 Patched (2020-09-21 r79235)
using platform: i386-pc-solaris2.10 (32-bit)

kbenoit avatar Sep 23 '20 17:09 kbenoit

That's the platform only, which is the same for both. But it seems you are using Rcpp, so they are probably using the GNU compilers.

The error seems to be in the R code, anyway.

gaborcsardi avatar Sep 23 '20 17:09 gaborcsardi

Calls: summary ... stri_detect_regex -> .handleSimpleError -> h -> .handleSimpleError -> h

So maybe the ICU is different? Or maybe it is some encoding issue? Unfortunately I don't really know how to debug CRAN's machine.

You could try telling them that this works on your Solaris machine, and ask for more info, e.g. the ICU version.

gaborcsardi avatar Sep 23 '20 17:09 gaborcsardi

Yes, it's almost certainly the ICU version. Updates via stringi seem to be very platform dependent, as we discovered in https://github.com/quanteda/quanteda/issues/1996.

We are struggling to debug CRAN's Solaris machine too. Unfortunately it seems to be pretty unique. Thanks for trying to help.

kbenoit avatar Sep 23 '20 17:09 kbenoit

CRAN probably special cases stringi on Solaris, and installs it with --disable-cxx11. See https://svn.r-project.org/R-dev-web/trunk/CRAN/QA/BDR/Solaris/x86/packages/tests32/swift.R

I am not sure why they do this, because both the Solaris and the GNU compilers support C++11.

To try to reproduce this, you can get a ready-to-use Solaris VM (OVA file) with R from here: https://files.r-hub.io/solaris/. There is one file for VirtualBox and another for VMware.

More instructions here: https://github.com/r-hub/solarischeck/tree/master/packer#updating-r

You can install stringi with the --disable-cxx11 option and see if this installs/uses an older ICU.

Alternatively you can special case your tests/examples to only run if a recent ICU is available.

Edit: I'll also email CRAN to ask why they special case stringi.

gaborcsardi avatar Sep 24 '20 08:09 gaborcsardi

I just tried this: with --disable-cxx11, stringi installs ICU 55, and you get this error:

>      corp1 <- corpus(data_char_ukimmig2010[1:2])
>      corp2 <- corpus(data_char_ukimmig2010[3:4])
>      corp3 <- corpus(data_char_ukimmig2010[5:6])
>      summary(c(corp1, corp2, corp3))
Error in h(simpleError(msg, call)) :
  error in evaluating the argument 'x' in selecting a method for function 'as.list': error in evaluating the argument 'x' in selecting a method for function 'which': Illegal argument. (U_ILLEGAL_ARGUMENT_ERROR, context=`^\p{emoji_presentation}+$`)

Without that option, stringi installs ICU 61.1, and it works fine:

>      corp1 <- corpus(data_char_ukimmig2010[1:2])
>      corp2 <- corpus(data_char_ukimmig2010[3:4])
>      corp3 <- corpus(data_char_ukimmig2010[5:6])
>      summary(c(corp1, corp2, corp3))
Corpus consisting of 6 documents, showing 6 documents:

         Text Types Tokens Sentences
          BNP  1125   3280        88
    Coalition   142    260         4
 Conservative   251    499        15
       Greens   322    677        21
       Labour   298    680        29
       LibDem   251    483        14
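
Since the failure comes from a regex property that the older ICU rejects, one option is to probe for the property directly rather than compare version numbers. Below is a minimal sketch of that idea. `regex_supported` is a hypothetical helper, not part of stringi or quanteda, and the `detect` parameter is an assumption added so the probe can be exercised without stringi installed.

```r
# Hypothetical helper, not part of stringi or quanteda.
# Returns TRUE if `detect` can evaluate `pattern` without raising an error,
# and FALSE if it raises one (as the older ICU does in the output above).
# `detect` defaults to stringi::stri_detect_regex; the default is only
# evaluated when no replacement function is supplied.
regex_supported <- function(pattern, detect = stringi::stri_detect_regex) {
  tryCatch({
    detect("a", pattern)
    TRUE
  }, error = function(e) FALSE)
}

# With stringi installed, this would probe the property from the error above:
# regex_supported("^\\p{emoji_presentation}+$")
```

A probe like this tests the capability itself, so it does not need to know which ICU release introduced the property.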

gaborcsardi avatar Sep 24 '20 09:09 gaborcsardi

I think in general it makes sense to check for the ICU version in quanteda, even if CRAN fixes this, because there might be other installations with older ICU versions around.

gaborcsardi avatar Sep 24 '20 09:09 gaborcsardi

@kbenoit It seems that you can check the ICU version in stringi like this:

> stringi::stri_info()[["ICU.version"]]
[1] "61.1"
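
A version guard built on that call could look roughly like the sketch below. `icu_at_least` is a hypothetical helper, not something stringi or quanteda provides, and the "61" threshold in the usage comment is an assumption based only on the two ICU builds compared in this thread, not a documented minimum.

```r
# Hypothetical helper, not part of stringi or quanteda.
# Compares the ICU version reported by stringi against a minimum.
# `current` is an argument so the comparison can be checked without
# stringi installed; its default is only evaluated when it is not supplied.
icu_at_least <- function(minimum,
                         current = stringi::stri_info()[["ICU.version"]]) {
  numeric_version(current) >= numeric_version(minimum)
}

# Possible use in a test or example (threshold is an assumption):
# if (icu_at_least("61")) {
#   summary(c(corp1, corp2, corp3))
# }
```

`numeric_version()` compares the components numerically, which avoids the pitfalls of comparing version strings as text.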

gaborcsardi avatar Sep 24 '20 09:09 gaborcsardi

Yes, we have an open issue to deal with ICU versions, but now we have a good reason to move on that. I'll install the VM you mentioned and use that to fix this. Thanks so much! You 🎸!

kbenoit avatar Sep 24 '20 09:09 kbenoit

Cool. Btw. I had to do this on the VM to install quanteda:

sudo pkgutil -y -i r_base
sudo pkgutil -i -y CSWlibxml2-dev
export MAKE=gmake
R

Some dependencies need GNU make. Let me know if you run into issues.

gaborcsardi avatar Sep 24 '20 09:09 gaborcsardi

CRAN's Solaris is gone, so I'll close this.

gaborcsardi avatar Aug 31 '22 12:08 gaborcsardi

It finally happened! Not with a bang but a whimper. Good riddance.

kbenoit avatar Aug 31 '22 22:08 kbenoit