Run `dplyr` package examples

Open steve-s opened this issue 6 years ago • 11 comments

Make sure FastR can install dplyr and run dplyr examples.

Some issues: dplyr uses Rcpp, which under some circumstances (e.g. in the function duplicated) invokes the DATAPTR macro on a character vector, which FastR does not support at the moment.

steve-s avatar Apr 03 '18 07:04 steve-s

The DATAPTR macro now works for character vectors, but it gives only a read-only view of them. However, there are some more issues with dplyr that need to be addressed.

steve-s avatar May 30 '18 14:05 steve-s

using graalvm-ce-1.0.0-RC5

install.packages('dplyr')
library(dplyr)
...

This installs and loads without error in a basic R script.

However, invoking some of the dplyr functions when running the script triggers a fatal JVM error:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000000124899779, pid=17141, tid=0x0000000000001503
#
# JRE version: OpenJDK Runtime Environment (8.0_172-b11) (build 1.8.0_172-20180626105433.graaluser.jdk8u-src-tar-g-b11)
# Java VM: GraalVM 1.0.0-rc5 (25.71-b01-internal-jvmci-0.46 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# C  [dplyr.so+0x57779]  _ZN4Rcpp16ShrinkableVectorILi13EEC2EiPv+0xb9

The whole GraalVM project seems very promising; would love to see it work with this common library.

tuddman avatar Aug 22 '18 14:08 tuddman

The reported error is caused by a dplyr native function directly accessing an internal structure instead of using the appropriate API function. Fortunately, the latest GitHub version of dplyr (0.7.99.9000) no longer contains that native function and works much better with FastR.

FYI: Currently, the latest FastR gives the following result when executing dplyr tests:

OK: 5095 SKIPPED: 7 FAILED: 218

zslajchrt avatar Aug 27 '18 16:08 zslajchrt

How were you able to do that? How might I install dplyr (0.7.99.9000) directly from GH?

Do I need the devtools package to be able to call install_github("tidyverse/dplyr")?

I tried to install devtools as a prerequisite, to get access to the install_github() function, and that blows up with a StackOverflow error.

UPDATE: Using remotes works.
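
For anyone else hitting the devtools problem, here is a minimal sketch of the remotes route. The helper names pinned_spec and install_pinned are made up for illustration, and whether the install succeeds still depends on the FastR version and on a working compiler toolchain.

```r
# Hypothetical helper: build the "user/repo@ref" spec that
# remotes::install_github() accepts as its first argument.
pinned_spec <- function(repo, ref) paste0(repo, "@", ref)

# Hypothetical helper: make sure remotes is available, then install
# the requested ref (branch, tag or commit SHA) from GitHub.
install_pinned <- function(repo, ref) {
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes", repos = "https://cran.r-project.org/")
  }
  remotes::install_github(pinned_spec(repo, ref))
}

# Not run here: needs network access and a compiler.
# install_pinned("tidyverse/dplyr", "<branch or commit SHA>")
```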

tuddman avatar Aug 27 '18 18:08 tuddman

I managed to install dplyr from GH as follows:

options(repos="https://cran.r-project.org/")
devtools::install_github("tidyverse/dplyr")

To install the version I played with, use this:

options(repos="https://cran.r-project.org/")
devtools::install_github("tidyverse/dplyr", ref = "3724a0cb48450a4c587c9932d3f972542242a1fc")

zslajchrt avatar Aug 28 '18 17:08 zslajchrt

The following happens on OSX, and I've confirmed that the same occurs inside a Linux Docker container.

Your approach first requires installing devtools, which itself exits with a StackOverflow error. No luck there.

Installing remotes instead seems to get around the problem of installing from GH.

Attempting to install the version you used:

options(repos="https://cran.r-project.org/")
remotes::install_github("tidyverse/dplyr@3724a0cb48450a4c587c9932d3f972542242a1fc")

errors & exits with

** preparing package for lazy loading
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[ : 
  there is no package called ‘pillar’
ERROR: lazy loading failed for package ‘dplyr’
* removing ‘/Users/tuddman/Downloads/graalVM/graalvm-ce-1.0.0-rc5/Contents/Home/jre/languages/R/library/dplyr’
* restoring previous ‘/Users/tuddman/Downloads/graalVM/graalvm-ce-1.0.0-rc5/Contents/Home/jre/languages/R/library/dplyr’
Warning messages:
1: In missing_devel_warning(pkgdir) :
  Package dplyr has compiled code, but no suitable compiler(s) were found. Installation will likely fail.
  Install XCode and make sure it works.
2: In FUN(X[[i]], ... = ...) :
  named arguments other than 'exact' are discouraged
3: In FUN(X[[i]], ... = ...) :
  named arguments other than 'exact' are discouraged
4: In FUN(X[[i]], ... = ...) :
  named arguments other than 'exact' are discouraged
5: In i.p(...) : installation of package ‘fansi’ had non-zero exit status
6: In i.p(...) : installation of package ‘pillar’ had non-zero exit status
7: In i.p(...) :
  installation of package ‘/var/folders/01/6f82w_g90zs7bwclxqmtkhsr0000gp/T/RtmpOKoP8h/remoteswfk0tapw2rqa/tidyverse-dplyr-6648cbb’ had non-zero exit status
Error: package or namespace load failed for ‘dplyr’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[:
 there is no package called ‘pillar’
Error: package or namespace load failed for ‘dplyr’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[:
 there is no package called ‘pillar’
	at com.oracle.truffle.r.runtime.RErrorHandling.errorcallDfltWithCall(RErrorHandling.java:573)

Any thoughts, or pointers to something I've overlooked, would be much appreciated.

tuddman avatar Aug 29 '18 17:08 tuddman

Running the above on OSX seems to fail on account of not having XCode.

Running the above in a Linux container seems to fail on account of something else:

...
Content type 'application/x-gzip' length 41629 bytes (40 KB)

The downloaded source packages are in
	‘/tmp/RtmpopFw3A/downloaded_packages’
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
Downloading GitHub repo tidyverse/dplyr@3724a0cb48450a4c587c9932d3f972542242a1fc
Downloading GitHub repo r-lib/rlang@master
Skipping 1 packages ahead of CRAN: rlang
Installing 18 packages: BH, R6, Rcpp, assertthat, bindr, bindrcpp, cli, crayon, fansi, glue, magrittr, pillar, pkgconfig, plogr, purrr, tibble, tidyselect, utf8
Content type 'application/x-gzip' length 11583445 bytes (11.0 MB)
Content type 'application/x-gzip' length 322959 bytes (315 KB)
Content type 'application/x-gzip' length 3809164 bytes (3.6 MB)
Content type 'application/x-gzip' length 11612 bytes (11 KB)
Content type 'application/x-gzip' length 6185 bytes
Content type 'application/x-gzip' length 10212 bytes
Content type 'application/x-gzip' length 1933874 bytes (1.8 MB)
Content type 'application/x-gzip' length 658694 bytes (643 KB)
Content type 'application/x-gzip' length 243661 bytes (237 KB)
Content type 'application/x-gzip' length 56368 bytes (55 KB)
Content type 'application/x-gzip' length 200504 bytes (195 KB)
Content type 'application/x-gzip' length 103104 bytes (100 KB)
Content type 'application/x-gzip' length 6024 bytes
Content type 'application/x-gzip' length 7795 bytes
Content type 'application/x-gzip' length 126147 bytes (123 KB)
Content type 'application/x-gzip' length 109808 bytes (107 KB)
Content type 'application/x-gzip' length 21702 bytes (21 KB)
Content type 'application/x-gzip' length 218882 bytes (213 KB)

The downloaded source packages are in
	‘/tmp/RtmpopFw3A/downloaded_packages’
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
There were 12 warnings (use warnings() to see them)
Error in library(dplyr) : there is no package called ‘dplyr’
Error in library(dplyr) : there is no package called ‘dplyr’
	at com.oracle.truffle.r.runtime.RErrorHandling.errorcallDfltWithCall(RErrorHandling.java:573)
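
Both runs end with a package that cannot be found, so it may help to check which of the 18 dependencies from the log actually got installed. A rough sketch using only base R; the helper name missing_packages is made up for illustration, and it only checks that a package directory exists, not that the package loads.

```r
# Hypothetical helper: return the entries of `pkgs` that are not installed
# in any library on .libPaths(). system.file() returns "" for a package
# it cannot find, and it does not load the package.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1))
  unname(pkgs[!installed])
}

# Dependency list copied from the "Installing 18 packages" line in the log.
deps <- c("BH", "R6", "Rcpp", "assertthat", "bindr", "bindrcpp", "cli",
          "crayon", "fansi", "glue", "magrittr", "pillar", "pkgconfig",
          "plogr", "purrr", "tibble", "tidyselect", "utf8")

print(missing_packages(deps))
```

Anything printed here failed to install; warnings() should then say why for each of those packages.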

tuddman avatar Aug 29 '18 18:08 tuddman

Hi Scott,

devtools should work in the next public release, but there are still some issues with dplyr. We are now actively working on it and will post updates here.

Note that the easiest way to try out our fixes immediately, without waiting for the next public release, is to use the community-provided Docker image that builds FastR from sources: https://github.com/nuest/fastr-docker

Update: If you could share a bit about your use case, which features you'd like to see in FastR, and which other R packages are essential for you, that would be great, but I understand that this is not always possible. You can also reach me at stepan[dot]sindelar-at-oracle.com

steve-s avatar Aug 30 '18 11:08 steve-s

Update: current status

OK: 3671 SKIPPED: 7 FAILED: 8
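
For comparison with the numbers posted in August, the pass rate can be worked out as OK / (OK + SKIPPED + FAILED). A small sketch; the function name pass_rate is just for illustration. Note that the totals differ between the two runs (5320 vs. 3686 tests), so the figures are only roughly comparable.

```r
# Fraction of tests that passed, counting skipped tests as not passed.
pass_rate <- function(ok, skipped, failed) ok / (ok + skipped + failed)

aug <- pass_rate(5095, 7, 218)  # figures posted in Aug 2018
dec <- pass_rate(3671, 7, 8)    # figures posted in Dec 2018

cat(sprintf("Aug 2018: %.1f%%  Dec 2018: %.1f%%\n", 100 * aug, 100 * dec))
```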

steve-s avatar Dec 04 '18 12:12 steve-s

Hi @steve-s , what is the current status with dplyr?

marekrogala avatar Jun 25 '20 09:06 marekrogala

Hello Marek,

There are still a few tests (a single-digit number) that we have to ignore, but otherwise dplyr seems to work fine. Note that FastR uses a fixed CRAN snapshot (currently from 2019-11-03), and those are the package versions we test. You can switch to normal CRAN, but then it is harder for us to guarantee compatibility with important packages, because it's a moving target.
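
Switching to normal CRAN for a session can be sketched with options(repos = ...), the same call used earlier in this thread. The helper name use_plain_cran is made up for illustration, and the snippet assumes that whatever getOption("repos") reports at startup is the snapshot you want to return to afterwards.

```r
# Hypothetical helper: point the session at plain CRAN and return the
# previous setting so that it can be restored later.
use_plain_cran <- function(url = "https://cran.r-project.org/") {
  options(repos = c(CRAN = url))  # options() returns the old values
}

old <- use_plain_cran()
print(getOption("repos"))

# ... install.packages() calls that should use plain CRAN go here ...

options(old)  # back to the previous repository setting
```

Keeping the old value around makes it easy to go back to the tested snapshot when a package from plain CRAN turns out not to work.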

steve-s avatar Jun 25 '20 13:06 steve-s