Performance of purrr package in fastr

Status: Open · hsselman opened this issue 5 years ago · 2 comments

Hi,

I am using GraalVM and FastR rc-15 and have been testing the purrr package in FastR. The functions I want to use run without errors, but I found them extremely slow. The purrr package is a wonderful upgrade of the apply family of base functions.

For example, I made this file (let's call it purrr.R):

```r
library(pdftools)
library(glue)
library(purrr)
library(tictoc)

# A lot of code to get an object to test performance on:
# render page 1 of the purrr reference manual as a raw RGBA bitmap
img <- pdf_render_page(pdf = "https://cran.r-project.org/web/packages/purrr/purrr.pdf", page = 1)
dimension <- dim(img)
# Build a "#rrggbbaa" colour string for every pixel
img_glued <- glue('#{img[1, 1:dimension[2], 1:dimension[3]]}{img[2, 1:dimension[2], 1:dimension[3]]}{img[3, 1:dimension[2], 1:dimension[3]]}{img[4, 1:dimension[2], 1:dimension[3]]}')
img_mat <- matrix(img_glued, nrow = dimension[2], ncol = dimension[3])
# TRUE wherever the pixel is not pure white
img_mat_not_white <- img_mat != "#ffffffff"
l <- dim(img_mat_not_white)[1]
# One data-frame row per matrix row: y is the row index, x a list column of non-white column indices
list_y <- data.frame(y = rep(NA_integer_, l), stringsAsFactors = FALSE)
list_y$x <- list(NA_integer_)
for (i in seq_len(l)) {
  x <- which(img_mat_not_white[i, ])
  if (length(x) > 0) {
    list_y$y[i] <- i
    list_y$x[i] <- list(x)
  }
}
# Keep only rows that contain at least one non-white pixel
list_y <- list_y[!is.na(list_y$y), ]

tic("Determine number of pixels per row (using purrr)")
result1 <- map_int(list_y$x, length)
toc()
tic("Determine number of pixels per row (using sapply)")
result2 <- sapply(list_y$x, length)
toc()

# Both approaches should return the same integer vector
identical(result1, result2)
```

Sourcing the file in GNU R and in FastR (`source("purrr.R")`) gives me:

```
# R
Determine number of pixels per row (using purrr): 0.002 sec elapsed
Determine number of pixels per row (using sapply): 0.001 sec elapsed

# fastR
Determine number of pixels per row (using purrr): 0.825 sec elapsed
Determine number of pixels per row (using sapply): 0.006 sec elapsed
```

For the operation sizes I usually work with, purrr functions like map_int are faster than, for example, sapply.

Do you know what is happening? Thanks in advance, and thanks for developing FastR!

hsselman · Apr 26 '19 10:04

Hi,

The problem you are describing is caused by the communication between Java and the native code of the purrr package. The native interface is a serious bottleneck, but the good news is that we have made good progress in tackling this issue, and we will integrate the optimised native interface soon. Furthermore, we are experimenting with an LLVM-based native interface, with which we could theoretically eliminate the bottleneck entirely and use the full power of the Graal JIT compiler, which could, for example, inline code across the Java/native boundary. The LLVM interface is very promising, as the results below show. (With LLVM, the native code of packages is translated into LLVM bitcode, which GraalVM then interprets just as it interprets R.)

I ran your example with both the optimised Java-native interface (NFI) and the LLVM one; after warm-up, the LLVM interface is roughly 2x faster. After about 5 iterations (when the Graal JIT compiler starts compiling), performance improves significantly, although purrr is still slower than in the GNU R run.

```
# FastR - NFI
Determine number of pixels per row (using purrr):: 0.260 sec elapsed
Determine number of pixels per row (using sapply): 0.005 sec elapsed
...
Determine number of pixels per row (using purrr):: 0.054 sec elapsed
Determine number of pixels per row (using sapply): 0.001 sec elapsed

# FastR - LLVM
Determine number of pixels per row (using purrr):: 0.382 sec elapsed
Determine number of pixels per row (using sapply): 0.003 sec elapsed
...
Determine number of pixels per row (using purrr):: 0.023 sec elapsed
Determine number of pixels per row (using sapply): 0.001 sec elapsed
```
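To observe the JIT warm-up effect described above, both timings can be repeated in a loop. This is a minimal sketch, assuming purrr and tictoc are installed. The list `x_list` is a hypothetical synthetic stand-in for `list_y$x`, so the snippet runs without pdftools or network access, and the iteration count of 10 is an arbitrary choice; absolute timings will differ from the ones reported here.

```r
library(purrr)
library(tictoc)

# Hypothetical stand-in for list_y$x: 1000 integer vectors of random length (1 to 200),
# so the benchmark runs without pdftools or a network connection
set.seed(1)
x_list <- lapply(sample(1:200, 1000, replace = TRUE), seq_len)

# Repeat both timings so that, under FastR, later iterations can run JIT-compiled code;
# 10 iterations is an arbitrary choice for illustration
for (iter in 1:10) {
  cat("Iteration", iter, "\n")
  tic("Determine number of pixels per row (using purrr)")
  result1 <- map_int(x_list, length)
  toc()
  tic("Determine number of pixels per row (using sapply)")
  result2 <- sapply(x_list, length)
  toc()
}

# Both approaches return the same unnamed integer vector
stopifnot(identical(result1, result2))
```

Under FastR, the first purrr timings would be expected to be the slowest, in line with the NFI and LLVM results above.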

zslajchrt · May 03 '19 12:05

Hi,

Thank you for your answer! Your results show real performance improvements with NFI or LLVM. That's great! purrr is still behind the base R benchmark, but it is indeed a large improvement. I also noticed that GNU R and FastR (with NFI or LLVM) end up with similar performance, which is awesome!

Thanks for all the support! We'll keep following the improvements made in fastR, for sure.

hsselman · May 07 '19 11:05