
markovchainFit with method="bootstrap" fails on Apache Spark

Open meenaparam opened this issue 6 years ago • 3 comments

I know the markovchain package wasn't designed with Spark in mind, but I wondered if you had any ideas about how to solve a problem I am having with this package on Spark.

This example code runs perfectly fine from the R console on Mac OS X and CentOS 7.

library(markovchain)
sequence <- c("a", "b", "a", "a", "a", "a", "b", "a", "b", "a", "b", "a", "a", 
            "b", "b", "b", "a")
mcFitMLE <- markovchain::markovchainFit(data = sequence, method = "mle")
mcFitMAP <- markovchain::markovchainFit(data = sequence, method = "map")
mcFitBSP <- markovchain::markovchainFit(data = sequence, method = "bootstrap", nboot = 5)

When run on Spark via Apache Zeppelin (on CentOS 7), the MLE and MAP fits run without problems. However, the bootstrap method fails with this error:

Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'sample' for signature '"character"'
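The error says that the `sample` being dispatched is an S4 generic with no method for a character vector, which is not how `base::sample` behaves. SparkR (attached in the CentOS session below) defines its own `sample` generic for SparkDataFrames, and that generic masks `base::sample` for any lookup that walks the search path starting from the global environment. A plausible explanation, not confirmed in this thread, is that the bootstrap code path resolves `sample` that way, which would explain why only `method = "bootstrap"` fails. The sketch below reproduces the effect without Spark; the attached environment `fake_spark_mask` is a hypothetical stand-in for SparkR, and its `sample` simply rejects character input.

```r
# Hypothetical stand-in for SparkR: an attached environment whose `sample`
# rejects character vectors, mimicking "no method for signature character".
mask_env <- new.env()
assign("sample",
       function(x, ...) stop("no 'sample' method for class ", class(x)[1]),
       envir = mask_env)
attach(mask_env, name = "fake_spark_mask", warn.conflicts = FALSE)

# List every search-path entry that defines `sample`; the mask comes first
# because attach() places it at position 2, ahead of package:base.
where_found <- find("sample")
print(where_found)

# A lookup that starts at the global environment hits the mask and fails.
masked_result <- tryCatch(
  eval(quote(sample(c("a", "b"), 1)), envir = globalenv()),
  error = function(e) conditionMessage(e)
)
print(masked_result)

# Qualifying the call with base:: bypasses the search path entirely.
set.seed(1)
base_result <- base::sample(c("a", "b"), 1)
print(base_result)

# Remove the stand-in mask again.
detach("fake_spark_mask")
```

If this is the cause, running `find("sample")` in the Zeppelin R interpreter would list `package:SparkR` ahead of `package:base`.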

Do you have any ideas on how I can solve this problem?
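One possible workaround, assuming the failure comes from SparkR's `sample` masking `base::sample` during a lookup that starts at the global environment (untested against markovchain and SparkR), is to bind `base::sample` in the global environment before fitting, since the global environment is searched ahead of every attached package. The sketch below demonstrates the shadowing against a hypothetical stand-in mask named `fake_spark_mask` rather than SparkR itself.

```r
# Hypothetical stand-in for SparkR's masking `sample` (not the real SparkR code).
mask_env <- new.env()
assign("sample", function(x, ...) stop("masked sample called"), envir = mask_env)
attach(mask_env, name = "fake_spark_mask", warn.conflicts = FALSE)

# Workaround sketch: bind base::sample in the global environment, which is
# searched before every attached package, including the mask.
assign("sample", base::sample, envir = globalenv())

# A global-environment lookup now resolves to base::sample again.
resolved_to_base <- identical(get("sample", envir = globalenv()), base::sample)
set.seed(42)
resampled <- eval(quote(sample(c("a", "b", "c"))), envir = globalenv())
print(resampled)

# In a real session, markovchainFit(..., method = "bootstrap") would be
# called at this point (not exercised here).

# Undo the workaround and remove the stand-in mask.
rm(list = "sample", envir = globalenv())
detach("fake_spark_mask")
```

While the global binding is in place, unqualified `sample()` calls on SparkDataFrames would also resolve to `base::sample`, so Spark sampling would need to be written as `SparkR::sample(...)` until the binding is removed with `rm(sample)`.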

Many thanks for a great package!

Mac OS session info:

R version 3.5.3 (2019-03-11)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] markovchain_0.6.9.14

loaded via a namespace (and not attached):
 [1] compiler_3.5.3     Matrix_1.2-17      magrittr_1.5       parallel_3.5.3     tools_3.5.3       
 [6] expm_0.999-4       igraph_1.2.4.1     yaml_2.2.0         matlab_1.0.2       Rcpp_1.0.1        
[11] grid_3.5.3         stats4_3.5.3       RcppParallel_4.4.3 pkgconfig_2.0.2    lattice_0.20-38   

CentOS 7 session info:

R version 3.6.0 (2019-04-26)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] markovchain_0.6.9.14 knitr_1.23           SparkR_2.3.1        

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1         lattice_0.20-38    grid_3.6.0
 [4] stats4_3.6.0       magrittr_1.5       evaluate_0.14
 [7] RcppParallel_4.4.3 stringi_1.4.3      Matrix_1.2-17
[10] tools_3.6.0        stringr_1.4.0      markdown_1.0
[13] igraph_1.2.4.1     xfun_0.8           parallel_3.6.0
[16] compiler_3.6.0     pkgconfig_2.0.2    matlab_1.0.2
[19] expm_0.999-4      

Apache Spark version: 2.3.1-mapr-1808
Apache Zeppelin version: 0.8.1

meenaparam • Jul 03 '19 09:07

@ncordon do you have an idea about it?

spedygiorgio • Jul 03 '19 19:07

> @ncordon do you have an idea about it?

Oops, sorry, I missed this one. No idea, @spedygiorgio; I have never worked with Spark. I can mark it as low priority for now, until I can set aside some time to look into it.

ncordon • Jul 30 '19 17:07

Neither have I. We will keep it on hold for now.

spedygiorgio • Jul 30 '19 23:07