markovchainFit with method="bootstrap" fails on Apache Spark
I know the markovchain package wasn't designed with Spark in mind, but I wondered if you had any ideas about how to solve a problem I am having with this package on Spark.
This example code runs without problems from the R console on both macOS and CentOS 7.
library(markovchain)
sequence <- c("a", "b", "a", "a", "a", "a", "b", "a", "b", "a", "b", "a", "a",
"b", "b", "b", "a")
mcFitMLE <- markovchain::markovchainFit(data = sequence, method = "mle")
mcFitMAP <- markovchain::markovchainFit(data = sequence, method = "map")
mcFitBSP <- markovchain::markovchainFit(data = sequence, method = "bootstrap", nboot = 5)
When run on Spark via Apache Zeppelin (on CentOS 7), the MLE and MAP fits complete without problems. However, the bootstrap fit fails with this error:
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'sample' for signature '"character"'
Do you have any ideas on how I can solve this problem?
Many thanks for a great package!
Mac OS session info:
R version 3.5.3 (2019-03-11)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] markovchain_0.6.9.14
loaded via a namespace (and not attached):
[1] compiler_3.5.3 Matrix_1.2-17 magrittr_1.5 parallel_3.5.3 tools_3.5.3
[6] expm_0.999-4 igraph_1.2.4.1 yaml_2.2.0 matlab_1.0.2 Rcpp_1.0.1
[11] grid_3.5.3 stats4_3.5.3 RcppParallel_4.4.3 pkgconfig_2.0.2 lattice_0.20-38
CentOS 7 session info:
R version 3.6.0 (2019-04-26)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] markovchain_0.6.9.14 knitr_1.23 SparkR_2.3.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 lattice_0.20-38 grid_3.6.0
[4] stats4_3.6.0 magrittr_1.5 evaluate_0.14
[7] RcppParallel_4.4.3 stringi_1.4.3 Matrix_1.2-17
[10] tools_3.6.0 stringr_1.4.0 markdown_1.0
[13] igraph_1.2.4.1 xfun_0.8 parallel_3.6.0
[16] compiler_3.6.0 pkgconfig_2.0.2 matlab_1.0.2
[19] expm_0.999-4
Apache Spark version: 2.3.1-mapr-1808
Apache Zeppelin version: 0.8.1
@ncordon do you have an idea about it?
Oops, sorry, I missed this one. No idea, @spedygiorgio; I have never worked with Spark. I can mark it as low priority for now, until I can set aside some time to look into it.
Neither have I. We will keep it on hold for now.
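For anyone picking this up later, here is a minimal diagnostic sketch. It rests on an unconfirmed assumption: that the error comes from SparkR (attached in the CentOS session above) providing its own S4 generic named sample, which masks base::sample on the search path and has no method for character vectors. Whether that masked version is what the bootstrap code actually reaches has not been verified in this thread. The variable names sample_owners and resampled are only illustrative.

```r
# Diagnostic sketch, assuming (unconfirmed) that SparkR's `sample` generic
# is masking base::sample in the failing session.

# List every attached package that defines a function called `sample`.
# In a plain R session this should only report "package:base"; with SparkR
# attached, "package:SparkR" would be expected to appear ahead of it.
sample_owners <- find("sample", mode = "function")
print(sample_owners)

# Check whether an S4 generic named `sample` is currently visible.
print(methods::isGeneric("sample"))

# Calling base R's version explicitly sidesteps any masking on the search path.
sequence <- c("a", "b", "a", "a", "b")
resampled <- base::sample(sequence, size = length(sequence), replace = TRUE)
print(resampled)
```

If "package:SparkR" shows up first in the Zeppelin session, that would at least be consistent with the error message, and comparing the output with a session where SparkR is not attached might help narrow things down.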