gregleleu
It no longer happens with Spark 3.4.0, and the column is now collected as a binary column (not a `jobj` pointer). On Spark 3.3.1 the behavior is still the same.
I'm on a Mac, yes, but it also shows up in the Sedona CI: https://github.com/apache/sedona/actions/runs/5603748794/jobs/10250817504 (section "run tests"). I don't get an error either when I run the sparklyr tests,...
We are now getting this issue even on `mutate()` and other verbs in the Sedona tests. `dbGetQuery()` doesn't trigger it, so I'm pretty sure the warning comes from some operation...
Some more research: it seems to come from the new `by` in dplyr. The warning is generated by a call to the tidyselect function `tidyselect_data_proxy`; dbplyr has a `tidyselect_data_proxy.tbl_lazy`...
Found it! It's the call to `tidyselect_data_proxy.tbl_spark` and the subsequent call to `simulate_vars_spark`. It's here: https://github.com/sparklyr/sparklyr/blob/22aa571c1cb2820b916fff9e0860647c2ea024f5/R/utils.R#L475. Submitting a PR.
The "hard" part is setting up EMR with R. A few resources:

* https://spark.rstudio.com/deployment/yarn-cluster-emr
* https://aws.amazon.com/blogs/big-data/running-sparklyr-rstudios-r-interface-to-spark-on-amazon-emr/

Then you just need to make sure the cluster has the Sedona jars. One...
`linewidth` should be the parameter you're looking for.
Any chance you're still looking at this? The package is using deprecated functions from rlang.
Hey, when you install it using `devtools::install_github('catboost/catboost', subdir = 'catboost/R-package')`, the package gets the latest R code from GitHub but does not compile the C/C++ functions; it downloads the precompiled...
@ckiefer I've checked: the 0.24.2 release for Linux (here: https://github.com/catboost/catboost/releases/download/v0.24.2/catboost-R-Linux-0.24.2.tgz) does not contain calls to the new functions (e.g. `catboost.get_plain_params` in `catboost.train`). Are you sure your installation worked? Try running...