callr
Trouble calling future within r_session$new() but not r_bg()
This question is related to https://github.com/HenrikBengtsson/future/discussions/607, but I am not sure if the solution lies with my usage of future or my usage of callr.
Are there differences in environment variables etc. between r_session$new() and r_bg()? Is there anything I can do to configure the R session/environment of the former to be more like the latter?
I am developing a package that requires submitting a future inside a separate local process, and because of some implementation details, I would prefer to use r_session$new() rather than r_bg(). But when I try the former, I get an error:
fun <- function() {
  plan <- future::tweak(
    future.batchtools::batchtools_sge,
    template = "sge.tmpl"
  )
  future::plan(plan) # Runs on my company's SGE cluster
  future::future("x")
}
px <- callr::r_session$new()
px$call(func = fun, args = list())
out <- px$read()
cat(out$error$message)
#> callr subprocess failed: Fatal error occurred: 101.
#> Command 'qsub' produced exit code 1.
#> Output: 'Unable to run job: got no response from JSV script
#> "/opt/uge/uge-8.6.6/util/resources/jsv/verify_job".
#> Exiting.
This happens both in the RStudio IDE and in a terminal. In both cases, it works with callr::r_bg().
px <- callr::r_bg(func = fun, args = list())
px$wait() # get_result() errors if the process is still running
px$get_result()
Here is my sge.tmpl file:
#!/bin/bash
#$ -cwd
#$ -j y
#$ -o <%= log.file %>
#$ -V
#$ -N <%= job.name %>
module load R/4.1.2
Rscript -e 'batchtools::doJobCollection("<%= uri %>")'
exit 0
By the way, for the package I mentioned, I plan to implement task queues for cloud workers, and I am gradually working up to the challenge through callr and future/future.batchtools. @gaborcsardi, your task queue blog post from 2019 was extremely helpful, and motivated the design I am using for all the queues. I credit you in the NOTICE and comments, and I will add a note to the README as well.
Are there differences in environment variables etc. between r_session$new() and r_bg()?
Maybe, but you can print/save the environment variables in both and compare.
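The comparison suggested here could look roughly like the sketch below. It assumes callr is installed; get_env() and compare_env() are hypothetical helper names introduced only for illustration, not part of callr. It captures the environment variables seen by an r_session and by an r_bg process and lists the names that differ.

```r
# Hypothetical helper: return the environment as a plain named character vector.
get_env <- function() unclass(Sys.getenv())

# Environment as seen inside a persistent r_session.
rs <- callr::r_session$new()
env_session <- rs$run(get_env)
rs$close()

# Environment as seen inside a one-off r_bg process.
bg <- callr::r_bg(get_env)
bg$wait()
env_bg <- bg$get_result()

# Hypothetical helper: report variables present on only one side,
# and variables present on both sides with different values.
compare_env <- function(session, bg) {
  common <- intersect(names(session), names(bg))
  list(
    only_in_session = setdiff(names(session), names(bg)),
    only_in_bg = setdiff(names(bg), names(session)),
    differing = Filter(
      function(nm) !identical(session[[nm]], bg[[nm]]),
      common
    )
  )
}

print(compare_env(env_session, env_bg))
```

Environment variables are only one possible difference between the two kinds of process, so an empty result here would not rule out other causes.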
Unfortunately it is challenging for me to reproduce this, so there isn't much I can do I am afraid.
That's okay, I understand.
@wlandau Do you think it is possible to create a self contained docker container that reproduces this?
I can’t promise I will have enough time, but it is a good idea.
I did figure out how to reproduce this without future or batchtools. With this job.sh script:
#!/bin/bash
#$ -cwd
#$ -j y
#$ -V
#$ -N test
sleep 5
This code reproduces the JSV script error:
fun <- function() system2("qsub", "job.sh")
px <- callr::r_session$new()
px$call(func = fun, args = list())
out <- px$read()
out
cat(out$error$message)
#> callr subprocess failed: Fatal error occurred: 101.
#> Command 'qsub' produced exit code 1.
#> Output: 'Unable to run job: got no response from JSV script
#> "/opt/uge/uge-8.6.6/util/resources/jsv/verify_job".
#> Exiting.
and this job runs successfully.
fun <- function() system2("qsub", "job.sh")
px <- callr::r_bg(func = fun)
px$wait() # get_result() errors if the process is still running
out <- px$get_result()
out
I will see if my sys admin knows what I could do to troubleshoot.
That's a good step. But I suspect that I would still need to set up an SGE cluster to run qsub.
I asked my sys admin about containerizing SGE, but unfortunately he did not seem to think that was feasible. We debugged for a while, and he plans to send strace output to an SGE developer. I could share the trace with you by email or some other way that is not public-facing.
Also, I noticed that r_session$new() runs R --no-readline --slave --no-save --no-restore. I thought my issue might have something to do with command line flags, so I tried R --no-readline --slave --no-save --no-restore -e 'system2("qsub", "job.sh")', but the job submitted successfully.
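Another way to isolate the flags, sketched here under the assumption that r_bg() accepts a cmdargs argument as documented in callr, is to start an r_bg process with the same flags as the session and inspect commandArgs() on both sides. On a cluster, the function body could be swapped for the system2("qsub", "job.sh") call; commandArgs() is used here only so the sketch runs anywhere (on a Unix-like system, since --no-readline is a Unix-only flag).

```r
# Flags reported above for r_session$new().
flags <- c("--no-readline", "--slave", "--no-save", "--no-restore")

# Start an r_bg process with those flags and record what it received.
bg <- callr::r_bg(function() commandArgs(), cmdargs = flags)
bg$wait()
bg_args <- bg$get_result()

# Record the arguments of an ordinary r_session for comparison.
rs <- callr::r_session$new()
rs_args <- rs$run(function() commandArgs())
rs$close()

print(bg_args)
print(rs_args)
```

If r_bg still works with the session's flags, as the command-line experiment above suggests it would, the flags are unlikely to be the cause.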
FWIW there are some (old) dockerfiles with SGE, e.g. https://github.com/stevekm/docker-centos6-sge
There is also this old recipe, which probably does not work any more: https://gist.github.com/dan-blanchard/6586533
Sure, you can send the strace to me in email.
Confirmed fixed in https://github.com/r-lib/processx/commit/1082c9db2345b8dfa5c45eb88711a42a0b681ae7. Thanks @gaborcsardi!