TORQUE trouble
I am having trouble running batchtools jobs on a local installation of TORQUE on Ubuntu 16.04. I think TORQUE is working because the following test.pbs produces the expected output.
#PBS -N test
#PBS -l nodes=1:ppn=1
#PBS -l walltime=0:01:00
cd $PBS_O_WORKDIR
touch done.txt
echo "done"
However, all my jobs hang in the E state. For example, the following R script waits indefinitely.
library("batchtools")
cf <- makeClusterFunctionsTORQUE("torque.tmpl")
reg <- makeRegistry(NA)
reg$cluster.functions <- cf
batchMap(fun = identity, x = 1:4)
submitJobs()
waitForJobs() # waits here indefinitely
reduceResultsList() # not reached
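When reproducing this by hand, a deadline turns a stuck job into a visible failure instead of an endless wait. The sketch below is a hypothetical helper, not part of batchtools or TORQUE; its name and its timeout and interval parameters are illustrative assumptions. It polls qstat for one job ID and returns once TORQUE reports state C or no longer lists the job (which of the two happens depends on the server's keep_completed setting). Otherwise it gives up after the timeout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll `qstat JOBID` until the job completes or a
# deadline passes. Returns 0 on completion, 1 on timeout.
wait_for_torque_job() {
  local id="$1" timeout="${2:-60}" interval="${3:-5}"
  local waited=0 out state=""
  while [ "$waited" -lt "$timeout" ]; do
    # qstat exits non-zero once the server no longer knows the job.
    if ! out="$(qstat "$id" 2>/dev/null)"; then
      echo "$id: no longer listed by qstat"
      return 0
    fi
    # Row 3 of the default table is the job itself; field 5 is its state.
    state="$(printf '%s\n' "$out" | awk 'NR == 3 { print $5 }')"
    if [ "$state" = "C" ]; then
      echo "$id: completed"
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "$id: still in state ${state:-?} after ${timeout}s" >&2
  return 1
}

# Usage (qsub prints the new job ID): wait_for_torque_job "$(qsub test.pbs)" 120 5
```

With the jobs described here, the call would return 1 and report state E rather than hanging.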
In my case, the console message of waitForJobs()
Waiting (S:4 R:4 D:0 E:0) [-------------------] 0% eta: ?s
reports all four jobs as submitted and running, with none done or errored. That does not match qstat, which shows all four jobs hanging in the E state.
Job id Name User Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
98.localhost ...8d7bd98804b04 wlandau 00:00:00 E batch
99.localhost ...fcce12fedcace wlandau 00:00:00 E batch
100.localhost ...dc63017b37ac6 wlandau 00:00:00 E batch
101.localhost ...b060e52879b8e wlandau 00:00:00 E batch
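To compare the two views directly, the following sketch counts jobs per state letter in qstat's output. It assumes the default layout shown above (two header lines, then one row per job with the state in the fifth field); the function name is a hypothetical illustration. Note that the E in the batchtools progress line counts errored jobs, while the E in qstat is a TORQUE job state.

```shell
#!/usr/bin/env bash
# Tally job states from the default `qstat` table read on stdin.
# Assumes two header lines, then one row per job with the single-letter
# state (Q, R, E, C, ...) in the fifth whitespace-separated field.
tally_qstat_states() {
  awk 'NR > 2 && NF >= 6 { n[$5]++ } END { for (s in n) print s, n[s] }' | sort
}

# On the cluster: qstat | tally_qstat_states
```

For the table above this prints "E 4", while waitForJobs() counts the same four jobs as running.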
I am using @HenrikBengtsson's torque.tmpl from future.batchtools.
Related: see my Stack Overflow post and HenrikBengtsson/future.batchtools#12.
Looks like the system is not set up properly. Can you submit and run jobs manually?
Pretty much. For jobs that do not depend on other jobs (as opposed to drake with the future-powered parallel backend), the following test.pbs script generates the correct output.
#PBS -N test
#PBS -l nodes=1:ppn=1
#PBS -l walltime=0:01:00
cd $PBS_O_WORKDIR
touch done.txt
echo "done"
After that, however, the job hangs in the E state indefinitely.
Job id Name User Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
46.localhost test wlandau 00:00:00 E batch
I was just using a simple qsub test.pbs.
So the manual job also gets stuck in the E state (E for exiting)? Then this is a configuration issue.
Seems about right; I just wish I knew what the right configuration was.
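Since a plain qsub test.pbs hits the same wall, one place to start is asking TORQUE why jobs are not leaving the E (exiting) state. The sketch below is a hypothetical helper (its name is illustrative): it picks the IDs of jobs in state E out of qstat's default table and prints, without running, a qstat -f and a tracejob command for each. Both are standard TORQUE client commands, but what they reveal depends on the local pbs_server and pbs_mom setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the default `qstat` table on stdin and, for
# every job whose state (fifth field) is E, print diagnostic commands
# to run by hand. Nothing is executed here.
print_exiting_job_diagnostics() {
  awk 'NR > 2 && NF >= 6 && $5 == "E" { print $1 }' |
    while read -r id; do
      printf 'qstat -f %s\n' "$id"   # full attribute dump for the job
      printf 'tracejob %s\n' "$id"   # log entries mentioning the job
    done
}

# On the cluster: qstat | print_exiting_job_diagnostics
```

Printing rather than executing keeps the helper safe to run anywhere; the listed commands can then be run one at a time on the server host, where tracejob can read the logs.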