ClusterManagers.jl
Error launching workers: no such file or directory
I am running addprocs_sge even though my cluster is UGE. Not sure if this is the reason my code isn't working.
On an interactive node, addprocs_sge works:
using ClusterManagers, Distributed
ClusterManagers.addprocs_sge(4; qsub_flags=`-l h_rt=24:00:00,h_data=4G,arch=intel\*`)
println("reached here!")
pmap(x->run(`hostname`),workers())
println("finished!")
julia> From worker 2: n7087
From worker 3: n6669
From worker 4: n6669
From worker 5: n7648
But if I put the above script in a Julia file (get_distributed_nodes.jl) and submit it with qsub, I get:
Error launching workers
Base.IOError("could not spawn `qsub -N julia-25745 -wd /u/home/b/biona001 -terse -j y -R y -t 1-4 -V -l 'h_rt=24:00:00,h_data=4G,arch=intel*'`: no such file or directory (ENOENT)", -2)
reached here!
finished!
I also posted this on Discourse but haven't received a response yet.
Any tip is appreciated.
I contacted the cluster admins, and this is their response:
SGE's "qsub" command normally is followed by a file name (i.e. job script), to be submitted by the "qsub" command, unless the content of the script is sent to qsub by the (Linux) pipe mechanism.
Perhaps that gives a hint on what's wrong?
:shrug: I don't use this cluster manager, sorry
I have to do module load slurm before I invoke julia and ClusterManagers. If I forget, I get a similar error. Maybe that's your problem too?
I know this is old, but it looks like a similar problem has cropped up elsewhere.
I think it's just a confusing error message. The "no such file or directory" probably actually refers to qsub, rather than one of the arguments. That is, qsub is not on the PATH. That's why module load slurm (which presumably adds to the PATH appropriately) can solve this.
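One way to test this explanation is to check, from inside the submitted script, whether qsub can be found before asking for workers. The snippet below is a minimal sketch, not part of ClusterManagers.jl: `require_on_path` is a hypothetical helper name, and it relies only on `Sys.which` from Base, which returns the full path of an executable or `nothing` if it is not on the PATH.

```julia
# Hypothetical helper (not provided by ClusterManagers.jl): return the full
# path of `cmd` if it can be found on the PATH of the current Julia process,
# otherwise throw an error that prints the PATH that was actually searched.
function require_on_path(cmd::AbstractString)
    path = Sys.which(cmd)  # `nothing` when the executable cannot be found
    if path === nothing
        error("`$cmd` was not found on PATH. Current PATH: $(get(ENV, "PATH", ""))")
    end
    return path
end

# Intended use at the top of the batch script, before launching workers:
#   require_on_path("qsub")
#   ClusterManagers.addprocs_sge(4; qsub_flags=...)
```

If the check fails inside the batch job but passes on the interactive node, the two environments have different PATHs, and the scheduler's bin directory (or the relevant `module load`) probably needs to be set up in the job script before Julia starts.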
So it seems like this isn't a ClusterManagers.jl issue? I'll close for now; feel free to reopen if I've misunderstood.