ClusterManagers.jl

Error launching workers: no such file or directory

Open biona001 opened this issue 4 years ago • 3 comments

I am running addprocs_sge even though my cluster is UGE. Not sure if this is the reason my code isn't working.

If I use an interactive node, addprocs_sge works:

using ClusterManagers, Distributed
ClusterManagers.addprocs_sge(4; qsub_flags=`-l h_rt=24:00:00,h_data=4G,arch=intel\*`)

println("reached here!")
pmap(x->run(`hostname`),workers())
println("finished!")

julia>       From worker 2:	n7087
      From worker 3:	n6669
      From worker 4:	n6669
      From worker 5:	n7648

But if I put the above script in a Julia file (get_distributed_nodes.jl) and submit it with qsub, I get:

Error launching workers
Base.IOError("could not spawn `qsub -N julia-25745 -wd /u/home/b/biona001 -terse -j y -R y -t 1-4 -V -l 'h_rt=24:00:00,h_data=4G,arch=intel*'`: no such file or directory (ENOENT)", -2)
reached here!
finished!

I also posted this on Discourse but haven't gotten a response so far.

Any tips are appreciated.

biona001 avatar Jun 22 '21 19:06 biona001

I contacted a cluster admin, and this is his response:

SGE's "qsub" command normally is followed by a file name (i.e. job script), to be submitted by the "qsub" command, unless the content of the script is sent to qsub by the (Linux) pipe mechanism.

Perhaps that gives a hint on what's wrong?

biona001 avatar Jun 23 '21 19:06 biona001

:shrug: I don't use this cluster manager, sorry

kescobo avatar Jun 23 '21 20:06 kescobo

I have to do module load slurm before I invoke julia and ClusterManagers. If I forget, I get a similar error. Maybe that's your problem too?

maxfreu avatar Oct 19 '21 13:10 maxfreu

I know this is old, but it looks like a similar problem has cropped up elsewhere.

I think it's just a confusing error message. The "no such file or directory" probably refers to the qsub executable itself, rather than to one of its arguments. That is, qsub is not on the PATH. That's why module load slurm (which presumably adds to the PATH appropriately) can solve this.
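
One way to check this from inside the job is sketched below. It is a minimal sketch, not part of ClusterManagers.jl: the helper names `find_scheduler` and `spawn_error` and the command `no-such-scheduler-command` are made up for illustration. It shows that spawning a command whose executable is missing raises a `Base.IOError` whatever arguments follow it, and it uses `Sys.which` to test whether qsub is visible before any workers are launched.

```julia
# Hypothetical helper (not part of ClusterManagers.jl): return the full path
# of `name` if it can be found on the PATH, or `nothing` otherwise.
find_scheduler(name::AbstractString) = Sys.which(name)

# Hypothetical helper: try to run `cmd` and return the exception it raises,
# or `nothing` if the command ran successfully.
function spawn_error(cmd::Cmd)
    try
        run(pipeline(cmd; stdout=devnull, stderr=devnull))
        return nothing
    catch err
        return err
    end
end

# A made-up executable name with qsub-like arguments. The spawn fails because
# the executable cannot be found; the arguments are never looked at.
err = spawn_error(`no-such-scheduler-command -terse -t 1-4`)
println("IOError raised: ", err isa Base.IOError)

# Check for qsub before calling addprocs_sge.
qsub_path = find_scheduler("qsub")
if qsub_path === nothing
    @warn "qsub is not on the PATH; load the scheduler module or extend PATH before starting Julia"
else
    println("qsub found at ", qsub_path)
    # From here it should be reasonable to call ClusterManagers.addprocs_sge(...).
end
```

If the warning appears when the script runs as a batch job but not on an interactive node, the two environments have different PATH settings, which would match the behavior reported above.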

moble avatar Feb 22 '23 05:02 moble

So it seems like this isn't a ClusterManagers.jl issue? I'll close for now. Feel free to reopen if I've misunderstood.

kescobo avatar Feb 26 '23 19:02 kescobo