flux-broker: stdin is not a tty - can't run interactive shell
Hi, I am following the slides here, but when I run srun -N2 -n2 --pty flux start, the second process, which doesn't get an interactive shell, reports the error above. I can't find how to start a Flux cluster inside a Slurm job without it needing to be interactive. Maybe my center has Slurm set up differently, but I believe a non-interactive command should exist in any case, right?
I want to run a flux instance (hopefully the right terminology) outside and use the proxy command to connect all my slurm allocations to that outside instance.
Cheers, Alex
You can try srun -N2 -n2 flux start sleep inf to run sleep inf as the initial program of your Flux instance, then attempt to connect with flux proxy slurm:JOBID. (Note that depending on your site's Slurm configuration, -N2 -n2 may only give your Flux instance access to one core on each node.)
However, I'm not sure why Flux isn't getting a pty given you have used the srun --pty option. Does srun -n1 --pty vim work for you?
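For reference, the non-interactive launch and the later connection step could be wrapped roughly as follows. This is only a sketch: the function names and the DRY_RUN variable are hypothetical helpers for illustration, while the srun, flux start, and flux proxy command lines are the ones given above.

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are hypothetical, not part of Flux or Slurm.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Start one flux-broker per node with "sleep inf" as the initial program,
# so the instance stays up without needing an interactive shell.
# $1 = number of nodes in the Slurm allocation
start_flux_under_slurm() {
    run srun -N"$1" -n"$1" flux start sleep inf
}

# Connect to the Flux instance running inside a Slurm job.
# $1 = Slurm job ID
connect_to_flux() {
    run flux proxy "slurm:$1"
}
```

With DRY_RUN=1 set, start_flux_under_slurm 2 prints the command line instead of running it, which is a cheap way to check what would be submitted.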
Thank you, I will try that.
However, I'm not sure why Flux isn't getting a pty given you have used the srun --pty option. Does srun -n1 --pty vim work for you?
Yes, -n1 works. If I am reading the documentation correctly, --pty only gives an interactive shell to the first process. It is only node2 that fails.
--pty, --pty=<File Descriptor>
Execute task zero with pseudo terminal mode or using pseudo terminal specified by <File Descriptor>. Implicitly sets --unbuffered. Implicitly sets --error and --output to /dev/null for all tasks except task zero, which may cause those tasks to exit immediately (e.g. shells will typically exit immediately in that situation). This option applies to step allocations.
It is only node2 that fails.
Ah, that is a good clue. That probably indicates that for some reason on your Slurm cluster, Flux isn't properly bootstrapping via Slurm and each broker (of the two being started) thinks it is a singleton.
You might try forcing Slurm to use PMI2 by adding --mpi=pmi2 to the srun command line. Let us know if that works.
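A rough sketch of that attempt, with a check that the site's Slurm build offers the plugin at all. The helper names are hypothetical, and the check only assumes that the listing printed by srun --mpi=list mentions each plugin name as a separate word; the exact output format varies between Slurm versions.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Succeeds if the plugin listing on stdin mentions pmi2 as a whole word.
has_pmi2() {
    grep -qw pmi2
}

# Launch two brokers, forcing the PMI2 plugin for bootstrap.
launch_with_pmi2() {
    srun -N2 -n2 --mpi=pmi2 flux start sleep inf
}

# Usage (left commented out so that sourcing this file runs nothing):
#   srun --mpi=list 2>&1 | has_pmi2 && launch_with_pmi2
```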
@grondo's suggestion is probably the right one, but here are some more docs if you need them:
https://flux-framework.readthedocs.io/projects/flux-core/en/latest/guide/start.html#starting-with-slurm
Ah, I think I might know what is going on. The cluster has some setup where after the allocation I am able to SSH into the nodes. This might be some environment magic and the reason why Flux thinks it is by itself.
For some added background, when launching Flux under Slurm, Flux bootstraps the same way an MPI job does for its wire-up. Only the "rank 0" flux-broker will start the "initial program" (by default an interactive shell, otherwise any arguments you've provided to flux start). Therefore, if you can launch an MPI job, you should be able to launch a Flux instance under Slurm (with or without interactive ssh access).
After the Flux instance is established, though, you would need ssh access to make use of flux proxy.
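One way to tell a properly bootstrapped instance from two singletons is to use flux getattr size as the initial program. Since only rank 0 runs the initial program, a working two-broker instance should print a single 2, whereas two singleton brokers would each act as rank 0 and print 1. The helper below is a hypothetical sketch that interprets that output; it is not part of Flux.

```shell
#!/bin/sh
# Sketch only: check_instance_size is a hypothetical helper.

# $1    = expected number of brokers
# stdin = stdout of the initial program "flux getattr size"
check_instance_size() {
    expected=$1
    out=$(cat)
    if [ "$out" = "$expected" ]; then
        echo "ok: one instance of size $expected"
    else
        echo "mismatch: expected a single line reading $expected"
        return 1
    fi
}

# Usage (left commented out so that sourcing this file runs nothing):
#   srun -N2 -n2 flux start flux getattr size | check_instance_size 2
```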