Running Flux under Spindle may need a re-investigation
@vchuravy attempted to run Flux under Spindle, but ran into some errors:
bash-4.2$ srun -N ${SLURM_NNODES} -n ${SLURM_NNODES} --pty --mpi=none --mpibind=off flux start -- bash
bash-4.2$ exit
bash-4.2$ spindle srun -N ${SLURM_NNODES} -n ${SLURM_NNODES} --pty --mpi=none --mpibind=off flux start -- bash
... hangs
srun: error: quartz16: task 0: Exited with exit code 255
srun: error: quartz19: task 3: Exited with exit code 255
srun: error: quartz18: task 2: Exited with exit code 255
srun: error: quartz17: task 1: Exited with exit code 255
And
bash-4.2$ spindle --slurm --no-mpi srun -N ${SLURM_NNODES} -n ${SLURM_NNODES} --pty --mpi=none --mpibind=off flux start -- bash
2021-05-01T19:11:27.405720Z broker.err[1]: rc1.0: ERROR: ld.so: object '/var/tmp/churavy1/spindle.72914/0-_usr_tce_packages_spindle_spindle_lib_spindle_libspindle_audit_pipe.so' cannot be loaded as audit interface: cannot open shared object file; ignored.
This is a lower-priority issue for Valentin, but I'll add it to my todo list and try it myself, since we'll need this support eventually when Flux is the RM on LC systems. We might need to add certain flags like -a no and/or --slurm based on #1514. Once we have the right incantation, we can add it to our docs.
Does /var/tmp/churavy1/spindle.72914/0-_usr_tce_packages_spindle_spindle_lib_spindle_libspindle_audit_pipe.so exist?
From the error message, I'm not sure whether this is a Flux issue or a Spindle issue. It looks as if 0-_usr_tce_packages_spindle_spindle_lib_spindle_libspindle_audit_pipe.so couldn't be loaded through the LD_AUDIT interface that Spindle uses. What OS is this? Could there be an interface change in LD_AUDIT?
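One way to narrow this down would be to check, from inside the failing environment, whether every object named in LD_AUDIT is actually present and readable on that node. The following is only a rough sketch: `check_ld_audit` is a hypothetical helper name, and it assumes LD_AUDIT holds a colon-separated list of shared-object paths, as described in ld.so(8).

```shell
# Hypothetical helper (not part of Flux or Spindle): report whether each
# object listed in LD_AUDIT exists and is readable on the current node.
# Takes an optional colon-separated list; defaults to $LD_AUDIT.
# The body runs in a subshell so the IFS change does not leak to the caller.
check_ld_audit() (
    audit_list="${1:-${LD_AUDIT:-}}"
    if [ -z "$audit_list" ]; then
        echo "LD_AUDIT is empty"
        exit 1
    fi
    status=0
    IFS=':'
    # Unquoted on purpose: split the list on ':' into individual paths.
    for lib in $audit_list; do
        if [ -r "$lib" ]; then
            echo "ok: $lib"
        else
            echo "missing: $lib"
            status=1
        fi
    done
    exit "$status"
)
```

Calling `check_ld_audit` with no arguments inside the shell or rc script where the broker error appears would show whether the audit library path is visible there. A nonzero exit status means at least one listed object could not be read.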
This was on quartz with the spindle module in lmod.
What I wanted in particular was to run a jobspec submitted to the broker under Spindle. I am not that worried about starting Flux with Spindle, but I suspect I need that so Spindle can catch the exec of the jobspec.
Yeah, since Spindle works at the Slurm level, this is necessary. Ideally, Spindle would be directly integrated with Flux so that it only distributes the shared objects for the exec of the jobspec, but I don't think that work is there yet. Even then, when Flux nests, Spindle will have to deal with the same situation of needing to relocate Flux shared objects, so it would be good to fix the problem with the current mode.
@dongahn as we discussed at SC, it would be great to have a solid Spindle/Flux integration
Tagging @jameshcorbett for now and I will discuss this with @mplegendre and @jameshcorbett when I get back in town.
Spindle integration with Flux is a pending PR here: https://github.com/hpc/Spindle/pull/50. Closing this issue.