submitit
[Custom python environment]
Hi all!
In the context of cluster computing, it is sometimes necessary to run jobs with a local Python environment rather than the central one from which the job was launched.
Currently, the Python executable path is extracted automatically with shlex.quote(sys.executable) in slurm.py and then used to generate the sbatch script in _make_sbatch_string.
While this is a great default behaviour, @MJHutchinson and I would argue for being able to specify a custom Python executable path that overrides the default one, as implemented in this commit. The key line is:
f"srun --output {stdout} --error {stderr} --unbuffered {executable} {command}",
Then, in combination with Hydra, it is sufficient to add executable:/data/localhost/not-backed-up/${env:USER}/utils/venv_projec_name/bin/python to the Slurm config.
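A minimal sketch of the proposed override, assuming a hypothetical helper `make_srun_line` (not submitit's actual code). It falls back to `shlex.quote(sys.executable)` when no executable is given and otherwise inserts the user-supplied one as-is:

```python
import shlex
import sys
from typing import Optional


def make_srun_line(command: str, stdout: str, stderr: str,
                   executable: Optional[str] = None) -> str:
    """Build the srun line of an sbatch script (hypothetical helper)."""
    if executable is None:
        # Default behaviour described above: the interpreter that launched the job.
        executable = shlex.quote(sys.executable)
    # A user-supplied executable is inserted verbatim, so the caller is
    # responsible for quoting any part of it that needs quoting.
    return f"srun --output {stdout} --error {stderr} --unbuffered {executable} {command}"


print(make_srun_line("my_script.py", "log/job.out", "log/job.err"))
print(make_srun_line(
    "my_script.py", "log/job.out", "log/job.err",
    executable="/data/localhost/not-backed-up/me/venv/bin/python",  # illustrative path
))
```

Leaving a custom executable unquoted is one possible design choice: it lets multi-word wrappers through, at the cost of pushing quoting onto the user.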
If that sounds like a good solution to you all, I'll push a PR; otherwise, I'm happy to discuss the issue.
Additionally, it seems that it is now possible to execute commands before running srun (cf. hydra_submitit_launcher). It would be great to also have a teardown argument to execute commands after srun has run.
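A sketch of how setup and teardown lines might surround the srun line in the generated script. `wrap_srun` and its `teardown` argument are hypothetical, since the teardown option is only being proposed here. One detail worth handling: without care, the last teardown command would determine the job's exit status, so the sketch saves srun's status and exits with it.

```python
from typing import Optional, Sequence


def wrap_srun(srun_line: str,
              setup: Optional[Sequence[str]] = None,
              teardown: Optional[Sequence[str]] = None) -> str:
    """Return the command section of an sbatch script (hypothetical helper)."""
    lines = list(setup or [])          # e.g. module loads, activating an env
    lines.append(srun_line)
    if teardown:
        lines.append("status=$?")      # remember srun's exit code
        lines.extend(teardown)         # e.g. copying results off local disk
        lines.append("exit $status")   # report srun's status, not teardown's
    return "\n".join(lines)


print(wrap_srun("srun --unbuffered python my_script.py",
                setup=["module load cuda"],
                teardown=["rm -rf /tmp/my_job_scratch"]))
```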
Best, Emile
Sounds like a reasonable feature. I believe we should add an executable parameter to the Executor (alongside folder).
It may be a source of confusion, since some things may work locally and fail in the remote environment.
We probably want to add the name of the env somewhere in the logs: https://github.com/facebookincubator/submitit/blob/ba139e712efe705a4e9cace8ad8540d3b46fbd37/submitit/core/submission.py#L39
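As a sketch of what could be logged at job start, the hypothetical `describe_python_env` helper below reports the running interpreter and prefix, plus the environment variables that venv activation scripts (`VIRTUAL_ENV`) and `conda activate` (`CONDA_DEFAULT_ENV`) set. Where exactly submitit would call it is left open.

```python
import logging
import os
import sys


def describe_python_env() -> str:
    """Summarize which Python environment the current process runs in."""
    parts = [f"executable={sys.executable}", f"prefix={sys.prefix}"]
    # Set by `source <venv>/bin/activate` and by `conda activate`, respectively.
    for var in ("VIRTUAL_ENV", "CONDA_DEFAULT_ENV"):
        value = os.environ.get(var)
        if value:
            parts.append(f"{var}={value}")
    return ", ".join(parts)


logging.basicConfig(level=logging.INFO)
logging.getLogger(__name__).info("Python environment: %s", describe_python_env())
```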
@MJHutchinson
@gwenzek Logging the env seems like a good idea indeed. Concerning the executable, I'm not sure I understand what going through the Executor would change.
An additional problem arises when running with Singularity, where the executable would usually be something along the lines of:
singularity --nv exec my_image.sif /opt/python3.8/bin/python
This breaks when shlex.quote() is applied to the executable string, because the whole command is then passed to the shell as a single word. Singularity is the preferred way of building environments on every Slurm cluster I have worked with.
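The quoting problem can be reproduced with the standard library alone. The sketch contrasts quoting the whole wrapper string with one possible alternative (splitting it first, then quoting each token). It illustrates the issue and makes no claim about how submitit resolves it.

```python
import shlex

wrapper = "singularity --nv exec my_image.sif /opt/python3.8/bin/python"

# Quoting the whole string yields ONE shell word: the shell would look for a
# program literally named "singularity --nv exec my_image.sif ...".
whole = shlex.quote(wrapper)
print(whole)               # 'singularity --nv exec my_image.sif /opt/python3.8/bin/python'
print(shlex.split(whole))  # a list with a single element

# Splitting first and quoting each token keeps the arguments separate while
# still escaping any token that needs it (e.g. paths containing spaces).
per_token = " ".join(shlex.quote(tok) for tok in shlex.split(wrapper))
print(shlex.split(per_token) == shlex.split(wrapper))  # True
```

On Python 3.8+, `shlex.join(shlex.split(wrapper))` performs the same per-token quoting.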
Hi. I am also interested in running Singularity jobs on a Slurm cluster through submitit. Are there any examples lying around?
Something like function = submitit.helpers.CommandFunction(["singularity", "--version"]) works fine. Should I work along those lines if I want to submit a job array, e.g. four singularity --nv exec calls?
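One way to follow that approach is sketched below, assuming a working submitit install and Slurm access. The helper names, image, and script are hypothetical. Each array task wraps one `CommandFunction`, and jobs submitted inside `executor.batch()` are grouped into a single job array. `--nv` appears after `exec` because Singularity documents it as an option of the `exec` subcommand.

```python
from typing import List, Sequence


def singularity_command(image: str, script: str, arg: str) -> List[str]:
    """Argument list for one containerized run (names are illustrative)."""
    # --nv exposes the host's NVIDIA GPUs inside the container.
    return ["singularity", "exec", "--nv", image, "python", script, arg]


def submit_singularity_array(folder: str, image: str, script: str, args: Sequence[str]):
    """Submit one array task per element of ``args`` (needs submitit and Slurm)."""
    import submitit  # imported here so singularity_command works without submitit

    executor = submitit.AutoExecutor(folder=folder)
    # Jobs submitted inside executor.batch() are sent together as one job array.
    with executor.batch():
        jobs = [
            executor.submit(submitit.helpers.CommandFunction(singularity_command(image, script, a)))
            for a in args
        ]
    return jobs
```

For example, `submit_singularity_array("log", "my_image.sif", "train.py", ["a", "b", "c", "d"])` would submit four array tasks.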
FWIW, even though a command-line tool for running Slurm jobs is a non-goal, I found that dynamically editing a Singularity command as a Python string template goes a long way. Something like
cmd_template = Template("singularity exec -B $OPERATIONALIZATION_WORKING_DIR/ --cleanenv --nv $sif_file ...")
can be filled with .safe_substitute() for the template items shared by all jobs first, and then with .substitute() for the items that vary across the jobs of a batch.
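A runnable sketch of the two-stage substitution using `string.Template`. The original template is elided ("..."), so the `python $script --seed $seed` tail and all values below are hypothetical. `safe_substitute` leaves unknown placeholders in place, and `substitute` raises `KeyError` if any placeholder is still unfilled.

```python
from string import Template

# Hypothetical continuation of the elided template, for illustration only.
cmd_template = Template(
    "singularity exec -B $OPERATIONALIZATION_WORKING_DIR/ --cleanenv --nv "
    "$sif_file python $script --seed $seed"
)

# Stage 1: fill the items shared by every job; $seed is left untouched.
shared = Template(cmd_template.safe_substitute(
    OPERATIONALIZATION_WORKING_DIR="/scratch/project",
    sif_file="my_image.sif",
    script="train.py",
))

# Stage 2: fill the varying item; substitute() fails loudly on anything missing.
commands = [shared.substitute(seed=seed) for seed in range(4)]
for cmd in commands:
    print(cmd)
```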
Hi, can you try the following branch: https://github.com/facebookincubator/submitit/tree/escape_all ?
In the SlurmExecutor you can now pass a "python" string argument to specify which Python interpreter to use:
ex = submitit.SlurmExecutor(folder="log", python="singularity --nv exec my_image.sif /opt/python3.8/bin/python")
If this solves your issues, I'll open a PR.
Hi,
I just ran into this issue on our compute cluster with local python environments. I have tried the above branch and it solves my issue.
Hi @gwenzek, I also tried the branch and it works with my singularity image. It would be great to have it merged!
It would be amazing to get this merged. We are using nix-portable, and to use it correctly on the cluster we need to point to a custom Python env.
@gwenzek Can we get your branch merged to main?