[Custom python environment]

Open emilemathieu opened this issue 4 years ago • 11 comments

Hi all!

In the context of cluster computing, it is sometimes necessary to run jobs with a local Python environment rather than the central one (from which the job was launched).

Currently, the Python executable path is automatically extracted with shlex.quote(sys.executable) in slurm.py before being used to generate the sbatch file in _make_sbatch_string.

While this is a great default behaviour, @MJHutchinson and I would argue for being able to specify a custom Python executable path that overrides the default, as implemented in this commit. The key line is:

 f"srun --output {stdout} --error {stderr} --unbuffered {executable} {command}",

Then, in combination with Hydra, it is sufficient to add executable:/data/localhost/not-backed-up/${env:USER}/utils/venv_projec_name/bin/python to the slurm config.
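The override described above can be sketched roughly as follows. This is only an illustration, not submitit's actual code: `make_srun_line` is a hypothetical helper, and the log file names and the venv path are placeholders.

```python
import shlex
import sys
from typing import Optional


def make_srun_line(stdout: str, stderr: str, command: str,
                   executable: Optional[str] = None) -> str:
    """Build an srun line; hypothetical helper, not part of submitit."""
    if executable is None:
        # Default behaviour: reuse the interpreter that launched the job.
        executable = shlex.quote(sys.executable)
    return f"srun --output {stdout} --error {stderr} --unbuffered {executable} {command}"


# Default: the launching interpreter is used.
default_line = make_srun_line("out.log", "err.log", "-m my_module")

# Override: point at a (placeholder) node-local virtual environment.
custom_line = make_srun_line("out.log", "err.log", "-m my_module",
                             executable="/path/to/venv/bin/python")
```

With this shape, existing callers keep the current default and only users who pass `executable` change behaviour.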

If that sounds like a good solution to you all, I'll push a PR, otherwise I'm happy to discuss the issue.

Additionally, it seems that it is now possible to execute commands before running srun (cf. hydra_submitit_launcher). It would be great to also have a teardown argument to execute commands after running srun.

Best, Emile

emilemathieu avatar Feb 20 '21 19:02 emilemathieu

Sounds like a reasonable feature. We should add an executable parameter to the Executor (alongside folder) I believe. It may be a source of confusion, since some things may work locally and fail in the remote env. We probably want to add the name of the env somewhere in the logs: https://github.com/facebookincubator/submitit/blob/ba139e712efe705a4e9cace8ad8540d3b46fbd37/submitit/core/submission.py#L39
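As a rough idea of what logging the env could look like, here is a minimal sketch using only the standard library. `describe_python_env` is a hypothetical helper, not something submitit provides, and it assumes the environment was activated in the usual way (venv sets `VIRTUAL_ENV`, conda sets `CONDA_DEFAULT_ENV`).

```python
import logging
import os
import sys
from typing import Dict, Optional


def describe_python_env() -> Dict[str, Optional[str]]:
    """Collect the interpreter and environment the job is actually running in."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # Set by `activate` for virtualenv/venv; None when not activated.
        "virtual_env": os.environ.get("VIRTUAL_ENV"),
        # Set by conda when an environment is active; None otherwise.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


env_info = describe_python_env()
logging.getLogger("job").info("python env: %s", env_info)
```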

gwenzek avatar Mar 04 '21 13:03 gwenzek

@MJHutchinson

emilemathieu avatar Mar 04 '21 15:03 emilemathieu

@gwenzek Logging the env seems like a good idea indeed. Concerning the executable, I'm not sure I understand what going through the Executor would change?

emilemathieu avatar Mar 04 '21 15:03 emilemathieu

One additional problem is running with Singularity, where the executable should usually be something along the lines of:

singularity --nv exec my_image.sif /opt/python3.8/bin/python

This will break when shlex.quote() is run on the executable string. Singularity is the preferred way of building environments on all the SLURM clusters I have worked with.
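The quoting problem can be reproduced with the standard library alone. The snippet below only shows how `shlex.quote` treats a multi-word command; it does not reflect how submitit or any later fix handles it.

```python
import shlex

executable = "singularity --nv exec my_image.sif /opt/python3.8/bin/python"

# shlex.quote wraps the whole string in single quotes because it contains spaces,
# so the shell would look for one program with that entire string as its name.
quoted_whole = shlex.quote(executable)
words_seen_by_shell = shlex.split(quoted_whole)

# Quoting each token separately keeps the command as five separate words.
quoted_per_token = shlex.join(shlex.split(executable))

print(quoted_whole)
print(len(words_seen_by_shell))
print(quoted_per_token)
```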

cgerum avatar Apr 30 '21 08:04 cgerum

Hi. I am also interested in running Singularity jobs on a Slurm cluster through submitit. Are there any examples lying around?

Something like function = submitit.helpers.CommandFunction(["singularity", "--version"]) works fine. Should I be working along those lines if I want to submit a job array, e.g. four singularity --nv exec calls?
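One possible way to go along those lines is sketched below, as an untested assumption rather than a confirmed recipe. `build_commands` and `submit_as_array` are hypothetical helpers, the image names and the command run inside the container are placeholders, and the sketch relies on submitit's `AutoExecutor`, `executor.batch()` and `helpers.CommandFunction`. submitit is imported lazily so the command-building part can be checked without a cluster.

```python
from typing import List


def build_commands(images: List[str]) -> List[List[str]]:
    """One argument list per array task (the image names are placeholders)."""
    return [["singularity", "exec", "--nv", image, "python", "--version"]
            for image in images]


def submit_as_array(commands: List[List[str]], folder: str = "log"):
    """Submit every command as one task of a single Slurm job array."""
    import submitit  # imported here so the rest runs without submitit installed

    executor = submitit.AutoExecutor(folder=folder)
    executor.update_parameters(timeout_min=10)
    # Jobs submitted inside the batch context are grouped into one job array.
    with executor.batch():
        jobs = [executor.submit(submitit.helpers.CommandFunction(cmd))
                for cmd in commands]
    return jobs


commands = build_commands([f"image_{i}.sif" for i in range(4)])
```

Calling `submit_as_array(commands)` on a login node would then be expected to return four job objects, one per command.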

ymoisan avatar Mar 28 '22 20:03 ymoisan

FWIW, even though a command-line tool for running Slurm jobs is a non-goal, I found that dynamically editing a Singularity command as a Python string template goes a long way. Something like

cmd_template = Template("singularity exec -B $OPERATIONALIZATION_WORKING_DIR/ --cleanenv --nv $sif_file ...")

can be .safe_substituted with shared template items first and then .substituted for the varying template items in a context batch.
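A small sketch of that two-stage substitution with `string.Template` follows. The placeholder names (`working_dir`, `script`, `seed`) and all values are made up for illustration and are not taken from the template above. Note that `safe_substitute` returns a plain string, so it has to be wrapped in a `Template` again before the second pass.

```python
from string import Template

cmd_template = Template(
    "singularity exec -B $working_dir/ --cleanenv --nv $sif_file python $script --seed $seed"
)

# Stage 1: fill the items shared by every job; unknown placeholders ($seed) are left as-is.
shared = Template(
    cmd_template.safe_substitute(
        working_dir="/scratch/run",
        sif_file="my_image.sif",
        script="train.py",
    )
)

# Stage 2: fill the varying item per job; substitute() raises KeyError if anything is missing.
commands = [shared.substitute(seed=seed) for seed in range(3)]

for command in commands:
    print(command)
```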

ymoisan avatar Apr 07 '22 15:04 ymoisan

Hi, can you try the following branch: https://github.com/facebookincubator/submitit/tree/escape_all ?

In the SlurmExecutor you can now pass a "python" string argument to specify which Python to use:

ex = submitit.SlurmExecutor(folder="log", python="singularity --nv exec my_image.sif /opt/python3.8/bin/python")

If this solves your issues, I'll open a PR.

gwenzek avatar Apr 12 '22 08:04 gwenzek

Hi,

I just ran into this issue on our compute cluster with local python environments. I have tried the above branch and it solves my issue.

atong01 avatar Jun 01 '22 18:06 atong01

Hi @gwenzek, I also tried the branch and it works with my singularity image. It would be great to have it merged!

tileb1 avatar Jun 11 '22 18:06 tileb1

It would be amazing to get this merged - we are using nix-portable, and to use it correctly on the cluster we need to point to a custom Python env.

ankitkk avatar Jul 16 '22 16:07 ankitkk

@gwenzek Can we get your branch merged to main?

ankitkk avatar Aug 05 '22 14:08 ankitkk