
Failed to launch: Invalid wckey specification

Open • rskwesterman opened this issue 2 years ago • 1 comment

I am trying to train the DINO model using run_with_submitit.py from https://github.com/facebookresearch/dino, but I run into the following error:

    sbatch: error: Batch job submission failed: Invalid wckey specification
    subprocess.CalledProcessError: Command '['sbatch', '/mydir/checkpoint/experiments/submission_file_611ca66d3a6a43f69bab82264c3d6afc.sh']' returned non-zero exit status 1.
    [...]
    submitit.core.utils.FailedJobError: sbatch: error: Batch job submission failed: Invalid wckey specification

However, removing the wckey requirement from the sbatch file and submitting it manually with sbatch results in a different error:

    srun: error: task 0 launch failed: Slurmd could not connect IO
    srun: error: task 1 launch failed: Slurmd could not connect IO
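
For anyone who wants to reproduce the manual step described above, here is a minimal sketch of stripping the wckey directive from a generated submission file before passing it to sbatch. The function name `strip_wckey` and the example script contents are hypothetical and only for illustration. The sketch assumes the directive appears as an `#SBATCH --wckey...` line. It only removes that line and does nothing about the subsequent "Slurmd could not connect IO" error.

```python
import re

# Matches a "#SBATCH --wckey=..." or "#SBATCH --wckey ..." directive line.
# Assumption: the generated script uses this long-option form.
_WCKEY_LINE = re.compile(r"^\s*#SBATCH\s+--wckey(?:[=\s].*)?$")


def strip_wckey(script_text: str) -> str:
    """Return the script text with any '#SBATCH --wckey' lines removed."""
    kept = [
        line
        for line in script_text.splitlines()
        if not _WCKEY_LINE.match(line)
    ]
    return "\n".join(kept) + "\n"


# Hypothetical submission script, not the actual file from this issue.
example = """#!/bin/bash
#SBATCH --job-name=dino
#SBATCH --wckey=submitit
#SBATCH --nodes=1
srun python train.py
"""

if __name__ == "__main__":
    print(strip_wckey(example))
```

The cleaned text can then be written to a new file and submitted with sbatch by hand, which is the step that led to the second error above.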

Any insights or solutions would be greatly appreciated.

rskwesterman, Dec 19 '23 16:12

Same problem here, and setting slurm_wckey='' didn't work.

yinkaaiwu, Oct 02 '24 12:10