Failed to launch: Invalid wckey specification
I am trying to train the DINO model using run_with_submitit.py from https://github.com/facebookresearch/dino, but I run into the following error:
sbatch: error: Batch job submission failed: Invalid wckey specification
subprocess.CalledProcessError: Command '['sbatch', '/mydir/checkpoint/experiments/submission_file_611ca66d3a6a43f69bab82264c3d6afc.sh']' returned non-zero exit status 1.
[...]
submitit.core.utils.FailedJobError: sbatch: error: Batch job submission failed: Invalid wckey specification
However, if I remove the wckey requirement from the sbatch file and submit it manually with sbatch, I get the following error:
srun: error: task 0 launch failed: Slurmd could not connect IO
srun: error: task 1 launch failed: Slurmd could not connect IO
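For reference, the manual workaround described above (deleting the wckey line from the generated submission file before calling sbatch) can be scripted roughly like this. It is a minimal sketch that assumes the generated file contains the directive as a line beginning with `#SBATCH --wckey` (e.g. `#SBATCH --wckey=submitit`); the script name `strip_wckey.py` and the function names are hypothetical, not part of submitit or DINO.

```python
import sys
from pathlib import Path


def strip_wckey(script_text: str) -> str:
    """Return the sbatch script text with any '#SBATCH --wckey...' line removed.

    Assumption: the wckey is written as a single '#SBATCH --wckey=<value>'
    (or '#SBATCH --wckey <value>') line; adjust the prefix if your file differs.
    """
    kept = [
        line
        for line in script_text.splitlines(keepends=True)
        if not line.lstrip().startswith("#SBATCH --wckey")
    ]
    return "".join(kept)


def main(path: str) -> None:
    # Rewrite the submission file in place; submit it afterwards with `sbatch <path>`.
    script = Path(path)
    script.write_text(strip_wckey(script.read_text()))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Usage would be something like `python strip_wckey.py <submission_file>.sh && sbatch <submission_file>.sh`. As the post shows, getting past the wckey error this way then runs into the separate "Slurmd could not connect IO" failure from srun, so this only addresses the first error.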
Any insight into how to resolve this would be greatly appreciated.
Same problem here, and setting slurm_wckey='' didn't work.