aws-parallelcluster-post-install-scripts

Pyxis runtime path cannot be on /fsx

Open verdimrc opened this issue 10 months ago • 1 comment

The Pyxis runtime path cannot be on /fsx; otherwise, running a Docker image directly on multiple nodes fails with the error below.

# NOTE: below works fine for -N1.
$ srun -N2 --container-image=alpine grep PRETTY /etc/os-release
...
slurmstepd: error: pyxis:     Can't find a SQUASHFS superblock on /fsx/pyxis/1000/385.0.squashfs
slurmstepd: error: pyxis:     Wrong filesystem or filesystem is corrupted!
slurmstepd: error: pyxis:     Failed to read existing filesystem - will not overwrite - ABORTING!
slurmstepd: error: pyxis:     To force Mksquashfs to write to this block device or file use -noappend
...
srun: error: p4de-st-p4de-1: task 0: Exited with exit code 1
...
slurmstepd: error: pyxis:     [ERROR] No such file or directory: /fsx/pyxis/1000/385.0.squashfs
...
srun: error: p4de-st-p4de-2: task 1: Exited with exit code 1
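
A possible workaround, as a minimal sketch: point the Pyxis runtime path at a node-local directory instead of the shared /fsx mount. Both failing tasks reference the same file, /fsx/pyxis/1000/385.0.squashfs, so the likely cause is that each node tries to create that one file on the shared filesystem. The fragment assumes Pyxis is loaded through a Slurm plugstack config file and was installed to its default location. The plugin path and the /run/pyxis directory are assumptions, not taken from this cluster.

```
# Slurm plugstack entry for Pyxis (sketch; adjust paths to your install).
# ASSUMPTION: spank_pyxis.so lives at the default install location.
# ASSUMPTION: /run/pyxis is node-local storage, not a shared filesystem like /fsx.
required /usr/local/lib/slurm/spank_pyxis.so runtime_path=/run/pyxis
```

After changing this on every compute node, the `srun -N2` command above should show whether the squashfs collision is gone.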

verdimrc • Apr 18 '24 04:04