aws-eda-slurm-cluster
Documentation corrections required on deploy-parallel-cluster documentation page
On the page https://aws-samples.github.io/aws-eda-slurm-cluster/deploy-parallel-cluster/ I ran into three issues:
- The Create users_groups.json section has a duplicate of the table used later in "Configure submission hosts to use the cluster". It doesn't belong here.
- The Description for the Config Stack Output states that Command01SubmitterMountHeadNode "adds it to /etc/fstab". It does not. The command just mounts the file system (see the verification sketch after this list):

  ```
  head_ip=head_node.<clusterName>.pcluster && sudo mkdir -p /opt/slurm/<clusterName> && sudo mount $head_ip:/opt/slurm /opt/slurm/<clusterName>
  ```

  (I've replaced my cluster name with <clusterName>.)
- After I ran the ansible playbook, I tried to load the module as specified in "Run Your First Job". This did not work:

  ```
  $ module load <clusterName>
  ERROR: Unable to locate a modulefile for '<clusterName>'
  ```

  I had to log out and log back in to get my environment set correctly to allow the module to be loaded and do its magic. (A sketch of a logout-free workaround follows this list.)
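Regarding the second issue, here is a quick way to confirm that the command creates the mount without persisting it. These are standard Linux tools, and <clusterName> is a placeholder:

```
# Show the active NFS mount created by the command
mount | grep /opt/slurm/<clusterName>

# Show that no matching /etc/fstab entry exists yet
grep /opt/slurm/<clusterName> /etc/fstab || echo "no fstab entry"
```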
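Regarding the third issue, a workaround that avoids a full logout is to replace the current shell with a fresh login shell (or re-source your shell config) so the profile scripts that set up MODULEPATH run again. This is a generic sketch, not taken from the project docs:

```
# Start a new login shell so profile scripts re-run and MODULEPATH is refreshed
exec bash -l

# The modulefile should now be found
module load <clusterName>
```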
For item 2 above, I have found that running the ansible playbook adds the mount to my /etc/fstab. So you just have the comment in the wrong Description section...
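For illustration, persisting the mount means the playbook has to add an NFS entry along these lines to /etc/fstab. The exact mount options here are my assumption, not copied from the playbook:

```
# Hypothetical /etc/fstab entry for the head node's /opt/slurm export
head_node.<clusterName>.pcluster:/opt/slurm  /opt/slurm/<clusterName>  nfs  defaults  0  0
```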
The tables aren't duplicates; only the first item is the same. However, the name of the command is misleading, so I renamed it from Command01_SubmitterMountHeadNode to Command01_MountHeadNodeNfs.
I updated the description for the 1st and 2nd commands in both tables. The /etc/fstab update occurs in the 2nd step when the ansible playbook is run.
When the new modulefile is created, you need to start a new shell to refresh the environment or source the shell config again.
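To confirm the refreshed environment picked up the new modulefile, you can check MODULEPATH and the module listing; the directory you should expect to see is whatever the playbook configures, which I haven't reproduced here:

```
# MODULEPATH should now include the cluster's modulefile directory
echo $MODULEPATH

# The cluster module should appear in the listing
module avail
```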