
[FEATURE] Install slurm utilities in NFS area so all machines can see it, no need to recompile for every machine.

Open gwolski opened this issue 1 year ago • 2 comments

I haven't studied this completely by reviewing the code, so apologies in advance if this capability already exists.

When I run the ansible playbook to install on my workstation machines so they can all submit to the HeadNode, Slurm is recompiled and installed locally every time.

Is there a way for me to just install the utilities on an NFS mounted area, say an NFS mounted /usr/local/slurm area, and then just reference that? Same for the config files that might be used to tell slurm where/who the HeadNode is?

Or is there some reason for this requirement?

gwolski avatar May 15 '24 18:05 gwolski

Let me test this, but it should be storing the compiled binaries on the Slurm head node's NFS export so that all instances can see it. So, it should only need to be compiled once per OS distribution and architecture.

Let me test and make sure that it is detecting that it has already been done.
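The check described above can be sketched as a small guard that looks for previously compiled binaries before building. This is a hedged illustration only: the `slurm_bins_present` helper, the per-OS/arch directory layout, and the `sbatch` probe are assumptions, not taken from the repo (the demo stands in a temp directory for the head node's NFS export).

```shell
#!/bin/sh
# Sketch (hypothetical layout): decide whether Slurm binaries already exist
# for this submitter's OS distribution and architecture before recompiling.
set -eu

# Returns 0 if compiled binaries for this OS/arch are already on the export.
slurm_bins_present() {
    # $1 = cluster config root, $2 = OS id, $3 = architecture
    [ -x "$1/$2/$3/bin/sbatch" ]
}

# Demo against a simulated NFS export in a temp dir.
root="$(mktemp -d)"
mkdir -p "${root}/rhel-8/x86_64/bin"
touch "${root}/rhel-8/x86_64/bin/sbatch"
chmod +x "${root}/rhel-8/x86_64/bin/sbatch"

if slurm_bins_present "${root}" rhel-8 x86_64; then
    echo "skip compile"
fi
if ! slurm_bins_present "${root}" rhel-8 aarch64; then
    echo "compile needed"
fi
```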

cartalla avatar May 17 '24 18:05 cartalla

As noted, I haven't dug into this, but I do see that the slurm commands are on the mounted head_node..pcluster:/opt/slurm

I have only installed on one "user workstation", so it might be OK and doing the right thing.
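One quick way to confirm a workstation is actually using the mounted copy rather than a locally compiled one is to check where the commands resolve from on `PATH`. A hedged sketch, with a temp directory standing in for the NFS-mounted bin directory:

```shell
#!/bin/sh
# Sketch: verify that slurm commands resolve from the mounted area, not a
# local install. The mount point below is a stand-in created in a temp dir.
set -eu

mount_bin="$(mktemp -d)/bin"        # stand-in for a mounted .../slurm/bin
mkdir -p "${mount_bin}"
printf '#!/bin/sh\n' > "${mount_bin}/sbatch"
chmod +x "${mount_bin}/sbatch"

PATH="${mount_bin}:${PATH}"
resolved="$(command -v sbatch)"
case "${resolved}" in
    "${mount_bin}"/*) echo "sbatch resolves from the mounted area" ;;
    *)                echo "sbatch resolves elsewhere: ${resolved}" ;;
esac
```

On a real workstation the same `command -v sbatch` check (or `which sbatch`) tells you whether the NFS path or a local copy wins.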

gwolski avatar May 17 '24 19:05 gwolski

The slurm binaries are only compiled on the submitter if they haven't previously been compiled for the OS and architecture of the submitter. They are compiled locally and then installed at /opt/slurm/ClusterName/config/os/..., which is on the cluster's head node.

If you run the configuration script again, it will run the ansible playbook, but it won't recompile the binaries because they already exist.
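The idempotent behavior described above can be sketched as a guard in the build step, so rerunning the playbook is cheap. The `compile_slurm` function, the `.compiled` marker file, and the paths are illustrative assumptions, not the repo's actual mechanism:

```shell
#!/bin/sh
# Sketch: make the compile step idempotent, so rerunning the configuration
# playbook does not rebuild binaries that already exist.
set -eu

install_dir="$(mktemp -d)"   # stand-in for /opt/slurm/ClusterName/config/...

compile_slurm() {
    if [ -e "${install_dir}/.compiled" ]; then
        echo "already compiled: skipping"
        return 0
    fi
    echo "compiling slurm"   # placeholder for the real configure/make steps
    touch "${install_dir}/.compiled"
}

compile_slurm   # first run: compiles
compile_slurm   # second run: skips
```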

cartalla avatar May 23 '24 23:05 cartalla