ansible-role-openhpc

Unable to add extra slurm.conf items with the same key

Open tom91136 opened this issue 1 year ago • 3 comments

We're using this playbook to set up a Warewulf-managed system, so the compute nodes are not available yet (their images have not been created, so they can't boot). The current play tolerates empty host groups, but we can't inject compute nodes with known configurations because openhpc_config expects a dictionary:

      openhpc_config:
        NodeName: "compute1 CPU=1 State=UNKNOWN"
        NodeName: "compute0 CPU=1 State=UNKNOWN" # duplicate key

Looking at the slurm.conf syntax, I think using a list will be more appropriate here.

tom91136 avatar Jun 15 '24 02:06 tom91136

For backwards compatibility, we can check if openhpc_config is a string or list and append the content (one line per list item or the entire string as-is) at the end of the slurm.conf template for maximum flexibility.
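To illustrate the type check proposed above, here is a minimal Python sketch. It is not the role's code: the function name `render_extra_config` is hypothetical, and the real change would live in the Jinja2 slurm.conf template rather than in Python. It assumes a dictionary keeps today's `Key=Value` rendering, a list contributes one line per item, and a string is passed through line by line.

```python
from typing import List, Mapping, Sequence, Union

# Hypothetical type for the proposed openhpc_config values.
ExtraConfig = Union[Mapping[str, object], Sequence[str], str]


def render_extra_config(extra: ExtraConfig) -> List[str]:
    """Return the lines to append to slurm.conf (illustrative only)."""
    # Check str first: a str is also a Sequence, so order matters here.
    if isinstance(extra, str):
        # The entire string is used as-is, split into its lines.
        return extra.splitlines()
    if isinstance(extra, Mapping):
        # Existing behaviour: unique keys rendered as Key=Value.
        return [f"{key}={value}" for key, value in extra.items()]
    # A list: one slurm.conf line per item, so keys may repeat.
    return [str(item) for item in extra]


if __name__ == "__main__":
    nodes = [
        "NodeName=compute0 CPU=1 State=UNKNOWN",
        "NodeName=compute1 CPU=1 State=UNKNOWN",
    ]
    print("\n".join(render_extra_config(nodes)))
```

With a list, both NodeName lines survive, whereas the dictionary form can only hold one value per key.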

tom91136 avatar Jun 15 '24 02:06 tom91136

Happy to open a PR if welcomed.

tom91136 avatar Jun 15 '24 02:06 tom91136

Don't really want to go down that route. The openhpc_config parameter is really for unique parameters, not for defining nodes/partitions or other things which occur multiple times.

Autodetection of node config is a pretty major feature of the role, so if you're using it in a way where that is not the case, it's probably better to replace the whole slurm.conf.j2 template. Will have a think about this for follow-on changes after #189.

sjpb avatar May 15 '25 14:05 sjpb