ansible-slurm
Need help creating a minimal non-trivial playbook
Hi everyone, I'm struggling to create a playbook that installs both the control and execution nodes. Whatever I do, I end up with each node forming its own singleton cluster containing only itself, with no interconnectivity between them.
Minimal playbook:
```yaml
- name: install SLURM cluster
  hosts: vm0
  roles:
    - role: galaxyproject.slurm
      become: True
  vars:
    slurm_roles: ['exec', 'dbd', 'controller']
    slurm_munge_key: munge.key

- name: SLURM execution hosts
  hosts: vm1, vm2
  roles:
    - role: galaxyproject.slurm
      become: True
  vars:
    slurm_munge_key: munge.key
    slurm_roles: ['exec']
    slurm_nodes:
      - name: "vm[1-2]"
        CoresPerSocket: 1
    slurm_partitions:
      - name: compute
        Default: YES
        MaxTime: UNLIMITED
        Nodes: "vm[1-2]"
```
and the output would be:
```
~/github/slurm_local
❯ ansible -i local.yml all -a 'sinfo'
vm1 | CHANGED | rc=0 >>
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      1   idle localhost
vm2 | CHANGED | rc=0 >>
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      1   idle localhost
vm0 | CHANGED | rc=0 >>
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      1   idle localhost
```
which is not what I intended.
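From that output it looks like each host renders an /etc/slurm/slurm.conf that only knows about itself. I haven't pasted the files, but reconstructing from the sinfo output above, each one presumably contains something along these lines:

```
# Reconstructed guess at each host's rendered /etc/slurm/slurm.conf,
# based on the sinfo output above: every node is its own controller
# with a single-node default "debug" partition.
SlurmctldHost=localhost
NodeName=localhost
PartitionName=debug Nodes=localhost Default=YES MaxTime=INFINITE State=UP
```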
Could anyone help me draft a correct playbook for this case?
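For reference, the `local.yml` inventory is nothing special, just the three VMs, roughly like this (purely illustrative; the real addresses differ):

```yaml
# Hypothetical sketch of the local.yml inventory; addresses are placeholders.
all:
  hosts:
    vm0:
      ansible_host: 192.168.56.10
    vm1:
      ansible_host: 192.168.56.11
    vm2:
      ansible_host: 192.168.56.12
```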
Did you manage to solve this (and if so, how)? I'm in the same boat.
Nope, couldn't do it.
This is a late response and probably no longer relevant for you, but maybe it helps someone in the future.
A working config where the controller node is also an executor. Note in particular `SlurmctldHost` in `slurm_config` and the explicit per-host `slurm_nodes` entries, which tie every node to the same controller:
```yaml
- name: Controller
  hosts: vm01
  vars:
    slurm_roles: ["controller"]
  roles:
    - role: galaxyproject.slurm
      become: True

- name: Nodes
  hosts: vm01,vm02
  roles:
    - role: galaxyproject.slurm
      become: True
  vars:
    slurm_roles: ["exec"]
    slurm_config:
      SelectType: select/cons_tres
      SlurmctldHost: vm01
      SlurmdLogFile: /var/log/slurm/slurmd.log
      SlurmctldLogFile: /var/log/slurm/slurmctld.log
    slurm_nodes:
      - name: vm01
        CPUs: 16
        Boards: 1
        SocketsPerBoard: 4
        CoresPerSocket: 4
        ThreadsPerCore: 1
        RealMemory: 128740
        State: UNKNOWN
      - name: vm02
        CPUs: 48
        Boards: 1
        SocketsPerBoard: 2
        CoresPerSocket: 12
        ThreadsPerCore: 2
        RealMemory: 257324
        State: UNKNOWN
    slurm_partitions:
      - name: debug
        Default: YES
        MaxTime: UNLIMITED
        Nodes: ALL
        OverSubscribe: YES
        DefMemPerCPU: 1024
        SelectTypeParameters: CR_Core_Memory
    slurm_create_user: true
    slurm_user:
      comment: "Slurm Workload Manager"
      gid: 888
      group: slurm
      home: "/var/lib/slurm"
      name: slurm
      shell: "/usr/sbin/nologin"
      uid: 888
    # Manually created key
    slurm_munge_key: "munge.key"
```
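The `munge.key` referenced at the end must be the same file everywhere; as far as I can tell the role copies it from the control machine to every host, so I generate it once before running the playbook. A minimal sketch of a one-off play for that (the play itself is just illustrative; any securely generated 1024-byte file works):

```yaml
# Hypothetical one-off play to create the shared munge key next to the
# playbook before running the cluster plays above.
- name: Generate shared munge key
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Create 1024 bytes of random key material
      ansible.builtin.shell: dd if=/dev/urandom of=munge.key bs=1 count=1024
      args:
        creates: munge.key  # skip if the key already exists
```

Once both plays have run, `sinfo` on either host should list vm01 and vm02 in the debug partition instead of a lone localhost.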
@mark-gerarts thanks, that's amazing!