ansible-slurm

Need help creating a minimal non-trivial playbook

Open marinegor opened this issue 1 year ago • 4 comments

Hi everyone, I'm struggling to create a playbook that installs both the control and the execution nodes. Whatever I do, I end up with each node acting as its own single-node cluster, with no interconnectivity between them.

Minimal playbook:

- name: install SLURM cluster
  hosts: vm0
  roles:
    - role: galaxyproject.slurm
      become: True
  vars:
    slurm_roles: ['exec', 'dbd', 'controller']
    slurm_munge_key: munge.key

- name: SLURM execution hosts
  roles:
    - role: galaxyproject.slurm
      become: True
  hosts: vm1, vm2
  vars:
    slurm_munge_key: munge.key
    slurm_roles: ['exec']
    slurm_nodes:
      - name: "vm[1-2]"
        CoresPerSocket: 1
    slurm_partitions:
      - name: compute
        Default: YES
        MaxTime: UNLIMITED
        Nodes: "vm[1-2]"

and the output is:

~/github/slurm_local
❯ ansible -i local.yml all -a 'sinfo'
vm1 | CHANGED | rc=0 >>
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      1   idle localhost
vm2 | CHANGED | rc=0 >>
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      1   idle localhost
vm0 | CHANGED | rc=0 >>
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      1   idle localhost

which is not what I intended.
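
To compare what the role actually rendered on each host, I can grep the generated slurm.conf with something like this (the config path is a guess and may differ per distribution, e.g. /etc/slurm-llnl/slurm.conf on Debian/Ubuntu packages):

ansible -i local.yml all -a 'grep -E "ClusterName|ControlMachine|SlurmctldHost|NodeName|PartitionName" /etc/slurm/slurm.conf'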

Could anyone help draft a correct playbook for such a case?
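
For reference, a minimal YAML inventory (local.yml) for these three hosts might look something like this, with placeholder addresses:

all:
  hosts:
    vm0:
      ansible_host: 192.168.56.10
    vm1:
      ansible_host: 192.168.56.11
    vm2:
      ansible_host: 192.168.56.12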

marinegor avatar May 08 '23 21:05 marinegor

Did you manage to solve this (and if so, how)? I'm in the same boat.

jp-um avatar Oct 09 '23 10:10 jp-um

Nope, couldn't do it.

marinegor avatar Oct 09 '23 11:10 marinegor

This is a late response and probably no longer relevant for you, but maybe it helps someone in the future.

A working config where the controller node is also an executor:

- name: Controller
  hosts: vm01
  vars:
    slurm_roles: ["controller"]
  roles:
    - role: galaxyproject.slurm
      become: True

- name: Nodes
  hosts: vm01,vm02  # the controller is included here as well, so it also acts as an execution node
  roles:
    - role: galaxyproject.slurm
      become: True
  vars:
    slurm_roles: ["exec"]
    slurm_config:
      SelectType: select/cons_tres
      SlurmctldHost: vm01  # every node needs to know which host runs slurmctld
      SlurmdLogFile: /var/log/slurm/slurmd.log
      SlurmctldLogFile: /var/log/slurm/slurmctld.log
    slurm_nodes:
      - name: vm01
        CPUs: 16
        Boards: 1
        SocketsPerBoard: 4
        CoresPerSocket: 4
        ThreadsPerCore: 1
        RealMemory: 128740
        State: UNKNOWN
      - name: vm02
        CPUs: 48
        Boards: 1
        SocketsPerBoard: 2
        CoresPerSocket: 12
        ThreadsPerCore: 2
        RealMemory: 257324
        State: UNKNOWN
    slurm_partitions:
      - name: debug
        Default: YES
        MaxTime: UNLIMITED
        Nodes: ALL
        OverSubscribe: YES
        DefMemPerCPU: 1024
        SelectTypeParameters: CR_Core_Memory
    slurm_create_user: true
    slurm_user:
      comment: "Slurm Workload Manager"
      gid: 888
      group: slurm
      home: "/var/lib/slurm"
      name: slurm
      shell: "/usr/sbin/nologin"
      uid: 888
    # Manually created key
    slurm_munge_key: "munge.key"

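The reason this works while keeping the plays separate is that the Nodes play runs on vm01 too, so the controller ends up with the same full slurm.conf (SlurmctldHost plus node and partition definitions) as the compute node, rather than the localhost-only config seen in the playbook above. Once both plays have run and munge/slurmctld/slurmd are up, a quick sanity check from vm01 could be:

sinfo                    # both vm01 and vm02 should appear in the debug partition
srun --nodes=2 hostname  # should print each node's hostname once

Since the Nodes play is the one carrying slurm_config, slurm_nodes and slurm_partitions, moving those shared vars into group_vars so that every play sees the same values is another option, and avoids the plays drifting apart when edited later.
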
mark-gerarts avatar Jul 25 '24 12:07 mark-gerarts

@mark-gerarts thanks, that's amazing!

marinegor avatar Jul 25 '24 20:07 marinegor