ansible-slurm-appliance
A Slurm-based HPC workload management environment, driven by Ansible.
Ticket: https://stackhpc.atlassian.net/browse/DEV-855

OpenDistro is EOL. This PR:
- [x] Replaces OpenDistro with OpenSearch.
- [x] Updates `filebeat` to the [newest supported version](https://opensearch.org/docs/2.1/clients/agents-and-ingestion-tools/index/#compatibility-matrices).
- [x] Adds the [required version faking](https://opensearch.org/docs/2.1/clients/agents-and-ingestion-tools/index/) to enable...

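For context on the "version faking" item, the linked OpenSearch documentation describes a compatibility setting that makes the cluster report version 7.10.2 so that clients such as `filebeat` accept it. The fragment below is a minimal sketch of that setting in `opensearch.yml`; it is an assumption that the PR uses this mechanism, and the PR may well apply it some other way (e.g. via the cluster settings API).

```yaml
# opensearch.yml (sketch; assumes the setting is applied via the config file)
# Report version 7.10.2 in the main response so that version-checking
# clients such as filebeat will connect.
compatibility.override_main_response_version: true
```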
The creation of a new `cloud_init` role to enable creating `/etc/hosts` at boot time was a misfeature:
- The workflow was nasty: it required provisioning ports and creating an inventory...

NB: this doesn't use a "fat" image, as the OOD package installs Apache config which appears to break the GUI.
# Release Notes

Define infrastructure using Ansible inventory variables instead of Terraform variables:
- Infrastructure is defined by new inventory vars `cluster_*` and `node_*` in an `environments//inventory/cluster.yml` inventory file. See...

Test deployment: `/home/rocky/slurm-app-noetchosts`
https://github.com/OSC/ood-ansible/releases/tag/v2.0.8 has now been released with a service state option.
Fails on [this task](https://github.com/stackhpc/ansible-slurm-appliance/blob/main/ansible/roles/podman/tasks/config.yml#L69), which is the first `podman` command. Output from the same shell command:
```
# sudo -u podman podman system reset --force
ERRO[0000] running `/bin/newuidmap 86262 0 1002 1 1...
```
- https://github.com/stackhpc/ansible-slurm-appliance/blob/main/environments/arcus/inventory/group_vars/all/basic_users.yml
- https://github.com/stackhpc/ansible-slurm-appliance/blob/main/environments/arcus/inventory/group_vars/basic_users/overrides.yml

See the `podman run` options, e.g. `--log-opt max-size=10mb`.
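As a rough illustration of how that log size cap might be expressed from Ansible, the task below is a sketch assuming the `containers.podman` collection is installed and that its `podman_container` module's `log_opt` option is used. The container name and image are hypothetical placeholders, and the appliance's roles may start containers differently.

```yaml
# Sketch only: container name and image are hypothetical.
- name: Run a container with a capped log file size
  containers.podman.podman_container:
    name: example-container              # hypothetical
    image: docker.io/library/busybox     # hypothetical
    state: started
    log_driver: k8s-file
    log_opt:
      max_size: 10mb   # corresponds to `--log-opt max-size=10mb`
```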
Ticket: https://stackhpc.atlassian.net/browse/DEV-1017 Looks like `cpu` and `cpufreq` are already in `environments/common/inventory/group_vars/all/prometheus.yml` though.