slurm topic
prometheus-slurm-exporter
Prometheus exporter for performance metrics from Slurm.
batchtools
Tools for computation on batch systems
batch-shipyard
Simplify HPC and Batch workloads on Azure
nextflow
A DSL for data-driven computational pipelines
omnia
An open-source toolkit for deploying and managing high performance clusters for HPC, AI, and data analytics workloads.
elasticluster
Create clusters of VMs on the cloud and configure them with Ansible.
submitit
Python 3.8+ toolbox for submitting jobs to Slurm
torchx
TorchX is a universal job launcher for PyTorch applications. It is designed for fast iteration during training and research, with support for end-to-end (E2E) production ML pipelines when you're ready.
funnel
Funnel is a toolkit for distributed task execution via a simple, standard API.