flyte
[Core feature] Slurm agent
Motivation: Why do you think this is important?
Slurm is a widely used workload manager for HPC (High-Performance Computing) clusters. It allocates compute resources, runs work on the allocated resources, and manages a queue of pending work.
Integrating Slurm with Flyte would make it much easier to run Flyte workflows on HPC clusters.
Goal: What should the final outcome look like, ideally?
The aim is to implement a Flyte agent that submits tasks to HPC resources through the Slurm scheduler.
Typically, users interact with Slurm through its command-line interface (CLI). For instance, the `sbatch` command submits a job script for later execution. An optional Slurm daemon also offers a REST API for interacting with the Slurm system.
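To make the CLI path concrete, here is a minimal sketch of how an agent might shell out to `sbatch`. It is an illustration under stated assumptions, not a proposed implementation: the helper names (`build_sbatch_command`, `parse_job_id`, `submit_job`) are hypothetical, it assumes `sbatch` is on the `PATH` of the machine running the code, and it does not show how this would plug into Flyte's agent interface. It relies on the `--parsable` flag, which makes `sbatch` print only the job ID (optionally followed by `;` and the cluster name).

```python
import subprocess
from typing import List, Optional


def build_sbatch_command(
    script_path: str,
    job_name: Optional[str] = None,
    partition: Optional[str] = None,
) -> List[str]:
    """Build the argument list for an sbatch call (hypothetical helper)."""
    # --parsable makes sbatch print "jobid" or "jobid;cluster" only.
    cmd = ["sbatch", "--parsable"]
    if job_name:
        cmd.append(f"--job-name={job_name}")
    if partition:
        cmd.append(f"--partition={partition}")
    cmd.append(script_path)
    return cmd


def parse_job_id(stdout: str) -> int:
    """Extract the numeric job ID from `sbatch --parsable` output."""
    first_field = stdout.strip().split(";", 1)[0]
    if not first_field.isdigit():
        raise ValueError(f"Unexpected sbatch output: {stdout!r}")
    return int(first_field)


def submit_job(
    script_path: str,
    job_name: Optional[str] = None,
    partition: Optional[str] = None,
) -> int:
    """Submit a job script and return its Slurm job ID.

    Assumes sbatch is installed and reachable on PATH.
    """
    result = subprocess.run(
        build_sbatch_command(script_path, job_name, partition),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if sbatch fails
    )
    return parse_job_id(result.stdout)
```

An agent built along these lines would presumably store the returned job ID and poll Slurm for the job's state afterwards. Talking to the REST API instead would avoid the need for a local `sbatch` binary.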
Describe alternatives you've considered
I don't know of anything comparable.
Propose: Link/Inline OR Additional context
I am available to offer support with Slurm and to test the Flyte agent. https://github.com/JBris/slurm-rest-api-docker can be used for testing against both the Slurm CLI and the Slurm REST API.
Are you sure this issue hasn't been raised already?
- [X] Yes
Have you read the Code of Conduct?
- [X] Yes