[Core feature] Slurm agent

Open BerndDoser opened this issue 6 months ago • 4 comments

Motivation: Why do you think this is important?

Slurm is a widely used workload management system in many HPC (High-Performance Computing) compute clusters. It plays a vital role in efficiently allocating compute resources, running work on these allocated resources, and overseeing a queue of pending work.

Integrating Slurm with Flyte would let Flyte workflows run on HPC clusters, making use of the compute resources those clusters already manage through Slurm.

Goal: What should the final outcome look like, ideally?

The goal is a Flyte agent that submits tasks to HPC resources through the Slurm scheduler.

Users typically interact with Slurm through its command-line interface (CLI). For instance, the `sbatch` command submits a job script for later execution. Slurm also provides an optional daemon, `slurmrestd`, that exposes a REST API for interacting with the Slurm system.
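To make the CLI route more concrete, here is a minimal sketch of how an agent might submit a job with `sbatch` and recover the job ID. This is only an illustration, not a proposed agent design: the helper names (`build_sbatch_command`, `parse_job_id`, `submit_job`) are hypothetical, and it assumes `sbatch` is on the `PATH` of the machine running the code. It relies on the `--parsable` flag, which makes `sbatch` print just the job ID (optionally followed by `;` and the cluster name).

```python
import subprocess
from typing import List, Optional


def build_sbatch_command(
    script_path: str,
    job_name: Optional[str] = None,
    partition: Optional[str] = None,
) -> List[str]:
    """Assemble the sbatch argument list (hypothetical helper)."""
    # --parsable asks sbatch to print only "<jobid>" or "<jobid>;<cluster>".
    cmd = ["sbatch", "--parsable"]
    if job_name:
        cmd.append(f"--job-name={job_name}")
    if partition:
        cmd.append(f"--partition={partition}")
    cmd.append(script_path)
    return cmd


def parse_job_id(sbatch_output: str) -> int:
    """Extract the numeric job ID from `sbatch --parsable` output."""
    return int(sbatch_output.strip().split(";")[0])


def submit_job(script_path: str, **options: Optional[str]) -> int:
    """Submit a job script and return its Slurm job ID.

    Assumes sbatch is installed locally; raises CalledProcessError
    if the submission fails.
    """
    result = subprocess.run(
        build_sbatch_command(script_path, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_job_id(result.stdout)
```

An agent would presumably keep the returned job ID so that it can later poll or cancel the job; how that maps onto Flyte's agent interface is left open here.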

Describe alternatives you've considered

I don't know of anything comparable.

Propose: Link/Inline OR Additional context

I can offer support with Slurm and help test the Flyte agent. https://github.com/JBris/slurm-rest-api-docker can be used for testing the Slurm CLI and the Slurm REST API.

Are you sure this issue hasn't been raised already?

  • [X] Yes

Have you read the Code of Conduct?

  • [X] Yes

BerndDoser, Aug 05 '24 20:08