Make slurm-web scalable up to 10K jobs
There are several design issues that prevent slurm-web from scaling up to 10K jobs, both in the dashboard and the API. We must:
- identify them
- solve them
- profit
One scalability issue is that pyslurm extracts jobs with the DETAILS|DETAILS2 flags by default, which notably makes slurmctld read all submission scripts in the spool directory. This does not scale when there are 10k jobs and the spool directory is on a shared filesystem (in slurmctld HA mode).
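For reference, here is a minimal timing sketch assuming the legacy pyslurm API used by Slurm-web v2 (i.e. `pyslurm.job().get()`); it simply measures how long a full job load takes, which on a cluster with ~10k jobs reflects the cost of the detailed load described above:

```python
# Minimal timing sketch, assuming the legacy pyslurm API (pyslurm.job().get())
# used by Slurm-web v2. The call loads all jobs with the detail flags enabled
# by default, so the elapsed time includes slurmctld reading submission
# scripts from the spool directory.
import time

import pyslurm


def time_job_load():
    start = time.monotonic()
    jobs = pyslurm.job().get()  # dict of job id -> job info in the legacy API
    elapsed = time.monotonic() - start
    print(f"loaded {len(jobs)} jobs in {elapsed:.2f}s")


if __name__ == "__main__":
    time_job_load()
```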
This issue concerns Slurm-web v2, which is no longer maintained. You are highly encouraged to try the new version v3.0.0, which is not impacted by this issue. The quick start guide for v3.0.0 is available online: https://docs.rackslab.io/slurm-web/install/quickstart.html
Unless someone is motivated to maintain the old version of Slurm-web or you have a justified reason to keep this issue open, it will be closed in a few weeks.
For the reasons explained in the previous comment, I am finally closing this issue.