Ada Böhm
Read variables such as `SLURM_MEM_PER_NODE` or `SLURM_MEM_PER_CPU` to detect the memory available on a node. (And probably other SLURM variables for GPU/CPU indices?)
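A minimal sketch of how such detection might look, not HyperQueue's actual implementation. The function name `detect_slurm_memory_mb` is hypothetical. It assumes that both memory variables hold plain integers in megabytes (SLURM's default unit) and that `SLURM_CPUS_ON_NODE` can be used to scale the per-CPU value; the environment is passed in as a map so the logic can be exercised without a real SLURM allocation.

```rust
use std::collections::HashMap;

/// Hypothetical helper: returns the memory limit for this node in megabytes,
/// or `None` when the relevant SLURM variables are missing or not numeric.
/// Assumption: values are plain integers in MB (SLURM's default unit).
fn detect_slurm_memory_mb(vars: &HashMap<String, String>) -> Option<u64> {
    // Look up a variable and parse it as an unsigned integer.
    let get = |key: &str| vars.get(key).and_then(|v| v.trim().parse::<u64>().ok());

    // Prefer the per-node value when it is present.
    if let Some(per_node) = get("SLURM_MEM_PER_NODE") {
        return Some(per_node);
    }

    // Otherwise scale the per-CPU value by the CPUs allocated on this node.
    let per_cpu = get("SLURM_MEM_PER_CPU")?;
    let cpus = get("SLURM_CPUS_ON_NODE")?;
    per_cpu.checked_mul(cpus)
}
```

In a worker process the map could be built with `std::env::vars().collect()`, falling back to the existing hardware detection when the function returns `None`.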
Related to #495
The current version prints something like: ``` Error: Received error: "Could not submit allocation: sbatch execution failed\n\nCaused by:\n Exit code: 1\n Stderr: sbatch: error: AssocMaxSubmitJobLimit\n sbatch: error: Batch job submission...
Adds the `cargo xtask` command, which allows:

## Creating report

```bash
cargo xtask report
```

Creates an HTML file with a report containing the snapshots that do not match.

## Interactive test blessing:...
# This PR introduces ResourceRqId (and LocalResourceRqId). The overall idea is to replace ResourceRequestVariants (a self-contained resource request description) with ResourceRqId (a u32) that is an index into a global table of resources (the...
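The general idea of replacing a self-contained description with an index into a shared table can be sketched as follows. This is an illustration of the interning pattern only, not the PR's code: `ResourceRequest` and `ResourceRqTable` are hypothetical stand-ins, and only the name `ResourceRqId` and its `u32` representation come from the description above.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a self-contained resource request description.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct ResourceRequest {
    /// (resource name, amount) pairs; illustrative only.
    entries: Vec<(String, u64)>,
}

/// Compact handle: an index into the table below.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct ResourceRqId(u32);

/// Hypothetical global table that stores each distinct request once.
#[derive(Default)]
struct ResourceRqTable {
    requests: Vec<ResourceRequest>,
    ids: HashMap<ResourceRequest, ResourceRqId>,
}

impl ResourceRqTable {
    /// Returns the id of an equal, already stored request, or stores a new one.
    fn get_or_insert(&mut self, rq: ResourceRequest) -> ResourceRqId {
        if let Some(&id) = self.ids.get(&rq) {
            return id;
        }
        let id = ResourceRqId(self.requests.len() as u32);
        self.requests.push(rq.clone());
        self.ids.insert(rq, id);
        id
    }

    /// Resolves an id back to the full description.
    fn get(&self, id: ResourceRqId) -> &ResourceRequest {
        &self.requests[id.0 as usize]
    }
}
```

With a layout like this, tasks would carry a 4-byte `Copy` id instead of a cloned description, and comparing two requests reduces to comparing two integers.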
* [x] Data objects
* [ ] FS support for data objects
* [x] Support for NUMA aware allocation for combination of cpus & gpus.
* [x] Refactor gateway access...
The dashboard now shows the utilization of all CPUs. It would be nice to (optionally) show only the utilization of assigned CPUs (or gray out the non-assigned CPUs).
* Allow specifying a minimum number of running workers per queue
* Allow specifying a time when HQ tries to allocate a new worker ahead for replacement running...
# RFC: Task Runtime Storage (TRS)

## 1. Abstract 📝

This RFC proposes the **Task Runtime Storage (TRS)** feature for HyperQueue. TRS provides a simple, persistent key-value store associated with...