Ryan Day
Most (all?) resource management software allows users to include the parameters for their job allocations in their job scripts using a pragma syntax (e.g. #SBATCH, #BSUB, etc). This effectively allows...
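A minimal sketch of that pragma style, using Slurm's `#SBATCH` directives (the file name, partition, and application are hypothetical):

```shell
# Write a sketch of a Slurm batch script (all names hypothetical).
# The #SBATCH lines are ordinary shell comments, but sbatch parses
# them as job parameters, so the resource request travels with the script.
cat > job.sh <<'EOF'
#!/bin/sh
#SBATCH --nodes=2
#SBATCH --time=30
#SBATCH --partition=pbatch
srun ./my_app
EOF
grep -c '^#SBATCH' job.sh   # the scheduler would read these three pragmas
```

The same script runs unmodified as a plain shell script, which is the point: one file carries both the allocation parameters and the work.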
Here are the basic things that we do with the combination of partitions and qos in Slurm, and just queues in LSF: 1. set default limits on all user jobs...
Related to #4185, it would be useful to be able to filter the output of `flux jobs` by node ID. I.e. to be able to easily answer questions like who...
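Until such a filter exists, a workaround sketch: grep a nodelist column for the node of interest. The format string and the sample output below are assumptions, not real job data:

```shell
# The sample text stands in for output like:
#   flux jobs -a -no '{id.f58} {username} {nodelist}'   (format string assumed)
sample_output='f2qBkV day36 corona[212-213]
f2qXyZ alice corona171
f2r111 bob corona[214,216]'
# Match the bare hostname or one appearing inside a bracketed list.
# Naive: this will NOT expand numeric ranges like corona[200-220].
matches="$(printf '%s\n' "$sample_output" | grep -E 'corona(212|\[[^]]*212)')"
printf '%s\n' "$matches"
```

A built-in filter would have to expand compressed hostlist ranges to answer this correctly, which is exactly what the text-matching workaround cannot do.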
see also https://rzlc.llnl.gov/jira/browse/ELCAP-179 The short version of this, I think, is that cpu-affinity and gpu-affinity assign the lowest numbered CPUs and lowest numbered GPUs to the lowest numbered tasks, but...
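A toy illustration of that default ordering (numbers hypothetical, not the plugin's actual code): with sequential block assignment, the lowest-numbered tasks always land on the lowest-numbered devices, regardless of how the hardware is actually wired:

```shell
# Toy sketch of sequential block assignment: 4 tasks, 2 GPUs.
# Low-numbered tasks get low-numbered GPUs, even when NUMA locality
# would favor a different pairing.
ntasks=4
ngpus=2
for task in 0 1 2 3; do
  gpu=$(( task * ngpus / ntasks ))
  echo "task $task -> gpu $gpu"
done
```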
We would like our mpibind shell plugin to run when users launch a flux job that runs their code, but not when they just start a new instance. I.e. we'd...
The gpu-affinity plugin sets CUDA_VISIBLE_DEVICES, which governs which Nvidia GPUs are seen, but not the AMD GPU equivalent ROCR_VISIBLE_DEVICES or the hipcc equivalent HIP_VISIBLE_DEVICES. I'm not quite clear on whether...
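For illustration, the three variables involved (the device list `0,1` is hypothetical). A plugin that sets only the CUDA variable leaves the AMD runtimes seeing every GPU on the node:

```shell
# Hypothetical per-task GPU assignment: this task gets GPUs 0 and 1.
export CUDA_VISIBLE_DEVICES=0,1   # honored by the Nvidia CUDA runtime
export ROCR_VISIBLE_DEVICES=0,1   # honored by the ROCm (ROCR) runtime
export HIP_VISIBLE_DEVICES=0,1    # honored by the HIP runtime
echo "GPU visibility: $CUDA_VISIBLE_DEVICES"
```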
Holger Jones just reported an issue when he tries to run with flux after putting PyTorch ROCm libs in his LD_LIBRARY_PATH: Hey Ryan: just an fyi that I can break...
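A common mitigation sketch (the library path is hypothetical): scope `LD_LIBRARY_PATH` to the launched command only, rather than exporting it in the login shell where flux's own tools inherit it:

```shell
# Hypothetical ROCm PyTorch library directory.
TORCH_LIBS="$HOME/pytorch-rocm/lib"
# Instead of `export LD_LIBRARY_PATH=...` in the shell, pass the path
# only to the child process; the parent environment stays clean.
child_path="$(env LD_LIBRARY_PATH="$TORCH_LIBS" sh -c 'printf %s "$LD_LIBRARY_PATH"')"
echo "child saw: $child_path"
```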
The hostlist for allocated nodes in the `flux resource list` output doesn't appear to have been properly sorted and compacted. E.g. on corona: ``` [day36@corona212:~]$ flux resource list | grep...
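A sketch of the expected behavior on made-up data: the node names should be numerically ("version") sorted before being compacted into ranges, which is the order a compacted hostlist like `corona[181,212-214]` implies:

```shell
# Unsorted hostname list standing in for the resource output.
hosts='corona213
corona181
corona212
corona214'
# Plain lexical sort would also work here, but version sort is what
# handles mixed-width numeric suffixes (e.g. corona9 vs corona181).
sorted="$(printf '%s\n' "$hosts" | sort -V)"
printf '%s\n' "$sorted"
```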
The queue column of `flux resource list` can be confusing and hard to read when there are multiple overlapping queues. e.g. ``` [day36@rzadams1001:~]$ flux resource list STATE QUEUE NNODES NCORES...