Ryan Day
Most (all?) resource management software allows users to include the parameters for their job allocations in their job scripts using a pragma syntax (e.g. #SBATCH, #BSUB, etc). This effectively allows...
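A minimal sketch of that pragma style, using Slurm's `#SBATCH` directives (the file name, partition, and application are hypothetical):

```shell
# Write a sketch of a Slurm batch script (all names hypothetical).
# The #SBATCH lines are ordinary shell comments, but sbatch parses
# them as job parameters, so the resource request travels with the script.
cat > job.sh <<'EOF'
#!/bin/sh
#SBATCH --nodes=2
#SBATCH --time=30
#SBATCH --partition=pbatch
srun ./my_app
EOF
grep -c '^#SBATCH' job.sh   # the scheduler would read these three pragmas
```

The same script runs unmodified as a plain shell script, which is the point: one file carries both the allocation parameters and the work.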
Here are the basic things that we do with the combination of partitions and qos in Slurm, and just queues in LSF: 1. set default limits on all user jobs...
Related to #4185, it would be useful to be able to filter the output of `flux jobs` by node ID. I.e. to be able to easily answer questions like who...
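Until such a filter exists, a workaround sketch: grep a nodelist column for the node of interest. The format string and the sample output below are assumptions, not real job data:

```shell
# The sample text stands in for output like:
#   flux jobs -a -no '{id.f58} {username} {nodelist}'   (format string assumed)
sample_output='f2qBkV day36 corona[212-213]
f2qXyZ alice corona171
f2r111 bob corona[214,216]'
# Match the bare hostname or one appearing inside a bracketed list.
# Naive: this will NOT expand numeric ranges like corona[200-220].
matches="$(printf '%s\n' "$sample_output" | grep -E 'corona(212|\[[^]]*212)')"
printf '%s\n' "$matches"
```

A built-in filter would have to expand compressed hostlist ranges to answer this correctly, which is exactly what the text-matching workaround cannot do.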
see also https://rzlc.llnl.gov/jira/browse/ELCAP-179 The short version of this, I think, is that cpu-affinity and gpu-affinity assign the lowest numbered CPUs and lowest numbered GPUs to the lowest numbered tasks, but...
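A toy illustration of that default ordering (numbers hypothetical, not the plugin's actual code): with sequential block assignment, the lowest-numbered tasks always land on the lowest-numbered devices, regardless of how the hardware is actually wired:

```shell
# Toy sketch of sequential block assignment: 4 tasks, 2 GPUs.
# Low-numbered tasks get low-numbered GPUs, even when NUMA locality
# would favor a different pairing.
ntasks=4
ngpus=2
for task in 0 1 2 3; do
  gpu=$(( task * ngpus / ntasks ))
  echo "task $task -> gpu $gpu"
done
```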
We would like our mpibind shell plugin to run when users launch a flux job that runs their code, but not when they just start a new instance. I.e. we'd...
The gpu-affinity plugin sets CUDA_VISIBLE_DEVICES, which governs which Nvidia GPUs are seen, but not the AMD GPU equivalent ROCR_VISIBLE_DEVICES or the hipcc equivalent HIP_VISIBLE_DEVICES. I'm not quite clear on whether...
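For illustration, the three variables involved (the device list `0,1` is hypothetical). A plugin that sets only the CUDA variable leaves the AMD runtimes seeing every GPU on the node:

```shell
# Hypothetical per-task GPU assignment: this task gets GPUs 0 and 1.
export CUDA_VISIBLE_DEVICES=0,1   # honored by the Nvidia CUDA runtime
export ROCR_VISIBLE_DEVICES=0,1   # honored by the ROCm (ROCR) runtime
export HIP_VISIBLE_DEVICES=0,1    # honored by the HIP runtime
echo "GPU visibility: $CUDA_VISIBLE_DEVICES"
```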
Holger Jones just reported an issue when he tries to run with flux after putting PyTorch ROCm libs in his LD_LIBRARY_PATH: Hey Ryan: just an fyi that I can break...
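A common mitigation sketch (the library path is hypothetical): scope `LD_LIBRARY_PATH` to the launched command only, rather than exporting it in the login shell where flux's own tools inherit it:

```shell
# Hypothetical ROCm PyTorch library directory.
TORCH_LIBS="$HOME/pytorch-rocm/lib"
# Instead of `export LD_LIBRARY_PATH=...` in the shell, pass the path
# only to the child process; the parent environment stays clean.
child_path="$(env LD_LIBRARY_PATH="$TORCH_LIBS" sh -c 'printf %s "$LD_LIBRARY_PATH"')"
echo "child saw: $child_path"
```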
The hostlist for allocated nodes in the `flux resource list` output doesn't appear to have been properly sorted and compacted. E.g. on corona: ``` [day36@corona212:~]$ flux resource list | grep...
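A sketch of the expected behavior on made-up data: the node names should be numerically ("version") sorted before being compacted into ranges, which is the order a compacted hostlist like `corona[181,212-214]` implies:

```shell
# Unsorted hostname list standing in for the resource output.
hosts='corona213
corona181
corona212
corona214'
# Plain lexical sort would also work here, but version sort is what
# handles mixed-width numeric suffixes (e.g. corona9 vs corona181).
sorted="$(printf '%s\n' "$hosts" | sort -V)"
printf '%s\n' "$sorted"
```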
The queue column of `flux resource list` can be confusing and hard to read when there are multiple overlapping queues. e.g. ``` [day36@rzadams1001:~]$ flux resource list STATE QUEUE NNODES NCORES...