Mark Grondona

Results 533 comments of Mark Grondona

We may need to augment the gpubind shell plugin to use hwloc to assign GPUs to each task with `-o gpu-affinity=per-task`. Alternatively, I wonder if mpibind would "just work" here?...
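As a rough illustration of what per-task GPU assignment means, the sketch below divides a node's allocated GPU IDs evenly among local tasks and builds a `CUDA_VISIBLE_DEVICES` value for each one. This is not the gpubind plugin's code: the function names are hypothetical, and dropping any leftover GPUs when the count does not divide evenly is an assumption made for simplicity.

```python
def divide_gpus_per_task(gpu_ids, ntasks):
    """Split gpu_ids into ntasks contiguous groups of equal size.

    Hypothetical helper: leftover GPUs (when len(gpu_ids) is not a
    multiple of ntasks) are simply left unassigned in this sketch.
    """
    if ntasks <= 0:
        raise ValueError("ntasks must be positive")
    per_task = len(gpu_ids) // ntasks
    if per_task == 0:
        raise ValueError("fewer GPUs than tasks")
    return [gpu_ids[i * per_task:(i + 1) * per_task] for i in range(ntasks)]


def cuda_visible_devices(gpu_ids):
    """Format a list of GPU IDs as a CUDA_VISIBLE_DEVICES value."""
    return ",".join(str(g) for g in gpu_ids)


if __name__ == "__main__":
    # Four GPUs on the node, two local tasks: each task sees two GPUs.
    for rank, gpus in enumerate(divide_gpus_per_task([0, 1, 2, 3], 2)):
        print(f"task {rank}: CUDA_VISIBLE_DEVICES={cuda_visible_devices(gpus)}")
```

An hwloc-aware version would pick each task's GPUs by locality to the task's cores rather than by list position, which is the gap the comment above is pointing at.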

The scheduler inside of a `flux mini alloc` or `flux mini batch` should still be Fluxion whenever flux-sched is installed. You can check with `flux module list | grep sched`....

Transferred this issue to flux-sched since it is the thing assigning GPUs in this case.

Great! I wonder if we can write a shell plugin, activated by an `-o` option, to dump the topology and set the environment variable on behalf of users before launching...

One more question: This works for a single node, but if a job has multiple nodes I assume we'll need to fetch the topology for each node and load them...

> In this test case at least there's a mismatch between the hwloc reader and rv1exec which I think is causing the "failed to read target rank list" error. Ah,...

This issue came up again in the flux dev meeting yesterday, so just commenting here to revive the issue. Reading above, I think what we need is a way to...

> Really what would likely be most useful in the short term would be to ensure that those logical IDs are translated back to physical IDs in exec or the...

Open to any suggestions. Unfortunately, `flux resource list` currently organizes output by common sets of resources; it is not, for example, iterating through configured queues and printing resources that way....
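The difference between the two ways of organizing output can be sketched as follows. The node records, field names, and queue names are invented for illustration and do not reflect how `flux resource list` stores or prints its data; the point is only that grouping by identical resource sets can merge nodes from different queues onto one line, while grouping by queue keeps them apart.

```python
from collections import defaultdict

# Hypothetical node inventory (not real Flux data structures).
NODES = [
    {"name": "node0", "queue": "batch", "ncores": 4, "ngpus": 0},
    {"name": "node1", "queue": "debug", "ncores": 4, "ngpus": 0},
    {"name": "node2", "queue": "batch", "ncores": 4, "ngpus": 2},
]


def group_by_resources(nodes):
    """Group node names by their (ncores, ngpus) resource set."""
    groups = defaultdict(list)
    for node in nodes:
        groups[(node["ncores"], node["ngpus"])].append(node["name"])
    return dict(groups)


def group_by_queue(nodes):
    """Group node names by the queue they are configured in."""
    groups = defaultdict(list)
    for node in nodes:
        groups[node["queue"]].append(node["name"])
    return dict(groups)


if __name__ == "__main__":
    print(group_by_resources(NODES))
    print(group_by_queue(NODES))
```

Here `node0` and `node1` share a resource set but belong to different queues, so only the queue-oriented grouping shows which queue each node serves.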

> Seems like we'd want to qualify that with "runs until it doesn't get a fatal exception" or something like that? Ah, perhaps something to clarify with the user. I...