Mark Grondona

Results: 154 issues by Mark Grondona

The environment is not set up correctly for `python/t10001-resourcegraph.py` when the default `PYTHONPATH` doesn't include the `flux` module:

```
The following tests FAILED:
92 - python/t10001-resourcegraph.py (Failed)
Errors while running...
```

On a couple of clusters, the sysadmins added new nodes to the configuration that were not yet available, then restarted Flux. The core `resource` module tracked the nodes as both drained...

The Fluxion scheduler provides a `t_estimate` job annotation, which `flux jobs` displays by default in the generic `INFO` column for jobs in the SCHED state. This is very useful, but...

The Flux system instance needed to be restarted on tioga recently and there were two active jobs in CLEANUP state. This caused Fluxion to fail to restart with the following...

The qmanager config appears to allow separate queue parameters per queue (though it isn't at all clear if these are actual Flux queues or the internal, unused Fluxion queues), but...

flux-framework/rfc#402 proposes to remove the optional `attributes` section from Rv1. Fluxion currently uses this section to store the queue in `attributes.system.scheduler.queue`, though it isn't clear why this is needed since...

This is a tracking issue for handling locality aware scheduling of on-node resources (currently only GPUs I think) in Fluxion. This is a requirement for our Production Ready System Instance...

In a planning meeting, the idea of running with rv1 match format enabled in production was discussed as a stopgap solution for #991. However, the performance or other impact due...

AKA 'standby' qos / queue. Allow users to submit jobs that can be killed automatically by the system instance if another job needs the resources. Currently creating this as an...

I was running a test instance of size=1536 across 32 real nodes, each with 96 cores. When launching 128N/128p jobs there seemed to be some kind of performance issue in...