Jim Garlick
The team at LLNL kindly installed openmpi-4.1.2 on our opal CTS-1 system for us, and it seems to work out of the box with flux. However, it is using the...
> Sorry, tried which in our environment?

I haven't tried wiping out the system module-set OMPI_* variables in our environment, but if that's what you mean, I can give it a...
Thanks. To clarify, 4.0.5 just now, 4.1.1 before? By chance do you have 4.1.2 available? That is the only version I have that works on this system right now. (Self-built...
Excellent. Well we should root out the psm2 issue, but it's good to know _something_ works with a high speed interconnect!
Not making a lot of progress here, although just to add some data points, I got 4.1.1 working on our system with psm2 built, and it seems to work out...
Confirmed this issue on corona (TOSS 4)
```
ƒ(s=2,d=1) [garlick@corona282:mpi-test]$ module list

Currently Loaded Modules:
  1) intel-tce/19.0.4   2) StdEnv (S)   3) mvapich2-tce/2.3.6

ƒ(s=2,d=1) [garlick@corona282:mpi-test]$ flux mini run -N2 ./hello
[corona282:mpi_rank_0][smpi_load_hwloc_topology]...
```
One note: this issue reproduces on corona but not on fluke.
This was resolved by configuring mvapich with the option `--enable-llnl-site-specific-options` (if you can believe it), which, like setting `MV2_ENABLE_AFFINITY=0`, disables affinity.
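For reference, the two workarounds mentioned above might look roughly like this; they are alternatives, not steps to run together. This is a sketch under assumptions: the configure line shows only the flag from this thread and omits whatever other options a real mvapich build needs, `./hello` is the test program from the corona reproduction, and the exported variable is assumed to reach the job because `flux mini run` passes along the submitting shell's environment.

```shell
# Alternative 1: rebuild mvapich with the LLNL site-specific options,
# which (like MV2_ENABLE_AFFINITY=0) disable affinity.
# Only the flag from this thread is shown; other configure options are omitted.
./configure --enable-llnl-site-specific-options

# Alternative 2: keep the existing mvapich build and disable affinity
# for this run only, via the environment inherited by the job.
export MV2_ENABLE_AFFINITY=0
flux mini run -N2 ./hello
```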
I guess I'm arguing that we just display the actual states, not the "virtual" states. The actual states are NEW, DEPEND, PRIORITY, SCHED, RUN, CLEANUP, INACTIVE. https://flux-framework.readthedocs.io/projects/flux-rfc/en/latest/spec_21.html (Edit: sorry I...
My job that failed to start was listed as R in the ST column. That's just weird. I vote we expose all of the primary states in the default output...
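To make the proposal concrete, here is a small shell sketch that gives each primary state from RFC 21 its own one-letter code, so a job in CLEANUP could never be shown as R. The function name `state_abbrev` and the letter assignments are hypothetical, chosen only for illustration; they are not taken from `flux jobs`.

```shell
#!/bin/sh
# Hypothetical helper: map each RFC 21 primary job state to a distinct
# one-letter code for an ST column. Letters are illustrative assumptions.
state_abbrev() {
    case "$1" in
        NEW)      echo N ;;
        DEPEND)   echo D ;;
        PRIORITY) echo P ;;
        SCHED)    echo S ;;
        RUN)      echo R ;;
        CLEANUP)  echo C ;;
        INACTIVE) echo I ;;
        *)        echo "?"; return 1 ;;  # unknown state name
    esac
}

# Print every primary state next to its code.
for s in NEW DEPEND PRIORITY SCHED RUN CLEANUP INACTIVE; do
    printf '%-8s %s\n' "$s" "$(state_abbrev "$s")"
done
```

Because no two primary states share a letter, the column always reflects the real state a job is in rather than a grouped "virtual" one.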