William Zhang

Results: 34 comments by William Zhang

IIRC, HAN has more collectives implemented. We are going to do some performance testing on HAN/Adapt/Tuned.

@awlauria, your graphs don't have x and y labels. Is your y-axis the same as @janjust's y scale (i.e., anything above 0 means HAN/Adapt performs better, and anything below means it performs worse)?

So it looks like HAN/Adapt outperform Tuned except at the largest message sizes, at least in your 6-node tests.

I think this was hit in our MTT. I can also try to reproduce it.

Those errors look like UCX errors, which wouldn't appear if the ofi MTL were properly selected. EFA is only supported with the ofi MTL. IIRC, specifying --mca pml cm doesn't...

EDIT: My recollection was wrong. Specifying --mca pml cm should force the cm PML to be selected, and an error is thrown if it cannot select an MTL. I...
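A minimal sketch of how such a launch line could be assembled, assuming the goal is to request the cm PML and, optionally, the ofi MTL as well. The helper name `build_mpirun_command`, the application path `./my_mpi_app`, and the process count are hypothetical. Only the `--mca pml cm` setting comes from the comment above; adding `--mca mtl ofi` is an assumption about how one would pin the MTL.

```python
import shlex


def build_mpirun_command(app, num_procs, force_ofi=True):
    """Return an mpirun argument list that requests the cm PML.

    `app` and `num_procs` are placeholders for illustration only.
    """
    # Request the cm PML, as discussed in the comment above.
    cmd = ["mpirun", "-np", str(num_procs), "--mca", "pml", "cm"]
    if force_ofi:
        # Assumption: also pin the MTL to ofi so no other MTL is tried.
        cmd += ["--mca", "mtl", "ofi"]
    cmd.append(app)
    return cmd


if __name__ == "__main__":
    # Print a shell-ready command line for a hypothetical 4-process run.
    print(shlex.join(build_mpirun_command("./my_mpi_app", 4)))
```

Building the command as a list rather than one string keeps each argument separate, so it can be handed to `subprocess.run` without shell quoting issues.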

```
[c-96-4-worker0002:03606] mtl_ofi_component.c:541: select_ofi_provider: no provider found
[c-96-4-worker0002:03606] select: init returned failure for component ofi
[c-96-4-worker0002:03606] select: no component selected
```

This is the important section. The fi_getinfo call in...
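For scanning longer logs for this failure, a rough sketch along these lines might help. It assumes every line follows the `[host:pid] message` shape seen in the excerpt above. The function names `parse` and `ofi_provider_missing` are hypothetical and are not part of Open MPI or libfabric.

```python
import re

# The three log lines quoted in the comment above.
LOG = """\
[c-96-4-worker0002:03606] mtl_ofi_component.c:541: select_ofi_provider: no provider found
[c-96-4-worker0002:03606] select: init returned failure for component ofi
[c-96-4-worker0002:03606] select: no component selected
"""

# Assumed line shape: "[host:pid] message".
LINE_RE = re.compile(r"^\[(?P<host>[^:\]]+):(?P<pid>\d+)\]\s+(?P<msg>.*)$")


def parse(log):
    """Split a log into (host, pid, message) tuples, skipping other lines."""
    entries = []
    for line in log.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append((match["host"], int(match["pid"]), match["msg"]))
    return entries


def ofi_provider_missing(log):
    """Return True if any line reports that no OFI provider was found."""
    needle = "select_ofi_provider: no provider found"
    return any(needle in msg for _, _, msg in parse(log))


if __name__ == "__main__":
    print(ofi_provider_missing(LOG))
```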

> I did not see any information from libfabric printed, which makes me wonder whether openmpi was compiled correctly with libfabric.
>
> How did you obtain open mpi?
>...

What's the status of this PR? Are we agreed that this is the correct fix for #9869?

Since it looks like we're not moving forward with this change, I'm closing this PR. Please re-open if necessary.