Gilles Gouaillardet

87 comments by Gilles Gouaillardet

IIRC that is a known issue: the two hosts have different hardware, and hence end up selecting different `osc` components. As a workaround, you can run

```
mpirun --mca osc ^rdma ...
```
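The failure mode can be pictured with a small model. It assumes, as a simplification, that each host chooses between the `rdma` and `pt2pt` `osc` components using only its own hardware; `pick_osc` and `all_hosts_agree` are hypothetical names, not Open MPI functions, and the real selection logic is more involved. Two hosts with different networks then disagree, while excluding the component that only some hosts can use makes every host fall back to the same one.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model, not Open MPI code: a host picks an osc component
 * from its own hardware only. rdma_excluded models excluding rdma
 * through the osc MCA parameter. */
static const char *pick_osc(int has_rdma_hw, int rdma_excluded)
{
    if (has_rdma_hw && !rdma_excluded)
        return "rdma";
    return "pt2pt";
}

/* Returns 1 if every host ends up with the same component, 0 otherwise. */
static int all_hosts_agree(const int *has_rdma_hw, int nhosts, int rdma_excluded)
{
    assert(nhosts > 0);
    const char *first = pick_osc(has_rdma_hw[0], rdma_excluded);
    for (int i = 1; i < nhosts; i++) {
        if (strcmp(first, pick_osc(has_rdma_hw[i], rdma_excluded)) != 0)
            return 0;
    }
    return 1;
}
```

With one RDMA-capable host and one without, `all_hosts_agree` returns 0 by default and 1 once `rdma` is excluded.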

`osc` is for one-sided communications (which `IMB-MPI1` does not use), and I do not think there is an `osc/rdma` component in `1.10`. You can run

```
mpirun --mca osc_base_verbose 10 ...
```

...

I will double check that. What happens when you blacklist the `osc/rdma` component?

Is this the only log you get when the benchmark hangs?

Can you collect the same traces with `IMB-EXT` and `1.10`?

Any chance you could test the latest master? I cannot remember whether we fixed that (in which case only a backport is needed). @hjelmn, any recollection of this issue?

That will be enough for now, thanks.

It seems this has never been fixed, even on master. Can you please give the inline patch a try? This is really a proof of concept at this stage...

@hjelmn, thanks for the comment. I will definitely use a single allreduce.

Here is a more correct patch.

__[EDIT]__ Use `MPI_MIN` instead of `MPI_MAX`.

```patch
diff --git a/ompi/mca/osc/rdma/osc_rdma_component.c b/ompi/mca/osc/rdma/osc_rdma_component.c
index b145395..069c9dc 100644
--- a/ompi/mca/osc/rdma/osc_rdma_component.c
+++ b/ompi/mca/osc/rdma/osc_rdma_component.c
@@ -372,6 +372,8 @@ static int...
```
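Why `MPI_MIN` rather than `MPI_MAX`, and why a single allreduce, can be sketched in plain C. This is not the patch itself: `fake_allreduce`, `component_enabled`, `NCHECKS` and the flag layout are hypothetical, and the code only emulates the value an `MPI_Allreduce` over `MPI_INT` would deliver to every rank, so it runs without an MPI library. It assumes each rank contributes 1 for a check it passes and 0 otherwise. Packing all checks into one array lets one collective cover them; with MIN the component is enabled only if every rank passes every check, whereas MAX enables it as soon as any rank does, which brings back the mismatch.

```c
#include <assert.h>

#define NCHECKS 2 /* hypothetical number of per-rank capability checks */

enum reduce_op { OP_MIN, OP_MAX };

/* Emulates the result of an allreduce with count NCHECKS: local holds
 * nranks rows of NCHECKS flags (row r is rank r's contribution) and
 * out receives the element-wise min or max over all ranks. */
static void fake_allreduce(const int *local, int nranks, enum reduce_op op, int *out)
{
    assert(nranks > 0);
    for (int j = 0; j < NCHECKS; j++) {
        int acc = local[j]; /* start from rank 0's value */
        for (int r = 1; r < nranks; r++) {
            int v = local[r * NCHECKS + j];
            if (op == OP_MIN ? v < acc : v > acc)
                acc = v;
        }
        out[j] = acc;
    }
}

/* The component is enabled only if every reduced check is non-zero. */
static int component_enabled(const int *reduced)
{
    for (int j = 0; j < NCHECKS; j++)
        if (!reduced[j])
            return 0;
    return 1;
}
```

In an MPI program the same reduced values would come from `MPI_Allreduce(local_flags, reduced, NCHECKS, MPI_INT, MPI_MIN, comm)`, where each rank passes only its own row of flags.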