Joseph Schuchart
I'm seeing similar errors with osc/rdma in the ARMCI test suite. Interestingly, I only see them on shared memory and not when running with one process per node. Will have...
@hjelmn Did you find anything useful?
Funny thing: I cannot reproduce this issue with the test case provided at the top. Will have to use the ARMCI tests...
@benmenadue I see a similar problem with windows allocated through `MPI_Win_allocate` and filed https://github.com/open-mpi/ompi/issues/10521. In short, using hcoll causes a problem on the machine I'm using and leads osc/rdma to...
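For reference, here is a minimal sketch of the kind of `MPI_Win_allocate` + passive-target code I have in mind (not the actual reproducer from #10521; targets, counts, and the final check are placeholders). Running it with something like `mpirun --mca osc rdma ...` forces the osc/rdma component.

```c
/* Minimal sketch (not the reproducer from #10521): allocate a window with
 * MPI_Win_allocate and do a simple MPI_Put under a lock-all epoch. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int *baseptr;
    MPI_Win win;
    /* window memory is allocated by the MPI library itself */
    MPI_Win_allocate(sizeof(int), sizeof(int), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &baseptr, &win);
    *baseptr = -1;

    MPI_Win_lock_all(0, win);
    int value = rank;
    /* write our rank into the right neighbor's window */
    MPI_Put(&value, 1, MPI_INT, (rank + 1) % size, 0, 1, MPI_INT, win);
    MPI_Win_flush_all(win);
    MPI_Win_unlock_all(win);

    /* local read after a barrier; assumes the unified memory model */
    MPI_Barrier(MPI_COMM_WORLD);
    printf("rank %d sees %d\n", rank, *baseptr);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```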
@jotabf I think I can reproduce a hang in `MPI_Get` with osc/rdma and btl/openib. I have not had any success in debugging this though. However, support for openib has been...
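For context, this is roughly the passive-target `MPI_Get` pattern I was testing (a sketch under my own assumptions, not @jotabf's code; target rank and counts are placeholders):

```c
/* Sketch of a passive-target MPI_Get between neighboring ranks. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int *baseptr;
    MPI_Win win;
    MPI_Win_allocate(sizeof(int), sizeof(int), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &baseptr, &win);
    *baseptr = rank;
    MPI_Barrier(MPI_COMM_WORLD); /* all windows initialized */

    int target = (rank + 1) % size;
    int remote = -1;
    MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win);
    MPI_Get(&remote, 1, MPI_INT, target, 0, 1, MPI_INT, win);
    MPI_Win_unlock(target, win); /* completes the get */

    printf("rank %d read %d from rank %d\n", rank, remote, target);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```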
> I still have a problem in my code using osc/ucx with some unexpected synchronization.

Do you have the details for that? Could you open a ticket (if there isn't...
I'm planning to investigate https://github.com/open-mpi/ompi/issues/9062 soon and will also look at the general performance of coll/adapt and coll/han on an IB system. Will report back once I have the...
Sorry for the delay, but here are some measurements for HAN on Hawk. I measured several configurations, including 8, 24, 48, and 64 processes per node. I also ran...
I am able to reproduce this issue on a single node when running with `--mca btl self,tcp` and `THREAD_COUNT` set to 2. I cannot reproduce it with either `THREAD_COUNT` set...
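For anyone else trying to reproduce this, here is a rough sketch of the shape of such a multi-threaded test (the actual reproducer is the test case discussed in this issue; `THREAD_COUNT`, the iteration count, and the message pattern here are my own placeholders):

```c
/* Rough sketch of a multi-threaded TCP test: each of THREAD_COUNT threads
 * exchanges messages with its peer rank over MPI_COMM_WORLD.
 * Launch on one node with, e.g.:  mpirun -n 2 --mca btl self,tcp ./a.out */
#include <mpi.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define THREAD_COUNT 2    /* the setting that triggers the problem for me */
#define NITER        1000

static int rank, size;

static void *worker(void *arg)
{
    int tid  = (int)(intptr_t)arg;
    int peer = (rank + 1) % size;
    int sbuf = rank, rbuf = -1;
    for (int i = 0; i < NITER; ++i) {
        /* distinct tag per thread so messages don't cross threads */
        MPI_Sendrecv(&sbuf, 1, MPI_INT, peer, tid,
                     &rbuf, 1, MPI_INT, peer, tid,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    return NULL;
}

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE not available\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    pthread_t threads[THREAD_COUNT];
    for (int t = 0; t < THREAD_COUNT; ++t)
        pthread_create(&threads[t], NULL, worker, (void *)(intptr_t)t);
    for (int t = 0; t < THREAD_COUNT; ++t)
        pthread_join(threads[t], NULL);

    MPI_Finalize();
    return 0;
}
```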
With Open MPI built in debug mode, I get an assertion every couple of runs:
```
test_mpi_thread_multiple: ../../../../../opal/mca/btl/tcp/btl_tcp_endpoint.c:517: mca_btl_tcp_endpoint_accept: Assertion `btl_endpoint->endpoint_sd_next == -1' failed.
[yoga:298124] *** Process received signal ***...
```