Sven Stroteich
Hi, I also investigated this problem on two different machines, one using Omni-Path and the other using Ethernet as the network. The latter is slower by a factor of roughly...
I did not try the standard MPI_alltoall, since the slowdown appears within the procedure "c_redist_35", and "c_redist_35", from my point of view, does something a little bit...
UCX improved it a lot, but the main problem is the message size: many small packets are sent, which is inefficient on most machines. I am...
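A rough way to see why many small messages hurt is the common latency-bandwidth (Hockney) cost model, where each message costs a fixed latency alpha plus beta per byte. The sketch below is only an illustration under that model. The ALPHA and BETA values and the helper name `transfer_time` are hypothetical placeholders, not measurements from the Omni-Path or Ethernet machines mentioned above.

```python
# Hockney (latency-bandwidth) cost model: one message of n bytes takes
# roughly alpha + n * beta seconds.
# ALPHA and BETA are hypothetical placeholder values, not measured numbers.

ALPHA = 2e-6   # assumed per-message latency in seconds (hypothetical)
BETA = 1e-10   # assumed transfer time per byte in seconds (hypothetical, ~10 GB/s)


def transfer_time(total_bytes: int, num_messages: int,
                  alpha: float = ALPHA, beta: float = BETA) -> float:
    """Estimated time to move total_bytes split evenly into num_messages messages."""
    if num_messages < 1:
        raise ValueError("num_messages must be at least 1")
    # Every message pays the latency once; the per-byte cost is the same
    # no matter how the data is split, so only the latency term grows.
    return num_messages * alpha + total_bytes * beta


if __name__ == "__main__":
    total = 8 * 1024 * 1024  # 8 MiB of data to exchange
    for k in (1, 64, 4096, 65536):
        print(f"{k:6d} messages: {transfer_time(total, k) * 1e3:.3f} ms")
```

With the same amount of data, the estimated time grows linearly with the number of messages, and the effect is larger on a network with higher per-message latency, which is consistent with Ethernet suffering more than Omni-Path here.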