coll: enhance allgatherv performance on large systems

Open mpichbot opened this issue 9 years ago • 3 comments

Originally by balaji on 2009-11-03 13:28:56 -0600


One of the users reported a performance problem with Allgatherv for a non-power-of-two number of processes on large systems. Reading through the code, it looks like our algorithm selection relies on the "total message size" gathered, rather than the message size contributed by each process. So, on large-scale systems, even small per-process messages go through an O(P) algorithm.

We should take another look at the irregular collectives to improve these cases.

mpichbot avatar Oct 14 '16 16:10 mpichbot

So, instead of checking the total size being gathered, should we use the average message size per rank as the determining factor for algorithm selection? That may also not work in some cases: for example, with a small average size but a very large system, you may still want to use the ring algorithm. @pavanbalaji
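
To make the two criteria concrete, here is a minimal sketch of the selection logic, not the actual MPICH code. The function names, the enum, and the two byte thresholds are hypothetical placeholders chosen for illustration; only the general shape (recursive doubling for power-of-two process counts, Bruck for short messages, ring otherwise) follows the discussion above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names and thresholds, for illustration only. */
typedef enum { ALG_RECURSIVE_DOUBLING, ALG_BRUCK, ALG_RING } allgatherv_alg;

#define SHORT_MSG_BYTES ((size_t)80 * 1024)   /* assumed short-message cutoff */
#define LONG_MSG_BYTES  ((size_t)512 * 1024)  /* assumed long-message cutoff */

static int is_pof2(int n) { return n > 0 && (n & (n - 1)) == 0; }

/* Shared decision tree; size_metric is whatever quantity we compare
 * against the thresholds (total bytes or average bytes per rank). */
static allgatherv_alg select_alg(size_t size_metric, int nprocs)
{
    assert(nprocs > 0);
    if (size_metric < LONG_MSG_BYTES && is_pof2(nprocs))
        return ALG_RECURSIVE_DOUBLING;
    if (size_metric < SHORT_MSG_BYTES)
        return ALG_BRUCK;          /* about log2(P) communication steps */
    return ALG_RING;               /* P - 1 communication steps */
}

static size_t total_bytes(const size_t *bytes_per_rank, int nprocs)
{
    size_t total = 0;
    for (int i = 0; i < nprocs; i++)
        total += bytes_per_rank[i];
    return total;
}

/* Criterion described in the issue: compare the total gathered size. */
allgatherv_alg select_by_total(const size_t *bytes_per_rank, int nprocs)
{
    return select_alg(total_bytes(bytes_per_rank, nprocs), nprocs);
}

/* Criterion proposed in this comment: compare the average size per rank. */
allgatherv_alg select_by_average(const size_t *bytes_per_rank, int nprocs)
{
    assert(nprocs > 0);
    return select_alg(total_bytes(bytes_per_rank, nprocs) / (size_t)nprocs, nprocs);
}
```

With these assumed thresholds, 3000 ranks contributing 64 bytes each gather 192000 bytes in total, so `select_by_total` falls through to the ring even though every contribution is tiny, while `select_by_average` picks Bruck. The average-based variant ignores P entirely once the average is small, which is exactly the case questioned above.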

akhillanger avatar Dec 15 '17 15:12 akhillanger

Can you create a separate PR for new algorithm selections? That's orthogonal to this PR.

pavanbalaji avatar Dec 15 '17 21:12 pavanbalaji

This isn't a PR.

wesbland avatar Dec 15 '17 21:12 wesbland