coll: enhance allgatherv performance on large systems
Originally by balaji on 2009-11-03 13:28:56 -0600
A user reported a performance problem with Allgatherv on large systems when the number of processes is not a power of two. Reading through the code, it looks like our algorithm selection relies on the total message size gathered, rather than the message size contributed by each process. So, on large-scale systems, even small per-process messages go through an O(P) algorithm.
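To make the problem concrete, here is a minimal sketch of a selection keyed on total gathered bytes. The threshold values, the three-way split, and all names (`select_by_total`, `num_steps`, `SHORT_MSG_BYTES`, `LONG_MSG_BYTES`) are assumptions for illustration, not copied from the actual code.

```c
#include <stddef.h>

/* Hypothetical thresholds, for illustration only. */
enum { SHORT_MSG_BYTES = 81920, LONG_MSG_BYTES = 524288 };

typedef enum { ALG_RECURSIVE_DOUBLING, ALG_BRUCK, ALG_RING } allgatherv_alg;

static int is_pof2(int p)
{
    return p > 0 && (p & (p - 1)) == 0;
}

/* Selection keyed on the TOTAL number of bytes gathered across all ranks. */
allgatherv_alg select_by_total(int nprocs, size_t total_bytes)
{
    if (total_bytes < LONG_MSG_BYTES && is_pof2(nprocs))
        return ALG_RECURSIVE_DOUBLING;
    if (total_bytes < SHORT_MSG_BYTES)
        return ALG_BRUCK;
    return ALG_RING;
}

/* Communication steps: P-1 for ring, ceil(log2(P)) for the other two. */
int num_steps(allgatherv_alg alg, int nprocs)
{
    if (alg == ALG_RING)
        return nprocs - 1;
    int steps = 0;
    while ((1 << steps) < nprocs)
        steps++;
    return steps;
}
```

Under these assumed thresholds, 12000 ranks contributing 8 bytes each gather 96000 bytes in total, which selects the ring algorithm and takes 11999 steps, where a logarithmic algorithm would take 14.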
We should take another look at the irregular collectives to improve these cases.
So, instead of checking the total size being gathered, should we use the average message size per rank as the determining factor for algorithm selection? That may not work in all cases either. For example, with a small average size but a very large system size, you may still want to use the ring algorithm? @pavanbalaji
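A rough sketch of what selecting on the per-rank average could look like. The cutoff `PER_RANK_SHORT_BYTES` and the names `avg_bytes_per_rank` and `select_by_average` are hypothetical, and the sketch does not handle the large-system caveat raised above.

```c
#include <stddef.h>

/* Hypothetical per-rank cutoff, for illustration only. */
enum { PER_RANK_SHORT_BYTES = 1024 };

typedef enum { CLASS_LOG_STEPS, CLASS_RING } alg_class;

/* Average contribution per rank, in bytes, from the recvcounts array. */
size_t avg_bytes_per_rank(const int *recvcounts, int nprocs, size_t type_size)
{
    size_t total = 0;
    for (int i = 0; i < nprocs; i++)
        total += (size_t) recvcounts[i] * type_size;
    return nprocs > 0 ? total / (size_t) nprocs : 0;
}

/* Pick a logarithmic-step algorithm when the average contribution is small. */
alg_class select_by_average(const int *recvcounts, int nprocs, size_t type_size)
{
    size_t avg = avg_bytes_per_rank(recvcounts, nprocs, type_size);
    return avg < PER_RANK_SHORT_BYTES ? CLASS_LOG_STEPS : CLASS_RING;
}
```

Because Allgatherv counts can be irregular, an average also hides skew: one rank contributing nearly all of the data produces the same average as an even split.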
Can you create a separate PR for new algorithm selections? That's orthogonal to this PR.
This isn't a PR.