MLLinOp::makeSubCommunicator() does not scale well to O(10k) ranks
While running the Castro flame_wave problem on 2048 Summit nodes (6 ranks per node, 12,288 ranks total), we found that the average time per call to MLLinOp::makeSubCommunicator() was 0.06 seconds. For comparison, that is roughly twice the cost of a hydro advance on GPUs at that scale.
Note: #998 partially addressed this, since I believe we observed a case in Castro where building the subcommunicator was unnecessary. However, I am not sure whether it resolves the original issue we observed on Summit.
Maybe we should prebuild a number of subcommunicators with various numbers of processes. On the coarsened MG levels, we don't need a subcommunicator that is exactly the size of the DistributionMapping; one that is at least as large would do. A rough sketch of the idea is below.
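Here is a minimal sketch of what such a pool could look like. It is not AMReX code: the class name SubCommPool is hypothetical, and it assumes the ranks participating at a coarse level are a contiguous block starting at rank 0, so that a prebuilt power-of-two-sized communicator can be reused instead of splitting the parent communicator on every level.

```cpp
// Hypothetical sketch: prebuild subcommunicators of power-of-two sizes once,
// then pick the smallest one that can hold a given level's ranks, avoiding a
// collective split/create per MG level.

#include <mpi.h>
#include <vector>

class SubCommPool
{
public:
    // Build communicators containing the first 1, 2, 4, ... ranks of parent.
    explicit SubCommPool (MPI_Comm parent)
        : m_parent(parent)
    {
        int nprocs, myrank;
        MPI_Comm_size(parent, &nprocs);
        MPI_Comm_rank(parent, &myrank);
        for (int n = 1; n <= nprocs; n *= 2) {
            MPI_Comm comm;
            int color = (myrank < n) ? 0 : MPI_UNDEFINED;
            MPI_Comm_split(parent, color, myrank, &comm);
            m_sizes.push_back(n);
            m_comms.push_back(comm);   // MPI_COMM_NULL on excluded ranks
        }
    }

    // Return the smallest prebuilt communicator with at least nranks ranks.
    // It may be larger than the DistributionMapping actually needs.
    MPI_Comm get (int nranks) const
    {
        for (std::size_t i = 0; i < m_sizes.size(); ++i) {
            if (m_sizes[i] >= nranks) { return m_comms[i]; }
        }
        return m_parent;   // fall back to the full communicator
    }

    ~SubCommPool ()
    {
        for (auto& c : m_comms) {
            if (c != MPI_COMM_NULL) { MPI_Comm_free(&c); }
        }
    }

private:
    MPI_Comm m_parent;
    std::vector<int>      m_sizes;
    std::vector<MPI_Comm> m_comms;
};
```

The pool would be built once (e.g., when the solver hierarchy is set up), so the per-level cost reduces to a table lookup rather than a new communicator construction at O(10k) ranks.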