
Optimize MPI operations

GoogleCodeExporter opened this issue 8 years ago • 5 comments

Several possible ways to optimize the MPI part of the code:

1. use one buffer for all MPI communications;
2. possibly use MPI_ALLTOALL and 'derived datatypes' for block_transpose;
3. use 'persistent communication requests' for repeated communications.
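
For illustration, here is a minimal sketch (in C, not ADDA's actual code) of points 2 and 3: a block transpose done with MPI_Alltoall plus a derived datatype, and a persistent send/receive pair restarted across iterations of a repeated neighbor exchange. The buffer names and block size are hypothetical.

```c
/* Sketch of MPI_Alltoall with a derived datatype and of persistent
 * communication requests; names like BLOCK and sendbuf are made up. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Block transpose: each process holds nprocs blocks of BLOCK doubles
     * and exchanges the i-th block with process i. A derived type could
     * also describe strided blocks; contiguous blocks keep this short. */
    enum { BLOCK = 4 };
    double *sendbuf = malloc(nprocs * BLOCK * sizeof *sendbuf);
    double *recvbuf = malloc(nprocs * BLOCK * sizeof *recvbuf);
    for (int i = 0; i < nprocs * BLOCK; i++) sendbuf[i] = rank;

    MPI_Datatype block;
    MPI_Type_contiguous(BLOCK, MPI_DOUBLE, &block);
    MPI_Type_commit(&block);
    MPI_Alltoall(sendbuf, 1, block, recvbuf, 1, block, MPI_COMM_WORLD);
    MPI_Type_free(&block);

    /* Persistent requests: set up a send/recv pair once, then restart it
     * each iteration instead of re-issuing MPI_Isend/MPI_Irecv. */
    int next = (rank + 1) % nprocs, prev = (rank + nprocs - 1) % nprocs;
    MPI_Request req[2];
    MPI_Send_init(sendbuf, BLOCK, MPI_DOUBLE, next, 0, MPI_COMM_WORLD, &req[0]);
    MPI_Recv_init(recvbuf, BLOCK, MPI_DOUBLE, prev, 0, MPI_COMM_WORLD, &req[1]);
    for (int iter = 0; iter < 10; iter++) {
        MPI_Startall(2, req);
        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
    }
    MPI_Request_free(&req[0]);
    MPI_Request_free(&req[1]);

    free(sendbuf); free(recvbuf);
    MPI_Finalize();
    return 0;
}
```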

Original issue reported on code.google.com by yurkin on 28 Nov 2008 at 6:49

GoogleCodeExporter avatar Aug 12 '15 07:08 GoogleCodeExporter

Original comment by yurkin on 10 Jun 2011 at 2:03

  • Added labels: MPI

GoogleCodeExporter avatar Aug 12 '15 07:08 GoogleCodeExporter

Another large area for possible optimization is file I/O using standard MPI 
functions. This can also be used to address issue 90 and issue 31.

#90, #31
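
A minimal sketch of what such MPI-IO could look like, assuming each process writes its own contiguous chunk of one shared binary file; the collective call lets the MPI library aggregate the writes. The file name `field.dat` and chunk layout are hypothetical, not ADDA's actual output format.

```c
/* Collective MPI-IO sketch: all processes write to one shared file. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    enum { CHUNK = 1024 };
    double data[CHUNK];
    for (int i = 0; i < CHUNK; i++) data[i] = rank + i * 1e-6;

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "field.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    /* Each process writes at a rank-dependent offset; the collective
     * variant allows the library to merge the writes internally. */
    MPI_Offset offset = (MPI_Offset)rank * CHUNK * sizeof(double);
    MPI_File_write_at_all(fh, offset, data, CHUNK, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}
```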

Original comment by yurkin on 9 Nov 2011 at 5:31

GoogleCodeExporter avatar Aug 12 '15 07:08 GoogleCodeExporter

Another interesting idea is to use "distributed arrays" data structures instead of 
the manual distribution, as is done now. The advantage may be that the MPI runtime 
will be aware of the exact exchanges to be performed inside a single node, so they 
may be substantially optimized (sometimes even completely omitted). This may also 
address issue 137.

#137
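
If "distributed arrays" refers to MPI's darray machinery (an assumption; the comment could equally mean a PGAS-style library), a sketch could look as follows: MPI_Type_create_darray describes the global block distribution once, and the resulting datatype can then serve, e.g., as an MPI-IO file view, so the library rather than the application tracks which process owns which part. The array size and file name are made up.

```c
/* Sketch of a 1-D block-distributed array described via a darray type. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Global 1-D array of gsize doubles, block-distributed over all
     * processes; the size is hypothetical, not tied to ADDA's grids. */
    int gsize = 1000 * nprocs;
    int distribs = MPI_DISTRIBUTE_BLOCK;
    int dargs = MPI_DISTRIBUTE_DFLT_DARG;
    MPI_Datatype darray;
    MPI_Type_create_darray(nprocs, rank, 1, &gsize, &distribs, &dargs,
                           &nprocs, MPI_ORDER_C, MPI_DOUBLE, &darray);
    MPI_Type_commit(&darray);

    /* Used as an MPI-IO file view, so each process reads or writes only
     * its own blocks of the global array, with no manual offsets. */
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "array.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, 0, MPI_DOUBLE, darray, "native", MPI_INFO_NULL);
    MPI_File_close(&fh);

    MPI_Type_free(&darray);
    MPI_Finalize();
    return 0;
}
```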

Original comment by yurkin on 23 Nov 2011 at 4:09

GoogleCodeExporter avatar Aug 12 '15 07:08 GoogleCodeExporter

Probably relevant are the MPI parallelization efforts with DDSCAT; see Numrich RW, Clune TL, Kuo K-S. A new parallel version of the DDSCAT code for electromagnetic scattering from big targets. In: PIERS 2013 Taipei Proceedings; 2013 Mar 25-28; Taipei, Taiwan. p. 722–726. http://piers.org/piersproceedings/download.php?file=cGllcnMyMDEzVGFpcGVpfDNBM18wNzIyLnBkZnwxMjEwMjAyMTQzMDQ=

In particular, it contains some scaling graphs and claims very good parallel efficiency for fixed-size problems.

myurkin avatar Jan 15 '16 09:01 myurkin

/cc @cirrusUH

myurkin avatar Nov 02 '16 18:11 myurkin