
bug: datatype performance tests failing

Open mpichbot opened this issue 9 years ago • 10 comments

Originally by goodell on 2013-01-25 16:43:08 -0600


The following datatype performance tests in the MPICH test suite are currently failing:

  • perf/twovec
  • perf/nestvec
  • perf/nestvec2
  • perf/indexperf

If --enable-fast is disabled, the following test also fails sometimes:

  • perf/transp-datatype

I think Bill has some code pending that resolves some or all of these issues, so I'm assigning this to him for now.

mpichbot avatar Oct 14 '16 17:10 mpichbot

Originally by gropp on 2013-01-26 08:47:16 -0600


I do have code for most of these which has been awaiting review by the original datatype/dataloop code author, Rob Ross. I also have a student looking at a more comprehensive solution, and at this point, it may make more sense to wait for that. I have not moved my code to the new CMS and it will probably need to be redeveloped in any event.

These are important tests - they were drawn from what were serious performance failures in code used in applications, and were documented in a published paper.

mpichbot avatar Oct 14 '16 17:10 mpichbot

Originally by goodell on 2013-01-26 10:00:38 -0600


No objections to what you're saying and no blame here. I just wanted to make sure that we did not forget them and that I had a ticket to stick in the new xfail= description in the test suite.

mpichbot avatar Oct 14 '16 17:10 mpichbot

Originally by robl on 2013-07-23 14:39:37 -0500


I think we have to take Rob Ross off the critical path for this. Can anyone else review the changes?

mpichbot avatar Oct 14 '16 17:10 mpichbot

Originally by robl on 2014-01-14 15:11:18 -0600


Bill sent an old version of the patch to the list, and I wrangled it into the tree. The code now lives at mpich-review/1788-optimized-dataloop. The first Jenkins report testing these changes was https://jenkins.mpich.org/job/mpich-review/566/testReport/

Bill says, in part:

I expect indexperf to fail (it requires more work). 
The others I'll need to look into; 

mpichbot avatar Oct 14 '16 17:10 mpichbot

Originally by Rob Latham [email protected] on 2014-01-16 21:39:50 -0600


In 38ef5818a883568e7dcc80f0e2aa0cfc972469be: a partial round of datatype optimizations

Some datatype performance tests in the MPICH test suite fail (perf/twovec, perf/nestvec, perf/nestvec2, perf/indexperf, perf/transp-datatype).

This changeset introduces a few optimizations that operate on the dataloop representation to make it more performant. perf/indexperf should still fail under these changes.

Original-author: Bill Gropp [email protected]

See #1788, for which this resolves some but not all performance issues.

Signed-off-by: Rob Latham [email protected]

mpichbot avatar Oct 14 '16 17:10 mpichbot

Originally by robl on 2014-01-30 12:49:22 -0600


Attachment added: ForMPICH.tgz (111.3 KiB) halo exchange testcase from Daniel Kokron NASA Ames (ARC-TN) SciCon group

mpichbot avatar Oct 14 '16 17:10 mpichbot

Originally by gropp on 2014-02-19 15:42:33 -0600


The halo exchange testcase makes use of non-standard Fortran compilation features. I can't build it on my Mac because it makes invalid assumptions about the availability of a CPP-style Fortran preprocessor.

mpichbot avatar Oct 14 '16 17:10 mpichbot

Originally by gropp on 2014-02-19 15:47:41 -0600


See mpich-review/1788-optimized-dataloop-feb19 for the proposed fix. This passes the MPICH-2, MPICH-1, and Intel tests (specifically, the tests that still fail also fail without this patch).

mpichbot avatar Oct 14 '16 17:10 mpichbot

With ch4 (yaksa):

twovec - OK

nestvec -

MPI_Pack time = 1.611800e-05, manual pack time = 3.259000e-06
MPI_Pack time should be less than 4 times the manual time
For most informative results, be sure to compile this test with optimization
MPI_Pack with opt = 1.969900e-05, manual pack time = 3.259000e-06
MPI_Pack time should be less than 4 times the manual time
For most informative results, be sure to compile this test with optimization
 Found 2 errors
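
To make the two nestvec errors concrete, here is a rough sketch of the pass/fail arithmetic implied by the output above. The factor of 4 and the timings are copied from that output; the helper name `exceeds_threshold` and the exact comparison are assumptions for illustration, not code from the MPICH test suite.

```python
# Threshold taken from the test output: MPI_Pack must take less than
# 4 times as long as the hand-written packing loop.
MAX_RATIO = 4.0


def exceeds_threshold(mpi_pack_time: float, manual_time: float) -> bool:
    """Hypothetical helper (not part of the MPICH test suite).

    Returns True when the MPI_Pack time is not below MAX_RATIO times
    the manual pack time, i.e. when the check would count an error.
    """
    return mpi_pack_time >= MAX_RATIO * manual_time


# Timings copied from the nestvec output above, in seconds.
manual = 3.259000e-06
timings = {
    "MPI_Pack": 1.611800e-05,
    "MPI_Pack with opt": 1.969900e-05,
}

errs = 0
for label, t in timings.items():
    # Report how many times slower MPI_Pack was than the manual loop.
    print(f"{label}: {t / manual:.2f}x manual")
    errs += exceeds_threshold(t, manual)

print(f"Found {errs} errors")
```

Both measurements come out at roughly 5x and 6x the manual time, so each one trips the 4x limit, which matches the two errors reported.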

nestvec2 -

Abort(583118092) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Pack: Invalid argument, error stack:
PMPI_Pack(143): MPI_Pack(inbuf=(nil), incount=1, dtype=USER<struct>, outbuf=0x7fab387ff010, outsize=1024008, position=0x7ffc4b83f17c, MPI_COMM_WORLD) failed
PMPI_Pack(100): Null pointer in parameter inbuf

indexedpack -

Abort(784444684) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Pack: Invalid argument, error stack:
PMPI_Pack(143): MPI_Pack(inbuf=(nil), incount=1, dtype=USER<struct>, outbuf=0x55fce71cdc10, outsize=16112, position=0x7ffe53b149c0, MPI_COMM_WORLD) failed
PMPI_Pack(100): Null pointer in parameter inbuf

hzhou avatar Mar 13 '21 15:03 hzhou

We need to somehow get these tests into at least the nightly Jenkins tests.

hzhou avatar Mar 13 '21 15:03 hzhou