bug: datatype performance tests failing
Originally by goodell on 2013-01-25 16:43:08 -0600
The following datatype performance tests in the MPICH test suite are currently failing:
- perf/twovec
- perf/nestvec
- perf/nestvec2
- perf/indexperf
If --enable-fast is disabled, the following test also fails sometimes:
- perf/transp-datatype
I think Bill has some code pending that resolves some or all of these issues, so I'm assigning this to him for now.
Originally by gropp on 2013-01-26 08:47:16 -0600
I do have code for most of these which has been awaiting review by the original datatype/dataloop code author, Rob Ross. I also have a student looking at a more comprehensive solution, and at this point, it may make more sense to wait for that. I have not moved my code to the new CMS and it will probably need to be redeveloped in any event.
These are important tests - they were drawn from what were serious performance failures in code used in applications, and were documented in a published paper.
Originally by goodell on 2013-01-26 10:00:38 -0600
No objections to what you're saying and no blame here. I just wanted to make sure that we did not forget them and that I had a ticket to stick in the new xfail= description in the test suite.
Originally by robl on 2013-07-23 14:39:37 -0500
I think we have to take Rob Ross off the critical path for this. Can anyone else review the changes?
Originally by robl on 2014-01-14 15:11:18 -0600
Bill sent an old version of the patch to the list, and I wrangled it into the tree. The code now lives at mpich-review/1788-optimized-dataloop. The first Jenkins report testing these changes was https://jenkins.mpich.org/job/mpich-review/566/testReport/
Bill says, in part:
I expect indexperf to fail (it requires more work).
The others I'll need to look into.
Originally by Rob Latham [email protected] on 2014-01-16 21:39:50 -0600
In 38ef5818a883568e7dcc80f0e2aa0cfc972469be: a partial round of datatype optimizations
Some datatype performance tests in the MPICH test suite fail: (perf/twovec, perf/nestvec, perf/nestvec2, perf/indexperf, perf/transp-datatype).
This changeset introduces a few optimizations that operate on the dataloop representation to make it more performant. perf/indexperf should still fail under these changes.
Original-author: Bill Gropp [email protected]
See #1788; this resolves some, but not all, of its performance issues.
Signed-off-by: Rob Latham [email protected]
Originally by robl on 2014-01-30 12:49:22 -0600
Attachment added: ForMPICH.tgz (111.3 KiB)
halo exchange testcase from Daniel Kokron NASA Ames (ARC-TN) SciCon group
Originally by gropp on 2014-02-19 15:42:33 -0600
The halo exchange testcase makes use of non-standard Fortran compilation features. I can't build it on my Mac because it makes invalid assumptions about the availability of a CPP-style Fortran preprocessor.
Originally by gropp on 2014-02-19 15:47:41 -0600
See mpich-review/1788-optimized-dataloop-feb19 for the proposed fix. This passes the MPICH-2, MPICH-1, and Intel tests (specifically, the tests that fail also fail without this patch).
With ch4 (yaksa):
twovec - OK
nestvec -
MPI_Pack time = 1.611800e-05, manual pack time = 3.259000e-06
MPI_Pack time should be less than 4 times the manual time
For most informative results, be sure to compile this test with optimization
MPI_Pack with opt = 1.969900e-05, manual pack time = 3.259000e-06
MPI_Pack time should be less than 4 times the manual time
For most informative results, be sure to compile this test with optimization
Found 2 errors
nestvec2 -
Abort(583118092) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Pack: Invalid argument, error stack:
PMPI_Pack(143): MPI_Pack(inbuf=(nil), incount=1, dtype=USER<struct>, outbuf=0x7fab387ff010, outsize=1024008, position=0x7ffc4b83f17c, MPI_COMM_WORLD) failed
PMPI_Pack(100): Null pointer in parameter inbuf
indexedpack -
Abort(784444684) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Pack: Invalid argument, error stack:
PMPI_Pack(143): MPI_Pack(inbuf=(nil), incount=1, dtype=USER<struct>, outbuf=0x55fce71cdc10, outsize=16112, position=0x7ffe53b149c0, MPI_COMM_WORLD) failed
PMPI_Pack(100): Null pointer in parameter inbuf
We need to somehow get these tests into at least the nightly Jenkins runs.