
Negative FOM when Running Large MPI Counts

Open · nmhamster opened this issue on Mar 27, 2019 · 5 comments

We are scaling LULESH to large node counts (around 2000 nodes) with 8 MPI ranks per node and a problem size of 90. The reported FOM comes out negative.

Elapsed time         =      55.50 (s)
Grind time (us/z/c)  = 0.38064213 (per dom)  (-0.00015125329 overall)
FOM                  = -6611426.6 (z/s)

This probably shouldn't happen.

nmhamster · Mar 27 '19 16:03

@nmhamster are you using master? If so, I'm not sure how this output makes sense, since in the current code the overall number and the elapsed time are printed from the same value. The FOM is calculated from these, but it's not clear where the bug would come from.

Note there was a bug related to overflow in an older version of LULESH. If this is what you are hitting, I can help you fix it if you need to use that version for some reason.
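
For illustration, a minimal standalone sketch (not LULESH code) of that kind of overflow, using the rank count from the run reported below: the product numRanks*nx*nx*nx no longer fits in a 32-bit signed int, and the wrapped value is what drives the overall grind time and FOM negative.

   // Standalone illustration only, not LULESH code.
   #include <climits>
   #include <cstdio>

   int main() {
      long long numRanks = 9261;                    // 21^3 ranks, problem size -s 90
      long long nx       = 90;
      long long zones    = numRanks * nx * nx * nx; // 6,751,269,000
      int       zones32  = (int) zones;             // exceeds INT_MAX, so it wraps
      printf("64-bit zone count: %lld (INT_MAX = %d)\n", zones, INT_MAX);
      printf("32-bit zone count: %d\n", zones32);   // negative on two's-complement systems
      return 0;
   }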

If you are using master, could you give me the inputs you are running and the full output? I can try to recreate it if I can get access to a big enough resource quickly, or try to recreate it on a smaller node count with the same global problem size.

ikarlin · Mar 27 '19 21:03

OK, I am using the 2.0.3 release. I changed the output code at the end, which corrects the FOM (but note that I didn't change the number-of-elements printout earlier, which is also overflowing).

Fixed output example:

Total number of elements: -1838665592

To run other sizes, use -s <integer>.
To run a fixed number of iterations, use -i <integer>.
To run a more or less balanced region set, use -b <integer>.
To change the relative costs of regions, use -c <integer>.
To print out progress, use -p
To write an output file for VisIt, use -v
See help (-h) for more options

Run completed:
   Problem size        =  90
   MPI tasks           =  9261
   Iteration count     =  200
   Final Origin Energy = 2.026863e+11
   Testing Plane 0 of Energy Array on rank 0:
        MaxAbsDiff   = 4.196167e-05
        TotalAbsDiff = 2.186766e-04
        MaxRelDiff   = 1.140498e-10


Elapsed time         =     105.67 (s)
Grind time (us/z/c)  = 0.72475375 (per dom)  (7.8258693e-05 overall)
FOM                  =   12778133 (z/s)
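
(Sanity check on the corrected numbers: 105.67 s * 1e6 / 200 cycles / (9261 * 90^3 zones) ≈ 7.826e-05 us/z/c, and the printed FOM is consistent with 1000 divided by that overall grind time.)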

Changed the grindTime2 calculation to the following:

   // Accumulate the global zone count in floating point so that
   // numRanks*nx*nx*nx can exceed INT_MAX without overflowing.
   Real_t local_grid = nx*nx*nx;
   Real_t local_grid_ranks = local_grid * (Real_t) numRanks;

   Real_t grindTime1 = ((elapsed_time*1e6)/locDom.cycle())/(nx*nx*nx);
   Real_t grindTime2 = ((elapsed_time*1e6)/locDom.cycle())/(local_grid_ranks);
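
The "Total number of elements" line could be widened the same way; a sketch, assuming the total is simply promoted to 64 bits before printing (variable names are illustrative, not copied from lulesh.cc):

   // Sketch only: compute the global element total in 64-bit arithmetic so
   // the "Total number of elements" line stops overflowing as well.
   long long totalElems = (long long) numRanks * nx * nx * nx;
   printf("Total number of elements: %lld\n", totalElems);

For this run, (long long) 9261 * 90 * 90 * 90 = 6,751,269,000, whose 32-bit wrap is -1,838,665,592, matching the element count printed above; grindTime1 is unaffected since nx*nx*nx alone is only 729,000.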

nmhamster · Mar 27 '19 21:03

I can submit a pull request once I get these fixed.

nmhamster · Mar 27 '19 21:03

@nmhamster these are fixed in master. Do you need a tagged release? If so, I think the easiest solution would be for me to tag a new release, since there are no open issues, and for you to move to it.

I'm open to other ideas, but it's not worth your time to fix this.

ikarlin · Mar 27 '19 22:03

@nmhamster after some thought, I think a new tagged release is overdue. I need to do a bit of performance testing to confirm there is no significant regression, but otherwise the code should be fine.

ikarlin · Mar 28 '19 13:03