Alan Williams
I will try to reproduce the discrepancy on the local blade, where I can run threaded and also use VTune to get another view of where the time is being spent.
I think I have successfully reproduced this on my local blade with the cvfemHC nightly test. When I run with 8 MPI procs and 1 thread per MPI rank, the...
It looks like there is some double-counting of time somewhere. For the hoHelium case, if I sum the times printed by dump_eq_time() on the equation systems, the sum is greater than total...
There is some double-counting of time in the equation-system timers. For example, LowMachEquationSystem::solve_and_update calls momentumEqSys_->compute_projected_nodal_gradient(), and time is accumulated both inside and outside that call. In general, I...
Replacing this with a newer snapshot.
We did recently start using the MPI_CXX_BOOL type in a couple of cases in stk. I'll refer this to our MPI guru and see what he thinks. I wonder if...
Sorry for not following up on this. This issue was fixed by this stk update: https://github.com/trilinos/Trilinos/pull/10914
@sayerhs That's great, I'm glad you were able to get the per-target build times. It looks like the ngp_algorithms files are the worst offenders. I think any compilation unit with CUDA...
@sayerhs One more thing: the stk ngp stuff could be a culprit. It is all header-only, and perhaps much of it doesn't need to be header-only. I'll look into that.
@sayerhs Just from browsing the code, I wonder if we could reduce build times by splitting NgpLoopUtils.h into several separate headers. For instance, I see that many .C files call...