Trilinos
MueLu: SubFactoryMonitor and StackedTimers don't work well together
Bug Report
@trilinos/muelu
MueLu's SubFactoryMonitors do not appear to work well with StackedTimer reporting.
[EDIT] This is on Crusher (AMD), so a platform-specific issue is possible.
Below is an excerpt from a StackedTimer summary that uses SubFactoryMonitors (SFM).
Notice that almost all of the time (96.8%) ends up in the last Remainder.
Notice that the SFM timers don't nest correctly: they all appear as siblings directly under Setup Smoother.
| | | | MueLu: Ifpack2Smoother: Setup Smoother (total): 1.17603 - 99.9032% [1]
| | | | | MueLu: Ifpack2Smoother: Get matrix from current level (sub, total): 1.072e-06 - 9.11542e-05% [1]
| | | | | MueLu: Ifpack2Smoother: Call 'SetupChebyshev' (sub, total): 6.11e-07 - 5.19545e-05% [1]
| | | | | MueLu: Ifpack2Smoother: Cast matrix to Tpetra::RowMatrix (sub, total): 5.11e-07 - 4.34513e-05% [1]
| | | | | MueLu: Ifpack2Smoother: Estimate max eigenvalue (sub, total): 2.11e-07 - 1.79417e-05% [1]
| | | | | MueLu: Ifpack2Smoother: Get lumped diagonal (sub, total): 5.71e-07 - 4.85532e-05% [1]
| | | | | MueLu: Ifpack2Smoother: SetPrecParameters: 0.0138927 - 1.18133% [1]
| | | | | MueLu: Ifpack2Smoother: Preconditioner init (sub, total): 9.21e-07 - 7.83144e-05% [1]
| | | | | MueLu: Ifpack2Smoother: Preconditioner compute (sub, total): 3e-07 - 2.55096e-05% [1]
| | | | | Ifpack2::Chebyshev::compute: 0.0239956 - 2.04039% [1]
| | | | | | Ifpack2: powerMethodWithInitGuess: 0.0191033 - 79.6119% [1]
| | | | | | Remainder: 0.00489225 - 20.3881%
| | | | | MueLu: Ifpack2Smoother: Determine lambdaMax (sub, total): 7.21e-07 - 6.1308e-05% [1]
| | | | | MueLu: Ifpack2Smoother: toggle setup boolean (sub, total): 2.91e-07 - 2.47443e-05% [1]
| | | | | MueLu: Ifpack2Smoother: print description (sub, total): 3.01e-07 - 2.55946e-05% [1]
| | | | | Remainder: 1.13814 - 96.7778%
Below is an excerpt from a StackedTimer summary for the same code as above, but with the SubFactoryMonitors replaced by raw Teuchos::Timers. Notice that the time is now attributed correctly and the remainder is small. Notice that the timers that replaced the SFMs now nest correctly.
| | | | MueLu: Ifpack2Smoother: Setup Smoother (total): 1.20185 - 99.909% [1]
| | | | | Get matrix from current level: 3.938e-06 - 0.000327663% [1]
| | | | | get non-const ref to param list: 6.81e-07 - 5.66629e-05% [1]
| | | | | Calll "SetupChebyshev": 1.2018 - 99.9959% [1]
| | | | | | Cast matrix to Tpetra::RowMatrix: 7.11e-07 - 5.91615e-05% [1]
| | | | | | Estimate max eigenvalue: 1.1635 - 96.8137% [1]
| | | | | | | Get lumped diagonal: 1.16347 - 99.9969% [1]
| | | | | | | Remainder: 3.5739e-05 - 0.00307167%
| | | | | | MueLu: Ifpack2Smoother: SetPrecParameters: 0.0142344 - 1.18443% [1]
| | | | | | Preconditioner init: 3.697e-06 - 0.000307623% [1]
| | | | | | Preconditioner compute: 0.0239759 - 1.99501% [1]
| | | | | | | Ifpack2::Chebyshev::compute: 0.0239725 - 99.9861% [1]
| | | | | | | | Ifpack2: powerMethodWithInitGuess: 0.0191433 - 79.8551% [1]
| | | | | | | | Remainder: 0.00482924 - 20.1449%
| | | | | | | Remainder: 3.337e-06 - 0.0139182%
| | | | | | Determine lambdaMax: 4.8824e-05 - 0.00406259% [1]
| | | | | | Remainder: 2.8696e-05 - 0.00238776%
| | | | | toggle setup boolean: 8.11e-07 - 6.74796e-05% [1]
| | | | | print description: 2.2614e-05 - 0.00188161% [1]
| | | | | Remainder: 2.1712e-05 - 0.00180656%
Automatic mention of the @trilinos/muelu team
That's unfortunate. Does that mean we need to fix SubFactoryMonitors somehow, or is it more difficult than that?
Hopefully just SubFactoryMonitor itself.
@cgcgcg had some suggestions/questions:
1. Is this a single-rank phenomenon? No, it also happens on multi-rank MPI jobs.
2. Do the nightly performance runs show the same issue? No.
3. Try running MueLu's driver from the performance build. It exhibits the same weird behavior.
Items 2 and 3 indicate this may be an environment issue.
This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity.
If you would like to keep this issue open please add a comment and/or remove the MARKED_FOR_CLOSURE label.
If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE.
If it is ok for this issue to be closed, feel free to go ahead and close it. Please do not add any comments or change any labels or otherwise touch this issue unless your intention is to reset the inactivity counter for an additional year.
@lucbv Have you also seen this on Frontier?
@jhux2 I recently fixed something, maybe that was the reason? https://github.com/trilinos/Trilinos/pull/12753/commits/1f2cc316e4804138a9341518ca96f73e2c059b0a