Trilinos icon indicating copy to clipboard operation
Trilinos copied to clipboard

Teuchos: Add option to barrier/fence timers

Open jewatkins opened this issue 3 years ago • 13 comments

Question

@trilinos/teuchos

Teuchos timer regions can sometimes be misleading when an mpi rank has to wait for other ranks or CUDA_LAUNCH_BLOCKING is off and there's no fence to stop a rank from continuing. This is based on some runs I performed while looking at #6079. Is it possible to add an option to add barriers/fences within a timed region? There's probably multiple ways of implementing this and there's probably issues so it'd be good to get some discussion. Maybe something like this:

    Kokkos::fence();
    MPI_Barrier(comm);
    Kokkos::Timer timer;
    func();
    Kokkos::fence();
    double time = timer.seconds();

I could see something similar being done for the stacked timer start/stop and wrapping the fence/barrier with a run-time/compile-time option.

This is preventing us from turning off CUDA_LAUNCH_BLOCKING completely.

jewatkins avatar Sep 30 '21 18:09 jewatkins

If Kokkos profiling is enabled, Teuchos timers include Kokkos profile regions, which will fence.

csiefer2 avatar Sep 30 '21 19:09 csiefer2

Perfect, that might be all I need to stop using CUDA_LAUNCH_BLOCKING. Any ideas for misleading timers when an mpi rank has to wait for other ranks? It makes it difficult to see which region has issues.

jewatkins avatar Sep 30 '21 20:09 jewatkins

Also, is Kokkos profiling a runtime option?

jewatkins avatar Sep 30 '21 20:09 jewatkins

Any ideas for misleading timers when an mpi rank has to wait for other ranks? It makes it difficult to see which region has issues.

@jewatkins You could either look at min/max/avg, or if using stacked timers, the timer histogram. Would that be sufficient for your needs?

jhux2 avatar Sep 30 '21 20:09 jhux2

If there are load-imbalance among MPI processes, I was wondering how that will show up.. We've seen load imbalance in compute (e.g., local SuperLU with an irregular mesh), but because we did not have MPI_Barrier before solve, the imbalance could also show up in solve (e.g., with GMRES's orthogonalization)?

iyamazaki avatar Sep 30 '21 20:09 iyamazaki

@jhux2 that's usually what I do. If a particular timer region has an mpi barrier and I see the min is small but the average is large, than I assume (although probably not always true) that there's one or more ranks waiting for a rank to "catch up". As @iyamazaki mentioned, there might be a timer region with load imbalance but it's not clear which one it could be (or whether the imbalance is even captured in a timer region). I think a well place mpi barrier before the timer region at least rules out specific regions from being the issue. It would be good to have that as a runtime option instead of having to manually add the barrier and rebuild.

jewatkins avatar Sep 30 '21 21:09 jewatkins

Also, is Kokkos profiling a runtime option?

Yes. You just need to set the right environment variable to point to the Kokkos profiling library of your choice.

See: https://github.com/kokkos/kokkos-tools/wiki

csiefer2 avatar Oct 05 '21 15:10 csiefer2

@jewatkins Fwiw, MueLu has a runtime option to add barriers to its timers.

jhux2 avatar Oct 05 '21 16:10 jhux2

Also, is Kokkos profiling a runtime option?

Yes. You just need to set the right environment variable to point to the Kokkos profiling library of your choice.

See: https://github.com/kokkos/kokkos-tools/wiki

Ah okay, I was thinking of a runtime option to just turn on timer fencing without having to build/use a kokkos profiling tool that will not be used. Victor suggested this already. I think I'm 0/2 so far on making this a thing :(

jewatkins avatar Oct 05 '21 17:10 jewatkins

@jewatkins Fwiw, MueLu has a runtime option to add barriers to its timers.

Very nice, I'll have to try that. If it's in Teuchos, maybe we can avoid having to do that for all packages?

jewatkins avatar Oct 05 '21 17:10 jewatkins

@jewatkins - there are runtime functions you can call to set the profiler in kokkos to avoid having to export env variables. We do this in empire to set our own profiler tools. I can show you this after our meeting later today.

rppawlo avatar Oct 05 '21 17:10 rppawlo

There is also a Teuchos::SyncTimeMonitor. But I don't think there is a global option to swap that in for the regular time monitor.

cgcgcg avatar Oct 05 '21 18:10 cgcgcg

This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity. If you would like to keep this issue open please add a comment and/or remove the MARKED_FOR_CLOSURE label. If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE. If it is ok for this issue to be closed, feel free to go ahead and close it. Please do not add any comments or change any labels or otherwise touch this issue unless your intention is to reset the inactivity counter for an additional year.

github-actions[bot] avatar Oct 08 '22 12:10 github-actions[bot]

This issue was closed due to inactivity for 395 days.

github-actions[bot] avatar Nov 09 '22 12:11 github-actions[bot]