serac
Fix usage of SLIC abort functionality
There is a large PR (https://github.com/LLNL/axom/pull/868) in Axom to fix how SLIC handles error states. Currently there is no guarantee that you will get the error messages, or that the program will not hang on exit. After this PR we need to create a new exit function that makes no collective calls (SLIC flushing, for example) and also calls MPI_Abort(). It should be like axom::utilities::processAbort() but with a call to the new SLIC function slic::outputLocalMessages(), since we are guaranteed to have SLIC, where Axom's version does not make that call.
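A minimal, self-contained sketch of the shape such an exit function could take. Everything here is a stand-in so it compiles without Axom or MPI: `mock_slic`, `registerAbortFunction`, `mock_mpi_abort`, and `exitLocally` are hypothetical names, not Axom or Serac API. In real code the two stubbed calls would presumably be slic::outputLocalMessages() and MPI_Abort().

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for SLIC: buffers messages on this rank and
// invokes a registered abort function when an error is logged.
namespace mock_slic {
std::vector<std::string> pending;      // buffered, not yet written
std::vector<std::string> emitted;      // already written out
std::function<void()> abort_function;  // registered abort hook
int collective_calls = 0;              // counts collective flushes

void registerAbortFunction(std::function<void()> f) {
  abort_function = std::move(f);
}

// Stands in for slic::outputLocalMessages(): writes only this rank's
// messages and needs no participation from other ranks.
void outputLocalMessages() {
  for (const auto& m : pending) {
    std::cerr << m << '\n';
    emitted.push_back(m);
  }
  pending.clear();
}

// Stands in for a collective flush; in a real MPI run this could hang
// if the other ranks never reach the same call.
void flushCollective() {
  ++collective_calls;
  outputLocalMessages();
}

// Stands in for SLIC_ERROR: record the message, then call the hook.
void error(const std::string& msg) {
  pending.push_back("[ERROR] " + msg);
  if (abort_function) abort_function();
}
}  // namespace mock_slic

// Stand-in for MPI_Abort(); records the call instead of terminating.
bool aborted = false;
int last_abort_code = 0;
void mock_mpi_abort(int code) {
  aborted = true;
  last_abort_code = code;
}

// Hypothetical exit function: local output only, then abort.
void exitLocally() {
  mock_slic::outputLocalMessages();  // non-collective
  mock_mpi_abort(1);
}

// Walks the intended chain: error -> abort hook -> local output -> abort.
void demoErrorPath() {
  mock_slic::registerAbortFunction(exitLocally);
  mock_slic::error("example failure on one rank");
  assert(mock_slic::emitted.size() == 1);
}
```

The point of the sketch is only the ordering inside `exitLocally`: the rank writes its own messages first, never enters a collective flush, and then aborts.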
Also verify this works how I think it does:
SLIC_ERROR -> outputs error message -> registered SLIC abort function -> outputs local messages, non collectively -> doesn't hang all nodes
It apparently does not work that way:
"This routine should not be used from within a signal handler." from https://www.mpich.org/static/docs/v3.1/www3/MPI_Abort.html
Might be something here:
https://www.mpich.org/static/docs/latest/www3/MPI_Comm_set_errhandler.html
Fixed in #778 and #751