BC returnFail bug
Description
Some of the BC routines, such as the subsonic inflow and outflow, have checks to make sure that the flow direction is consistent with the prescribed BC itself. If this is not satisfied, these routines raise errors. There are two main issues here:

- Most of these routines, such as subsonic outflow, use a `terminate` call for this (see: https://github.com/mdolab/adflow/blob/master/src/bcdata/BCData.F90#L969). This stops the execution completely. However, we want these routines to use `returnFail` so that we catch the failure and register it as a failure in the flow solver. When this happens in an optimization, it is used to raise the fail flag so the optimizer can move on. With `terminate`, the job just exits, which is not the desired behavior.
- This was patched only for subsonic inflow (see: https://github.com/mdolab/adflow/blob/master/src/bcdata/BCData.F90#L919-L922), where we use the preferred `returnFail` call. However, when running in parallel, the processors that do not have any partition of this BC face do not execute this code. As a result, some processors raise the return flag while others run past these BC routines and hang elsewhere at a global communication operation. These BC routines have several layers of calls, and the ideal solution is to aggregate all of the failure flags while applying the BC routines and communicate them across all procs to make sure everyone is on the same page (a minimal sketch of this pattern follows this list). Currently, if this BC routine fails and some procs don't have any partition of this face, ADflow will just hang.
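To make the flag-aggregation idea concrete, here is a minimal mpi4py sketch of the pattern, not ADflow's actual API; the function name and flag handling are hypothetical, but the collective logical-OR is the key ingredient:

```python
# Minimal sketch (hypothetical names, not ADflow's actual API) of
# aggregating per-processor failure flags so that every rank agrees
# on whether a BC check failed anywhere.
from mpi4py import MPI

def bc_checks_pass(comm, local_fail):
    """Return True on ALL ranks iff no rank raised a BC failure.

    local_fail is True on ranks that own a partition of the failing
    BC face; ranks without any partition of that face pass False.
    The logical-OR allreduce is collective, so every rank gets the
    same answer and nobody runs ahead and hangs at a later global
    communication.
    """
    global_fail = comm.allreduce(local_fail, op=MPI.LOR)
    return not global_fail

# Usage: every rank calls this collectively after applying the BC routines.
comm = MPI.COMM_WORLD
my_fail = False  # set True where a subsonic inflow/outflow check fails
if not bc_checks_pass(comm, my_fail):
    pass  # raise the fail flag to the optimizer instead of terminating
```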
Current behavior
When some procs do not have a partition of the faces belonging to the BC routines that fail, the code hangs for the subsonic inflow BC. For other BCs that fail, the execution is terminated, which stops the entire run.
Expected behavior
The BC failures should be caught properly and communicated across processors so that we can gracefully fail and raise a fail flag to the optimizer.
I have a quick patch for this myself; I just commented out these `returnFail` calls, because when the BCs for my cases fail like this, the mesh warping also fails, so I don't need to rely on ADflow alone for the fail flag. Furthermore, because I fixed my optimizations, I don't get any of these failures in my runs anymore, so this is not an issue when the BC routines do not fail as expected. Ultimately, this stuff should be fixed properly (most likely by me after my defense).
I think my draft PR #224 should fix this issue. The solution is to remove the MPI barrier in the `totalSubsonicInlet` routine and add an `allreduce` call in the Python layer when we set the data from the aero problem. We can't rely on the `allreduce` that already exists for error catching in the `__call__` method, because some procs that don't have BC data will attempt to update the geometry with a failed mesh. Hence, we add an additional `allreduce` in the `_setAeroProblemData` function after the BCs are updated in the Fortran layer.
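A rough sketch of where that synchronization could live, assuming `self.comm` is an mpi4py communicator and `self.adflow.killsignals.fatalfail` is the Fortran-side fail flag; both names are modeled on ADflow's conventions and are assumptions here, not verified against the PR:

```python
# Hedged sketch of the approach in PR #224, not its verbatim code.
# Assumed names: self.comm (mpi4py communicator) and
# self.adflow.killsignals.fatalfail (Fortran-side fail flag).
from mpi4py import MPI

def _setAeroProblemData(self, aeroProblem):
    # ... existing code: push aero problem (including BC) data to Fortran ...

    # After the BC routines run, synchronize the failure flag so that
    # ranks without a partition of the failing BC face also see the
    # failure and skip the subsequent geometry/mesh update.
    localFail = bool(self.adflow.killsignals.fatalfail)
    globalFail = self.comm.allreduce(localFail, op=MPI.LOR)
    self.adflow.killsignals.fatalfail = globalFail
```

The design point is that the `allreduce` replaces the barrier: a barrier only synchronizes timing, while the reduction also propagates the failure state, so every rank takes the same branch afterward.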
This is resolved with PR #224.