BC returnFail bug
Description
Some of the BC routines, such as the subsonic inflow and outflow, have checks to make sure that the flow direction is consistent with the prescribed BC itself. If this is not satisfied, these routines raise errors. There are two main issues here:

- Most of these routines, such as subsonic outflow, use a `terminate` call for this (see: https://github.com/mdolab/adflow/blob/master/src/bcdata/BCData.F90#L969). This stops the execution completely. However, we want these routines to use `returnFail` so that we catch the failure and register it as a failure in the flow solver. When this happens in an optimization, it is used to raise the fail flag so the optimizer can move on. With `terminate`, the job just exits, which is not the desired behavior.
- This was patched only for subsonic inflow (see: https://github.com/mdolab/adflow/blob/master/src/bcdata/BCData.F90#L919-L922), where we use the preferred `returnFail` call. However, when running in parallel, the processors that do not have any partition of this BC face do not execute this code. As a result, some processors raise the return flag while others run past these BC routines and hang elsewhere at a global communication operation. These BC routines have several layers of calls, and the ideal solution is to aggregate all of the failure flags while applying the BC routines and communicate them across all procs to make sure everyone is on the same page (a minimal sketch of this pattern follows this list). Currently, if this BC routine fails and some procs don't have any partition of this face, ADflow will just hang.
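To make the flag-aggregation idea concrete, here is a minimal mpi4py sketch of the pattern, not ADflow's actual API; the function name and flag handling are hypothetical, but the collective logical-OR is the key ingredient:

```python
# Minimal sketch (hypothetical names, not ADflow's actual API) of
# aggregating per-processor failure flags so that every rank agrees
# on whether a BC check failed anywhere.
from mpi4py import MPI

def bc_checks_pass(comm, local_fail):
    """Return True on ALL ranks iff no rank raised a BC failure.

    local_fail is True on ranks that own a partition of the failing
    BC face; ranks without any partition of that face pass False.
    The logical-OR allreduce is collective, so every rank gets the
    same answer and nobody runs ahead and hangs at a later global
    communication.
    """
    global_fail = comm.allreduce(local_fail, op=MPI.LOR)
    return not global_fail

# Usage: every rank calls this collectively after applying the BC routines.
comm = MPI.COMM_WORLD
my_fail = False  # set True where a subsonic inflow/outflow check fails
if not bc_checks_pass(comm, my_fail):
    pass  # raise the fail flag to the optimizer instead of terminating
```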
Current behavior
When some procs do not have a partition of the faces belonging to the BC routines that fail, the code hangs for the subsonic inflow BC. For other BCs that fail, the execution is terminated, which stops the entire run.
Expected behavior
The BC failures should be caught properly and communicated across processors so that we can gracefully fail and raise a fail flag to the optimizer.
I have a quick patch for this myself; I just commented out these `returnFail` calls, because when the BCs for my cases fail like this, the mesh warping also fails, so I don't need to rely on ADflow alone for the fail flag. Furthermore, because I fixed my optimizations, I don't get any of these failures in my runs anymore, so this is not an issue when the BC routines do not fail as expected. Ultimately, this stuff should be fixed properly (most likely by me after my defense).
I think my draft PR #224 should fix this issue. The solution is to remove the MPI barrier in the `totalSubsonicInlet` routine and add an `allreduce` call in the Python layer when we set the data from the aero problem. We can't rely on the `allreduce` that already exists for error catching in the `__call__` method, because some procs that don't have BC data will attempt to update the geometry with a failed mesh. Hence, we add an additional `allreduce` in the `_setAeroProblemData` function after the BCs are updated in the Fortran layer.
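A rough sketch of where that synchronization could live, assuming `self.comm` is an mpi4py communicator and `self.adflow.killsignals.fatalfail` is the Fortran-side fail flag; both names are modeled on ADflow's conventions and are assumptions here, not verified against the PR:

```python
# Hedged sketch of the approach in PR #224, not its verbatim code.
# Assumed names: self.comm (mpi4py communicator) and
# self.adflow.killsignals.fatalfail (Fortran-side fail flag).
from mpi4py import MPI

def _setAeroProblemData(self, aeroProblem):
    # ... existing code: push aero problem (including BC) data to Fortran ...

    # After the BC routines run, synchronize the failure flag so that
    # ranks without a partition of the failing BC face also see the
    # failure and skip the subsequent geometry/mesh update.
    localFail = bool(self.adflow.killsignals.fatalfail)
    globalFail = self.comm.allreduce(localFail, op=MPI.LOR)
    self.adflow.killsignals.fatalfail = globalFail
```

The design point is that the `allreduce` replaces the barrier: a barrier only synchronizes timing, while the reduction also propagates the failure state, so every rank takes the same branch afterward.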
This is resolved with PR #224.