A problem of running shape_optimization.py in parallel due to error '139'

Open HANSIeltsKing opened this issue 2 years ago • 1 comments

I am an SU2 user on a Linux system. While running a shape optimisation test case in parallel, an error occurred after the first DIRECT solution was calculated, as shown below:

-------------------------------------------------------------------------
|    ___ _   _ ___                                                      |
|   / __| | | |_  )   Release 7.3.0 "Blackbird"                         |
|   \__ \ |_| |/ /                                                      |
|   |___/\___//___|   Aerodynamic Shape Optimization Script             |
|                                                                       |
-------------------------------------------------------------------------
| SU2 Project Website: https://su2code.github.io                        |
|                                                                       |
| The SU2 Project is maintained by the SU2 Foundation                   |
| (http://su2foundation.org)                                            |
-------------------------------------------------------------------------
| Copyright 2012-2022, SU2 Contributors (cf. AUTHORS.md)                |
|                                                                       |
| SU2 is free software; you can redistribute it and/or                  |
| modify it under the terms of the GNU Lesser General Public            |
| License as published by the Free Software Foundation; either          |
| version 2.1 of the License, or (at your option) any later version.    |
|                                                                       |
| SU2 is distributed in the hope that it will be useful,                |
| but WITHOUT ANY WARRANTY; without even the implied warranty of        |
| MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU      |
| Lesser General Public License for more details.                       |
|                                                                       |
| You should have received a copy of the GNU Lesser General Public      |
| License along with SU2. If not, see <http://www.gnu.org/licenses/>.   |
-------------------------------------------------------------------------
Found: mesh_NACA64A010_turb.su2
New Project: ./
Sequential Least SQuares Programming (SLSQP) parameters:
Number of design variables: 50 ( 50 )
Objective function scaling factor: [1.0]
Maximum number of iterations: 100
Requested accuracy: 1.0000000000000001e-11
Initial guess for the independent variable(s): [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Lower and upper bound for each independent variable: [(-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05), (-0.05, 0.05)]

Traceback (most recent call last):
  File "/usr/local/bin/shape_optimization.py", line 183, in <module>
    main()
  File "/usr/local/bin/shape_optimization.py", line 92, in main
    shape_optimization( options.filename    ,
  File "/usr/local/bin/shape_optimization.py", line 159, in shape_optimization
    SU2.opt.SLSQP(project,x0,xb,its,accu)
  File "/usr/local/bin/SU2/opt/scipy_tools.py", line 120, in scipy_slsqp
    outputs = fmin_slsqp( x0             = x0             ,
  File "/home/xxx/.local/lib/python3.8/site-packages/scipy/optimize/_slsqp_py.py", line 206, in fmin_slsqp
    res = _minimize_slsqp(func, x0, args, jac=fprime, bounds=bounds,
  File "/home/xxx/.local/lib/python3.8/site-packages/scipy/optimize/_slsqp_py.py", line 329, in _minimize_slsqp
    mieq = sum(map(len, [atleast_1d(c['fun'](x, *c['args']))
  File "/home/xxx/.local/lib/python3.8/site-packages/scipy/optimize/_slsqp_py.py", line 329, in <listcomp>
    mieq = sum(map(len, [atleast_1d(c['fun'](x, *c['args']))
  File "/usr/local/bin/SU2/opt/scipy_tools.py", line 457, in con_cieq
    cons = project.con_cieq(x)
  File "/usr/local/bin/SU2/opt/project.py", line 257, in con_cieq
    return self._eval(konfig, func,dvs)
  File "/usr/local/bin/SU2/opt/project.py", line 206, in _eval
    vals = design._eval(func,*args)
  File "/usr/local/bin/SU2/eval/design.py", line 147, in _eval
    vals = eval_func(*inputs)
  File "/usr/local/bin/SU2/eval/design.py", line 520, in con_cieq
    func = su2func(this_con,config,state)
  File "/usr/local/bin/SU2/eval/functions.py", line 92, in function
    aerodynamics( config, state )
  File "/usr/local/bin/SU2/eval/functions.py", line 274, in aerodynamics
    info = su2run.direct(config)
  File "/usr/local/bin/SU2/run/direct.py", line 139, in direct
    su2merge(konfig)
  File "/usr/local/bin/SU2/run/merge.py", line 81, in merge
    merge_solution(konfig)
  File "/usr/local/bin/SU2/run/merge.py", line 107, in merge_solution
    SU2_SOL( config )
  File "/usr/local/bin/SU2/run/interface.py", line 207, in SOL
    run_command( the_Command )
  File "/usr/local/bin/SU2/run/interface.py", line 270, in run_command
    raise exception(message)
RuntimeError: Path = /home/xxx/SU2-7.3.0/xxxx/DESIGNS/DSN_001/DIRECT/,
Command = mpirun -n 4 /home/xxx/SU2-7.3.0/SU2/bin/SU2_SOL config_SOL.cfg
SU2 process returned error '139'
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 0 on node LAPTOP-DDQOFLU8 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

Parallel runs of SU2_CFD on its own work well, but I cannot find what the error '139' refers to.

I would very much appreciate any feedback on this issue. Thank you!

HANSIeltsKing avatar Jun 14 '22 18:06 HANSIeltsKing

It's possible that you hit a bug, or the adjoint solver diverged, or... who knows. The Python scripts are not very friendly at reporting errors, and they are also not very flexible. My most sincere advice is to follow what is done in this tutorial, https://su2code.github.io/tutorials/Species_Transport/, as it will save you time in the medium term, and at least you will know what the optimization is doing.

pcarruscag avatar Jun 20 '22 22:06 pcarruscag
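
On the question of what '139' refers to: by the usual shell convention, an exit status above 128 means the process was terminated by signal number (status - 128). Here 139 - 128 = 11, which matches the "signal 11 (Segmentation fault)" line that mpirun printed in the log above. The sketch below decodes such a status; `describe_exit_status` is a hypothetical helper written for illustration and is not part of the SU2 scripts.

```python
import signal


def describe_exit_status(code: int) -> str:
    """Return a human-readable description of a process exit status.

    Hypothetical helper for illustration only; not part of SU2.
    """
    if code == 0:
        return "success"
    if code < 0:
        # Python's subprocess module reports death-by-signal as -signum.
        signum = -code
    elif code > 128:
        # Shells conventionally report it as 128 + signum.
        signum = code - 128
    else:
        return f"exit code {code}"
    try:
        name = signal.Signals(signum).name
    except ValueError:
        return f"exit code {code} (unknown signal {signum})"
    return f"killed by signal {signum} ({name})"


print(describe_exit_status(139))
```

This only identifies the kind of failure (a segmentation fault in one SU2_SOL rank); it does not say why the crash happened. Re-running the exact command from the RuntimeError message by hand, in the listed path, is one way to see the solver's own output.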

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is still a relevant issue please comment on it to restart the discussion. Thank you for your contributions.

stale[bot] avatar Nov 02 '22 00:11 stale[bot]

I have the same problem. On a cluster, I can run a normal analysis (not optimization) without any issue. When I try to run an optimization on HPC nodes, it gives the same error at a random stage, for example sometimes at DSN2 deform or DSN1 adjoint.

error.txt

Could you suggest any solution?

ardaozuzun avatar Jul 16 '23 11:07 ardaozuzun