adflow Menter SST turbulence model outputs NaNs

Description

Working on wind turbine blades with @marcomangano, I tried to use other turbulence models than the Spalart-Allmaras model which I had been using until now. I know that only sa is differentiated, but I just want to understand the sensitivity of the results to the turbulence model, from an analysis perspective. So I went for Menter's k-\omega SST model (see my options below), and I got NaNs and a PETSc error as off iteration 1 (see the log).

I also tried k omega wilcox and k omega modified, using "turbresscale": [1.0e3,1.0e-6]. There is no NaN in that case, however I found it suspicious that the two turbulence residuals displayed for monitoring are exactly equal to 0.0. Looks like it is the same at iteration 0 with Menter's model.

Less important: I noted a typo in the doc: the option to specify the rescaling parameters for turbulence models is turbresscale instead of turbresscalar.

Looking forward to your insights on what could be causing this.

Options

{'adjointl2convergence': 1e-09,
 'adjointmaxiter': 500000,
 'adjointsubspacesize': 500,
 'adpc': True,
 'anklinresmax': 0.1,
 'ankmaxiter': 60,
 'anksecondordswitchtol': 0.01,
 'cfl': 1.5,
 'cflcoarse': 1.5,
 'equationtype': 'RANS',
 'gridfile': 'UAE_1blade_short_3pitch_coarse_5_L1_3D_128.cgns',
 'l2convergence': 1e-11,
 'l2convergencecoarse': 1e-05,
 'lowspeedpreconditioner': True,
 'mgcycle': 'sg',
 'monitorvariables': ['cpu', 'resrho', 'cd', 'resturb', 'yplus'],
 'ncycles': 30000,
 'ncyclescoarse': 250,
 'nkinnerpreconits': 2,
 'nkjacobianlag': 3,
 'nkouterpreconits': 3,
 'nksubspacesize': 100,
 'nkswitchtol': 1e-14,
 'nsubiterturb': 10,
 'restrictionrelaxation': 1.0,
 'smoother': 'dadi',
 'surfacevariables': ['cp', 'mach', 'yplus', 'sepsensor', 'p', 'temp'],
 'turbresscale': [1000.0, 1e-06],
 'turbulencemodel': 'menter sst',
 'useanksolver': True,
 'useblockettes': False,
 'usenksolver': True,
 'useqcr': True,
 'userotationsa': True,
 'volumevariables': ['resrho', 'resturb', 'vort', 'mach']}

Log

#
# Grid 1: Performing 30000 iterations, unless converged earlier. Minimum required iteration before NK switch:      5. Switch to NK at totalR of:   0.43E-07
#
#---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
#  Grid  | Iter | Iter |  Iter  |   CFL   | Step | Lin  |    Wall    |        Res rho         |       Res kturb        |       Res wturb        |        C_drag          |        totalRes        |         Y+_max         |
#  level |      | Tot  |  Type  |         |      | Res  | Clock (s)  |                        |                        |                        |                        |                        |                        |
#---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      1       0      0     None   0.00E+00  1.00   ----  0.23527E+00   0.4008312643096706E+03   0.0000000000000000E+00   0.0000000000000000E+00   0.6116832261115712E-01   0.4296967046625193E+07   0.4770154773966238E+01
 Bad Block:  -58.457080994455325                            NaN                       NaN                       NaN                       NaN  -136.43358162455107                            NaN                       NaN                       NaN                       NaN  -27.057667980828771                            NaN                       NaN                       NaN                       NaN  -21.493860824869575                            NaN                       NaN                       NaN                       NaN   0.0000000000000000                            NaN                       NaN                       NaN                       NaN
 irow:     1963028
 icol     1802688
 nn:           1
 ijk:           6           6           2
 ---------------------------------------------------------------------------
PETSc or MPI Error. Error Code  1. Detected on Proc 51
Error at line:   630 in file: ../adjoint/adjointUtils.F90
 ---------------------------------------------------------------------------
 Bad Block:  -27.121688365402417                            NaN                       NaN                       NaN                       NaN -0.32959771301234786                            NaN                       NaN                       NaN                       NaN   12.760238156574124                            NaN                       NaN                       NaN                       NaN   3.1770896941151587                            NaN                       NaN                       NaN                       NaN   0.0000000000000000                            NaN                       NaN                       NaN                       NaN
 irow:     1320948
 icol      849908
 nn:           1
 ijk:           6           6           1

...

Code versions

ADflow v2.2.1
Python 3.7.2

Dec 12 '20 01:12 DGCaprace

Thanks for opening this issue! To clarify, @DGCaprace and I do not expect anybody to prioritize and work on this issue right now, but I think it is important to add this to the list of current issues for future reference. We are not currently testing anything that is not sa, so there would be some additional work to have everything running properly.

Also, thanks for catching the typo in the docs. We are in the process of updating the way we document the options, so that would be fixed soon.

Dec 12 '20 10:12 marcomangano

I have come to a better understanding of what happens here.
It seems that the NaNs appear in the computation of the preconditioner of the (A)NK solver. Iteration 0 performs correctly, and I also noticed in some cases that the smoother (DADI here) works for some iterations. When we reach the conditions to enter ANK, here is the stack trace that leads to the "bad blocks":

in ANKStep: https://github.com/mdolab/adflow/blob/master/src/NKSolver/NKSolvers.F90#L3697 in FormJacobianANK: https://github.com/mdolab/adflow/blob/master/src/NKSolver/NKSolvers.F90#L1943 in setupStateResidualMatrix: https://github.com/mdolab/adflow/blob/master/src/adjoint/adjointUtils.F90#L264 in setFDReference: https://github.com/mdolab/adflow/blob/master/src/adjoint/adjointUtils.F90#L1945 in block_res_state: https://github.com/mdolab/adflow/blob/master/src/adjoint/masterRoutines.F90#L1154 in blockResCore: https://github.com/mdolab/adflow/blob/master/src/NKSolver/blockette.F90#L851

The NaN is then detected when filling the blocks at https://github.com/mdolab/adflow/blob/master/src/adjoint/adjointUtils.F90#L498

I checked that there is no NaN before entering the if at: https://github.com/mdolab/adflow/blob/master/src/NKSolver/NKSolvers.F90#L3663

From my tests, I am certain that there are NaNs in dw and fw after the call to sumDwAndFw_block. I could not yet determine if that comes from the computation of the fluxes earlier in blockResCore, or if some memory blocks contained already NaNs before the call to blockResCore. I got a bit confused because the variables w,fw,dw exist both in module blockette and module blockPointers.

Anyway, while I am continuing to investigate, does this ring a bell to anyone? Any suggestion to where I should focus?

Jan 17 '21 03:01 DGCaprace

Can you try setting these three options to False?

'lowspeedpreconditioner': True,
'useqcr': True,
'userotationsa': True,

You should not need the low speed preconditioner with the ANK solver and the other two are for the SA model only.

Jan 17 '21 04:01 sseraj

Thanks @sseraj for the quick reply. Among the tests I've run, I've come back to the tutorial wing based on the test_solve.pyscript in the regression test set. This runs without all these options 3 options, with the same result. Here is the full set of options:

{'adjointdivtol': 100000.0,
 'adjointl2convergence': 1e-14,
 'adjointl2convergenceabs': 1e-16,
 'adjointl2convergencerel': 1e-16,
 'adjointmaxiter': 500,
 'adjointmonitorstep': 10,
 'adjointsolver': 'gmres',
 'adjointsubspacesize': 100,
 'adpc': False,
 'agmglevels': 1,
 'agmgnsmooth': 3,
 'alphafollowing': True,
 'alphamode': False,
 'altitudemode': False,
 'ankadpc': False,
 'ankasmoverlap': 1,
 'ankcfl0': 5.0,
 'ankcflcutback': 0.5,
 'ankcflexponent': 0.5,
 'ankcflfactor': 10.0,
 'ankcfllimit': 100000.0,
 'ankcflmin': 1.0,
 'ankconstcflstep': 0.4,
 'ankcoupledswitchtol': 1e-16,
 'ankinnerpreconits': 1,
 'ankjacobianlag': 20,
 'anklinearsolvetol': 0.5,
 'anklinresmax': 0.9,
 'ankmaxiter': 40,
 'anknsubiterturb': 1,
 'ankouterpreconits': 1,
 'ankpcilufill': 1,
 'ankpcupdatetol': 0.5,
 'ankphysicallstol': 0.2,
 'ankphysicallstolturb': 0.99,
 'anksecondordswitchtol': 1e-16,
 'ankstepfactor': 1.0,
 'ankstepmin': 0.01,
 'anksubspacesize': 5,
 'ankswitchtol': 0.01,
 'ankturbcflscale': 1.0,
 'ankturbkspdebug': False,
 'ankunsteadylstol': 1.0,
 'ankusefullvisc': True,
 'ankusematrixfree': True,
 'ankuseturbdadi': True,
 'applyadjointpcsubspacesize': 20,
 'applypcsubspacesize': 10,
 'applypcsubspacesize': 10,
 'approxpc': True,
 'asmoverlap': 1,
 'autoadjointretry': False,
 'autosolveretry': False,
 'backgroundvolscale': 1.0,
 'betamode': False,
 'blocksplitting': False,
 'cavitationnumber': 1.4,
 'cfl': 1.5,
 'cflcoarse': 1.25,
 'cfllimit': 1.5,
 'closedsurfacefamilies': None,
 'coarsediscretization': 'central plus scalar dissipation',
 'computecavitation': False,
 'coupledsolution': False,
 'cutcallback': None,
 'debugzipper': False,
 'deltat': 0.01,
 'designsurfacefamily': None,
 'discretization': 'central plus scalar dissipation',
 'dissipationlumpingparameter': 6.0,
 'dissipationscalingexponent': 0.67,
 'eddyvisinfratio': 0.009,
 'equationmode': 'steady',
 'equationtype': 'RANS',
 'eulerwalltreatment': 'linear pressure extrapolation',
 'firstrun': True,
 'flowtype': 'external',
 'forcesastractions': True,
 'frozenturbulence': False,
 'globalpreconditioner': 'additive schwartz',
 'gridfile': '/zhome/dcaprace/ATLANTIS/soft/adflow-GCC8-OMPI3_1/tests/reg_tests/../../_tutorial_rans_scalar_jst.cgns',
 'gridprecision': 'double',
 'gridprecisionsurface': 'single',
 'ilufill': 2,
 'infchangecorrection': False,
 'innerpreconits': 1,
 'isosurface': {},
 'isovariables': [],
 'l2convergence': 1e-14,
 'l2convergencecoarse': 0.0001,
 'l2convergencerel': 1e-16,
 'liftindex': 2,
 'limiter': 'vanalbeda',
 'loadbalanceiter': 10,
 'loadimbalance': 0.1,
 'localpreconditioner': 'ilu',
 'lowspeedpreconditioner': False,
 'machmode': False,
 'matrixordering': 'rcm',
 'maxl2deviationfactor': 1.0,
 'meshsurfacefamily': None,
 'mgcycle': 'sg',
 'mgstartlevel': -1,
 'monitorvariables': ['cpu',
                      'resrho',
                      'resturb',
                      'cl',
                      'cd',
                      'cmz',
                      'yplus',
                      'totalr'],
 'ncycles': 1000,
 'ncyclescoarse': 100,
 'nearwalldist': 0.1,
 'nkadpc': False,
 'nkasmoverlap': 1,
 'nkcfl0': 1000000000000.0,
 'nkfixedstep': 0.25,
 'nkinnerpreconits': 1,
 'nkjacobianlag': 20,
 'nklinearsolvetol': 0.3,
 'nkls': 'cubic',
 'nkouterpreconits': 1,
 'nkpcilufill': 2,
 'nksubspacesize': 60,
 'nkswitchtol': 0.001,
 'nkuseew': True,
 'nkviscpc': False,
 'nrefine': 10,
 'nrkreset': 5,
 'nsavesurface': 1,
 'nsavevolume': 1,
 'nsubiter': 3,
 'nsubiterturb': 3,
 'ntimestepscoarse': 48,
 'ntimestepsfine': 400,
 'numbersolutions': True,
 'outerpreconits': 3,
 'outputdirectory': '/zhome/dcaprace/ATLANTIS/soft/adflow-GCC8-OMPI3_1/tests/reg_tests/../
 'outputsurfacefamily': 'allSurfaces',
 'overlapfactor': 0.9,
 'oversetloadbalance': True,
 'oversetpriority': {},
 'oversetprojtol': 1e-12,
 'oversetupdatemode': 'frozen',
 'partitionlikenproc': -1,
 'partitiononly': False,
 'pmode': False,
 'preconditionerside': 'right',
 'printiterations': True,
 'printtiming': True,
 'printwarnings': True,
 'qmode': False,
 'resaveraging': 'noresaveraging',
 'restartadjoint': True,
 'restartfile': None,
 'restrictionrelaxation': 0.8,
 'rkreset': False,
 'rmode': False,
 'selfzipcutoff': 120.0,
 'sepsensoroffset': 0.0,
 'sepsensorsharpness': 10.0,
 'setmonitor': True,
 'skipafterfailedadjoint': True,
 'smoother': 'dadi',
 'smoothparameter': 1.5,
 'solutionprecision': 'double',
 'solutionprecisionsurface': 'single',
 'storerindlayer': True,
 'surfacevariables': ['cp', 'vx', 'vy', 'vz', 'mach'],
 'timeaccuracy': 2,
 'timeintegrationscheme': 'bdf',
 'timeintervals': 1,
 'timelimit': -1.0,
 'tsstability': False,
 'turbresscale': [1000.0,1e-6],
 'turbulencemodel': 'menter sst',
 'turbulenceorder': 'first order',
 'turbulenceproduction': 'strain',
 'useale': True,
 'useanksolver': False,
 'useapproxwalldistance': True,
 'useblockettes': False,
 'usediagtspc': True,
 'useft2sa': True,
 'usegridmotion': False,
 'uselinresmonitor': False,
 'usematrixfreedrdw': True,
 'usenksolver': True,
 'useoversetwallscaling': False,
 'useqcr': False,
 'userotationsa': False,
 'usewallfunctions': False,
 'usezippermesh': True,
 'verifyextra': True,
 'verifyspatial': True,
 'verifystate': True,
 'vis2': 0.25,
 'vis2coarse': 0.5,
 'vis4': 0.0156,
 'viscoussurfacevelocities': True,
 'viscpc': False,
 'viscwalltreatment': 'constant pressure extrapolation',
 'volumevariables': [],
 'walldistcutoff': 1e+20,
 'windaxis': False,
 'writesurfacesolution': True,
 'writetecplotsurfacesolution': False,
 'writevolumesolution': True,
 'zippersurfacefamily': None}

Jan 17 '21 06:01 DGCaprace

Thanks for all the efforts towards fixing this. It seems like we only compute residual for the SA in the blockResCore: https://github.com/mdolab/adflow/blob/master/src/NKSolver/blockette.F90#L812-L815 this would explain the SST residuals being exactly zero. @DGCaprace Can you add the call to SST_block in that select case block? I think just the call to sst_block will be enough, see an example implementation in turbSolveDDADI https://github.com/mdolab/adflow/blob/master/src/turbulence/turbAPI.F90#L65-L83

I am not sure if this will fix the NaN issue directly, but this definitely needs to be fixed for SST to work with any of the ANK-NK solvers.

Also, can you retry with 'lowspeedpreconditioner':False at some point in the future? I believe the default is False, and I dont think it is tested well enough to be used with a new turbulence model.

@marcomangano do your runs use this option? @DGCaprace is there a particular reason you enabled this option?

Jan 17 '21 09:01 anilyil

@anilyil I think we might have briefly talked about 'lowspeedpreconditioner' when we discussed the option setup, but ended up not having a strong opinion about that. I tweaked a bit the options in the last months but am still using @Aagaard87 script as reference, he might have played around with that. Anyway, I will try and run a few cases with that option turned off and let you guys know.

Jan 17 '21 10:01 marcomangano

The SST won't work until the modifications I explained above are made. But if you run with SA w/ and w/o lowspeedpreconditioner, that can help us figure out if that has any effect with the current work.

You can also wait for these runs until we fix the SST stuff. Then we can get back to checking that if we still have issues.

Jan 17 '21 10:01 anilyil

I tested the lowspeedpreconditioner option locally on coarse meshes (Mads' L3-L2) and I only notice a minimum difference in wall time, but the number of iterations to converge is the same. Maybe it is useful on bigger cases, but for now @DGCaprace I think we are safe switching that off.

Jan 17 '21 11:01 marcomangano

Thank you all for the comments.

I noticed the absence of a call to other turbulence models in blockResCore: for the preconditioner, the routine is called in frozen turbulence mode so I also believe it's not gonna save us from the NaN, but it definitely has something to do with the residuals =0. I quickly tried to add SST_block, but I have to double check: it seems that we might need to call f1SST for Menter's model. I'll keep working on that.

Thanks @marcomangano for the test of the low speed preconditioner. Good to know that it does not significantly affect the convergence with ANK. I'll keep that switched off, but again, it has no influence of the NaN either.

Jan 17 '21 19:01 DGCaprace

I've made the changes to compute the residuals for the other turbulence models in the blockResCore, as you suggested @anilyil, plus a couple other places where it seemed to be missing. There was also a problem in the computation of the eddy viscosity for the Menter SST model, which was part of the explanation of the NaNs.

This obviously solves the problem of the turbulence residuals stuck at 0 when reaching the (A)NK solver. However, it appears to resolve the NaNs appearing with the Menter SST model as well. I spent a bit of time trying to understand why this happened specifically with that option (after I fixed the computation of the eddy viscosity). The only conclusion that I could make is that the NK solve outputs Infinity at https://github.com/mdolab/adflow/blob/master/src/NKSolver/NKSolvers.F90#L3799. I checked the norm of the matrices dRdwPre and dRdw, and the vector rVec, but they were ok.

From this point on, I think it's easier if I start a PR to discuss the fix. However, I don't seem to have the rights to push to a new branch in this repo. What do you recommend/how do you usually proceed?

Jan 21 '21 01:01 DGCaprace

Thanks for working on this! If you fork ADflow, you can submit a PR from a branch on your fork.

Jan 21 '21 01:01 sseraj

Thanks for the update @DGCaprace. I am not sure why the NaN would occur, but sounds like the fix helped. At first, I was suspecting un-initialized values of the SST working variables, but sounded like you were still getting a NaN after running with DADI/RK, and that way the values should have been initialized.

Regardless, we can discuss further like you said. You can open a PR from your branch to ADflow like @sseraj said. That way, we will also be able to push commits to your branch and we can fix this properly with you.

Jan 21 '21 10:01 anilyil