Infinite loop in nksol.m subroutine model
I've been encountering a problem where certain cases "hang". The text output stops, the code continues running, but nothing else happens (for example, the intermediate hdf5 savefiles do not get updated). Here is an example of the output (I'm using rundt.py):
$ python2 runcase.py
Read file "gridue " with runid: FREEGS 09/01/2020 # 0 0ms
File attributes:
(' written on: ', 'Mon Apr 13 17:51:10 2020')
(' by code: ', 'UEDGE')
(' physics tag: ', array(['$Name: $'], dtype='|S80'))
UEDGE $Name: $
*** For isimpon=2, set afracs, not afrac ***
Read file "gridue " with runid: FREEGS 09/01/2020 # 0 0ms
Updating Jacobian, npe = 1
iter= 0 fnrm= 0.2571840422417168 nfe= 1
nksol --- iterm = 1.
maxnorm(sf*f(u)) .le. ftol, where maxnorm() is
the maximum norm function. u is probably an
approximate root of f.
*---------------------------------------------------------*
Need to take initial step with Jacobian; trying to do here
*---------------------------------------------------------*
*** For isimpon=2, set afracs, not afrac ***
Read file "gridue " with runid: FREEGS 09/01/2020 # 0 0ms
Updating Jacobian, npe = 1
iter= 0 fnrm= 0.2571840422417166 nfe= 1
nksol --- iterm = 1.
maxnorm(sf*f(u)) .le. ftol, where maxnorm() is
the maximum norm function. u is probably an
approximate root of f.
initial fnrm =2.5718E-01
--------------------------------------------------------------------
--------------------------------------------------------------------
*** Number time-step changes = 1 New time-step = 1.0000E-10
rundt time elapsed: 0:00:08
--------------------------------------------------------------------
*** For isimpon=2, set afracs, not afrac ***
iter= 0 fnrm= 0.2571840422417166 nfe= 1
(stays like this forever)
^Z
It is also possible to get the code to continue by compiling without the -Ofast flag in Makefile.Forthon. In this case, you start seeing fnrm= nan but the output continues until dtreal goes below dtkill and the code quits by itself.
I examined the case with -Ofast which hangs. I found that the pandf subroutine was being called many times. By adding some print statements and going up the call stack, I found this infinite loop inside subroutine model in nksol.m:
 10   continue
      ipcur = 0
      write(STDOUT,*) 'sbb if pthrsh=',pthrsh,'gt onept5=',onept5
      write(STDOUT,*) ' and ipflg=',ipflg,'ne 0'
      if ( (pthrsh .gt. onept5) .and. (ipflg .ne. 0) ) then
        ier = 0
        write(STDOUT,*) 'sbb call psetnk'
        call pset (n, u, savf, su, sf, x, f, wm(locwmp), iwm(locimp),
     *             ier)
        npe = npe + 1
        ipcur = 1
        nnipset = nni
        if (ier .ne. 0) then
          iersl = 8
          return
        endif
      endif
c-----------------------------------------------------------------------
c load x with -f(u).
c-----------------------------------------------------------------------
      do 100 i = 1,n
 100    x(i) = -savf(i)
c-----------------------------------------------------------------------
c call solpk to solve j*x = -f using the appropriate krylov
c algorithm.
c-----------------------------------------------------------------------
      call solpk (n,wm,lenwm,iwm,leniwm,u,savf,x,su,sf,f,jac,psol)
      write(STDOUT,*) 'sbb after solpk iersl=', iersl,'ipflg=',ipflg,
     *                'ipcur=',ipcur
      if (iersl .lt. 0) then
c nonrecoverable error from psol. set iersl and return.
        iersl = 9
        return
      endif
      if ( (iersl .gt. 0) .and. (ipflg .ne. 0) ) then
        if (ipcur .eq. 0) go to 10
      endif
I also put print statements around the pandf1 calls in oderhs.m. In the output further down, a pandf1 message followed by a line number (the numbers are shifted by the lines I inserted into the file) corresponds to one of the other pandf1 calls, in the jac_calc subroutine.
c ... Beginning of execution for call rhsdpk (by daspk), check constraints
      entry rhsdpk (neq, t, yl, yldot, ifail)
      if (icflag .gt. 0 .and. t .gt. 0.) then
        if (icflag .eq. 2) rlxl = rlx
        do 6 i = 1, neq
          ylchng(i) = yl(i) - ylprevc(i)
 6      continue
        call cnstrt (neq,ylprevc,ylchng,icnstr,tau,rlxl,ifail,ivar)
        if (ifail .ne. 0) then
          call remark ('***Constraint failure in DASPK, dt reduced***')
          write (*,*) 'variable index = ',ivar,' time = ',t
          goto 20
        endif
      else
        ifail = 0
      endif
      call scopy (neq, yl, 1, ylprevc, 1) #put yl into ylprevc
 8    tloc = t
      write(STDOUT,*) 'sbb pandf1 -1 -1 goto'
      go to 10
c ... Beginning of execution for call rhsnk (by nksol).
      entry rhsnk (neq, yl, yldot)
      tloc = 0.
c ... Calculate right-hand sides for interior and boundary points.
ccc 10 call convsr_vo (-1,-1, yl) # test new convsr placement
ccc    call convsr_aux (-1,-1, yl) # test new convsr placement
      write(STDOUT,*) 'sbb pandf1 -1 -1 sequential'
 10   call pandf1 (-1, -1, 0, neq, tloc, yl, yldot)
 20   continue
      return
      end
This produces the following output:
$ python2 runcase.py
Read file "gridue " with runid: FREEGS 09/01/2020 # 0 0ms
File attributes:
(' written on: ', 'Mon Apr 13 17:51:10 2020')
(' by code: ', 'UEDGE')
(' physics tag: ', array(['$Name: $'], dtype='|S80'))
UEDGE $Name: $
*** For isimpon=2, set afracs, not afrac ***
Read file "gridue " with runid: FREEGS 09/01/2020 # 0 0ms
sbb pandf1 -1 -1 sequential
Updating Jacobian, npe = 1
sbb pandf1 8449
sbb pandf1 8518
sbb pandf1 8449
sbb pandf1 8518
sbb pandf1 8449
sbb pandf1 8518
(more pandf1 messages...)
sbb pandf1 8449
sbb pandf1 8518
sbb pandf1 8524
sbb pandf1 -1 -1 sequential
sbb nksol
sbb icntnu= 0
sbb pandf1 -1 -1 sequential
iter= 0 fnrm= 0.2571840410398497 nfe= 1
nksol --- iterm = 1.
maxnorm(sf*f(u)) .le. ftol, where maxnorm() is
the maximum norm function. u is probably an
approximate root of f.
sbb ffun
sbb pandf1 -1 -1 goto
*---------------------------------------------------------*
Need to take initial step with Jacobian; trying to do here
*---------------------------------------------------------*
*** For isimpon=2, set afracs, not afrac ***
Read file "gridue " with runid: FREEGS 09/01/2020 # 0 0ms
sbb pandf1 -1 -1 sequential
Updating Jacobian, npe = 1
sbb pandf1 8449
sbb pandf1 8518
sbb pandf1 8449
sbb pandf1 8518
sbb pandf1 8449
sbb pandf1 8518
(more pandf1 messages...)
sbb pandf1 8518
sbb pandf1 8449
sbb pandf1 8518
sbb pandf1 8449
sbb pandf1 8518
sbb pandf1 8449
sbb pandf1 8518
sbb pandf1 8524
sbb pandf1 -1 -1 sequential
sbb nksol
sbb icntnu= 0
sbb pandf1 -1 -1 sequential
iter= 0 fnrm= 0.2571840410398498 nfe= 1
nksol --- iterm = 1.
maxnorm(sf*f(u)) .le. ftol, where maxnorm() is
the maximum norm function. u is probably an
approximate root of f.
sbb ffun
sbb pandf1 -1 -1 goto
initial fnrm =2.5718E-01
--------------------------------------------------------------------
--------------------------------------------------------------------
*** Number time-step changes = 1 New time-step = 1.0000E-10
rundt time elapsed: 0:00:10
--------------------------------------------------------------------
*** For isimpon=2, set afracs, not afrac ***
sbb pandf1 -1 -1 sequential
sbb nksol
sbb icntnu= 1
sbb pandf1 -1 -1 sequential
iter= 0 fnrm= 0.2571840410398498 nfe= 1
sbb call model 1
sbb model
sbb if pthrsh= 0.0000000000000000 gt onept5= 1.5000000000000000
and ipflg= 1 ne 0
sbb solpk
sbb spimgr
sbb atv
sbb call f
sbb pandf1 -1 -1 sequential
sbb after solpk iersl= 1 ipflg= 1 ipcur= 0
sbb if pthrsh= 0.0000000000000000 gt onept5= 1.5000000000000000
and ipflg= 1 ne 0
sbb solpk
sbb spimgr
sbb atv
sbb call f
sbb pandf1 -1 -1 sequential
sbb after solpk iersl= 1 ipflg= 1 ipcur= 0
sbb if pthrsh= 0.0000000000000000 gt onept5= 1.5000000000000000
and ipflg= 1 ne 0
(and so on until ctrl-Z)
Because pthrsh is always 0, ipcur never gets set to anything other than 0, which is required in order to stop looping.
pthrsh is set to 0 if icntnu != 0. icntnu indicates if this is a continuation call to nksol that makes use of old values.
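To make the control flow easier to see, here is a minimal Python sketch of the retry logic around label 10, based only on the excerpt above. The function names, the stub solver, and the max_passes guard are hypothetical additions for illustration; the Fortran loop has no such guard, which is exactly why it can spin forever.

```python
def model_retry_loop(pthrsh, ipflg, solve, max_passes=5):
    """Mimic the control flow around label 10 in subroutine model.

    `solve(ipcur)` is a hypothetical stand-in for solpk and returns iersl.
    `max_passes` is an artificial guard that the Fortran code does not have.
    Returns (passes, exited); exited is False if the guard tripped.
    """
    onept5 = 1.5
    for passes in range(1, max_passes + 1):
        ipcur = 0
        if pthrsh > onept5 and ipflg != 0:
            ipcur = 1  # stands for "call pset": preconditioner refreshed
        iersl = solve(ipcur)
        if iersl > 0 and ipflg != 0 and ipcur == 0:
            continue  # "go to 10": retry, but nothing has changed
        return passes, True
    return max_passes, False


def always_breaks_down(ipcur):
    """Stub solver that always reports a Krylov breakdown (iersl = 1)."""
    return 1


# Continuation call: pthrsh = 0, so ipcur stays 0 and the loop never exits.
hang = model_retry_loop(0.0, 1, always_breaks_down)
# pthrsh = 2: the preconditioner is refreshed, ipcur = 1, and the loop exits.
ok = model_retry_loop(2.0, 1, always_breaks_down)
print(hang, ok)
```

With pthrsh = 0 and a solver that keeps returning iersl = 1, every pass is identical to the previous one, so only the artificial guard ends the loop.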
Another interesting thing is that the arguments supplied to pandf1 change around the time of the hang:
pandf1 xc=49 yc=33
pandf1 xc=49 yc=33
pandf1 xc=49 yc=33
pandf1 xc=49 yc=33
pandf1 xc=49 yc=33
pandf1 xc=49 yc=33
pandf1 xc=49 yc=33
pandf1 xc=-1 yc=-1
pandf1 xc=-1 yc=-1
iter= 0 fnrm= 0.2571840410398498 nfe= 1
nksol --- iterm = 1.
maxnorm(sf*f(u)) .le. ftol, where maxnorm() is
the maximum norm function. u is probably an
approximate root of f.
pandf1 xc=-1 yc=-1
pandf1 xc=-1 yc=-1
initial fnrm =2.5718E-01
--------------------------------------------------------------------
--------------------------------------------------------------------
*** Number time-step changes = 1 New time-step = 1.0000E-10
rundt time elapsed: 0:00:09
--------------------------------------------------------------------
*** For isimpon=2, set afracs, not afrac ***
pandf1 xc=-1 yc=-1
pandf1 xc=-1 yc=-1
iter= 0 fnrm= 0.2571840410398498 nfe= 1
pandf1 xc=-1 yc=-1
pandf1 xc=-1 yc=-1
pandf1 xc=-1 yc=-1
pandf1 xc=-1 yc=-1
pandf1 xc=-1 yc=-1
pandf1 xc=-1 yc=-1
pandf1 xc=-1 yc=-1
pandf1 xc=-1 yc=-1
pandf1 xc=-1 yc=-1
(and so on until ctrl-Z)
-1 is not an invalid argument, but apparently means "full RHS evaluation" rather than "poloidal/radial index of perturbed variable for Jacobian calc".
Also tested a case which does not hang (even with -Ofast) and found that it also has a long period of pandf(-1, -1, ...) calls where the following form repeats many times but eventually the code moves on:
sbb atv
sbb call f
sbb pandf1 -1 -1 sequential
sbb pandf1 xc= -1 yc= -1
From the absence of certain print statements (compare to end of the 4th code block in this post), we can see that these calls are being made from a different loop, supporting the claim that the loop identified above is the one that needs fixing.
Also ran the hanging case for several minutes to make sure that it was entirely pandf(-1, -1, ...) calls and it didn't start doing other things.
I also looked at Jerome Guterl's pandf issue, but that seems to be a problem inside pandf, whereas the problem here is outside it.
Noticed that in the above output, iersl is set to 1 after subroutine solpk finishes, indicating that "the krylov solver suffered a breakdown, and so the solution x is undefined."
When I compile without -Ofast, solpk runs once and finishes with iersl 0, indicating that no trouble occurred, and we get out of the loop successfully:
sbb pandf1 8524
sbb pandf1 xc= 49 yc= 33
sbb pandf1 -1 -1 sequential
sbb pandf1 xc= -1 yc= -1
sbb nksol
sbb icntnu= 0
sbb set pthrsh = two 795
sbb pandf1 -1 -1 sequential
sbb pandf1 xc= -1 yc= -1
iter= 0 fnrm= 0.2571840410397648 nfe= 1
nksol --- iterm = 1.
maxnorm(sf*f(u)) .le. ftol, where maxnorm() is
the maximum norm function. u is probably an
approximate root of f.
sbb ffun
sbb pandf1 -1 -1 goto
sbb pandf1 xc= -1 yc= -1
sbb pandf1 xc= -1 yc= -1
initial fnrm =2.5718E-01
--------------------------------------------------------------------
--------------------------------------------------------------------
*** Number time-step changes = 1 New time-step = 1.0000E-10
rundt time elapsed: 0:00:23
--------------------------------------------------------------------
*** For isimpon=2, set afracs, not afrac ***
sbb pandf1 -1 -1 sequential
sbb pandf1 xc= -1 yc= -1
sbb nksol
sbb icntnu= 1
sbb set pthrsh = zero 799
sbb pandf1 -1 -1 sequential
sbb pandf1 xc= -1 yc= -1
iter= 0 fnrm= 0.2571840410397648 nfe= 1
sbb call model 1
sbb model
sbb if pthrsh= 0.0000000000000000 gt onept5= 1.5000000000000000
and ipflg= 1 ne 0
sbb solpk
sbb spimgr
sbb atv
sbb call f
sbb pandf1 -1 -1 sequential
sbb pandf1 xc= -1 yc= -1
sbb atv
sbb call f
sbb pandf1 -1 -1 sequential
sbb pandf1 xc= -1 yc= -1
(same messages repeating...)
sbb atv
sbb call f
sbb pandf1 -1 -1 sequential
sbb pandf1 xc= -1 yc= -1
sbb atv
sbb call f
sbb pandf1 -1 -1 sequential
sbb pandf1 xc= -1 yc= -1
sbb after solpk iersl= 0 ipflg= 1 ipcur= 0
sbb pandf1 -1 -1 sequential
sbb pandf1 xc= -1 yc= -1
sbb set pthrsh = two 1241
iter= 1 fnrm= NaN nfe= 102
sbb call model 2
sbb model
sbb if pthrsh= 2.0000000000000000 gt onept5= 1.5000000000000000
and ipflg= 1 ne 0
sbb call psetnk
Replace -Ofast with -O3 -fstack-arrays and see what happens. -Ofast enables unsafe optimizations, such as allowing data races on memory stores.
Just checked and found that -O3 -fstack-arrays has the same behavior as -Ofast in this case.
Did you check that you don't have any NaN while evaluating the rhs? You can put a loop iv=1 to neq with if isnan(yldot(iv)) stop
(Just keeping this issue up to date.) I put the following code at the end of subroutine pandf1 in oderhs.m:
      do iv=1,neq
        if (isnan(yldot(iv))) then
          stop
        endif
      enddo
and found that the code stopped at this spot even with -O3 -fstack-arrays.
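A bare stop confirms that a NaN exists but not where it is. As a rough Python sketch of the same scan that also reports positions (the helper name and the sample values are hypothetical, not part of UEDGE), one could collect the 1-based indices so they line up with the Fortran iv:

```python
import math


def nan_indices(values):
    """Return the 1-based (Fortran-style) indices of NaN entries."""
    return [i for i, v in enumerate(values, start=1) if math.isnan(v)]


# Hypothetical stand-in for yldot after a right-hand-side evaluation.
sample_yldot = [0.0, float("nan"), 1.5, float("nan")]
bad = nan_indices(sample_yldot)
print(bad)
```

Knowing which entries of yldot are NaN narrows the search to the equations and cells that those indices map to.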
Maxim, Bill, and Sean, I know there was some effort to get a comparable case on singe.llnl.gov so I could look at it, but I haven’t heard anything for several days. What is the status here?
I am assuming that Sean is running some variant of UEDGE V7.08.04. Some weeks ago, Roman Smirnov pointed out a couple of bugs that have been corrected in the CVS version, but not yet uploaded to GitHub as I am working on one additional update before releasing it. But the bugs Roman found can simply be fixed in any version. They are as follows:
In bbb/odesetup.m:
odesetup.m-c... Construct second intermediate velocity grid (xvnrmnx,yvnrmnx)
odesetup.m-      do ir = 1, 3*nxpt
odesetup.m:        call grdintpy(ixsto(ir),ixendo(i),ixst(ir),ixend(ir),
For the last line, make the change ixendo(i) --> ixendo(ir)
In bbb/oderhs.m:
oderhs.m- 255   continue
oderhs.m-       do igsp = 1, ngsp
oderhs.m-         nbg2dot(igsp) = 0.
oderhs.m:         if(isngonxy(ix,iy,ifld) == 1) then
For the last line, make the change isngonxy(ix,iy,ifld) --> isngonxy(ix,iy,igsp)
There was also a problem with cases that solve for the potential (isphion=1), which has been fixed, but I don’t think that Sean is evolving the potential equation.
My understanding is that the problems Sean finds appear when some form of compiler optimization is used, but go away when a debuggable (-g) version is used. But this may be incorrect.
Please let me know where this all stands for Sean’s cases, and I am glad to participate in a call if that is a good way to make progress.
-Tom
Thomas D. Rognlien, L-440 (B3725, R432), LLNL, 7000 East Ave, P.O. Box 808, Livermore, CA 94551. Tel: 925-422-9830. Admin support: 925-422-7446.
Maybe we could get you a temporary PSFC account? We have totalview. I am also happy to debug over Zoom as I have it all set up.
I am using UEDGE version 7.0.8.4.14 and not evolving the potential equation. It appears to me that bugs happen both with -Ofast and without, but they manifest differently. I don't think -g prevents optimization or otherwise affects the code.
I tried Roman's fixes with and without -Ofast, and the output/hanging behavior was the same.
Sean,
Sean,
Check the bottom of odepandf.m in the folder src/bbb of my uedge fork; there are some routines for debugging purposes. You can also print out the Jacobian. Also, here is a subroutine generator for debugging purposes:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Wed Mar 25 22:04:48 2020
@author: jguterl
"""
import re

from uedge import *
#%%
class WriteDebugRoutine():
    def __init__(self,FileName,ListVariable,Doc):
        # Remove duplicate variable names while preserving order, then sort.
        self.ListVariable=self.ListUse=list(dict.fromkeys(ListVariable))
        self.ListVariable.sort()
        self.Doc=Doc
        self.FileName=FileName
        self.ListUse=[]
        self.VarDic={}
        self.GetVarDoc()
        self.GetListGrp()
        self.WriteFortranSubroutine()

    def SetFfile(self):
        """
        Set the ffile attribute, which is the fortran file object.
        If the attribute hasn't been created, then open the file with write status.
        If it has, and the file is closed, then open it with append status.
        """
        if 'ffile' in self.__dict__:
            status = 'a'
        else:
            status = 'w'
        if status == 'w' or (status == 'a' and self.ffile.closed):
            self.ffile = open(self.FileName, status)

    def fw90(self, text, noreturn=0):
        i = 0
        while len(text[i:]) > 132 and text[i:].find('&') == -1:
            # --- If the line is too long, then break it up, adding line
            # --- continuation marks in between any variable names.
            # --- This is the same as \W, but also skips %, since PG compilers
            # --- don't seem to like a line continuation mark just before a %.
            ss = re.search('[^a-zA-Z0-9_%]', text[i+130::-1])
            assert ss is not None, "Forthon can't find a place to break up this line:\n" + text
            text = text[:i+130-ss.start()] + '&\n' + text[i+130-ss.start():]
            i += 130 - ss.start() + 1
        if noreturn:
            self.ffile.write(' '+text)
        else:
            self.ffile.write(' '+text + '\n')

    def GetVarDoc(self):
        # Look up each variable; it must belong to exactly one group.
        for VarName in self.ListVariable:
            VarDoc=self.Doc.GetVarInfo(VarName)
            if len(VarDoc)<1:
                raise ValueError('Cannot find variable {}'.format(VarName))
            elif len(VarDoc)>1:
                raise ValueError('Found variable {} in two groups'.format(VarName))
            else:
                self.VarDic[VarName]=VarDoc[0]

    def GetListGrp(self):
        # Collect the sorted, de-duplicated list of groups to "Use".
        for VarName,VarDoc in self.VarDic.items():
            self.ListUse.append(VarDoc['Group'])
        self.ListUse=list(dict.fromkeys(self.ListUse))
        self.ListUse.sort()

    def WriteFortranSubroutine(self):
        self.SetFfile()
        self.fw90('subroutine WriteArrayReal(array,s,iu)')
        self.fw90('implicit none')
        self.fw90('real:: array(*)')
        self.fw90('integer:: i,s,iu')
        self.fw90('do i=1,s')
        self.fw90('write(iu,*) array(i)')
        self.fw90('enddo')
        self.fw90('end subroutine WriteArrayReal')
        self.fw90('subroutine WriteArrayInteger(array,s,iu)')
        self.fw90('implicit none')
        self.fw90('integer:: array(*)')
        self.fw90('integer:: i,s,iu')
        self.fw90('do i=1,s')
        self.fw90('write(iu,*) array(i)')
        self.fw90('enddo')
        self.fw90('end subroutine WriteArrayInteger')
        self.fw90('subroutine DebugHelper(FileName)')
        self.fw90('')
        for UseGrp in self.ListUse:
            self.fw90('Use {} '.format(UseGrp))
        self.fw90('implicit none')
        self.fw90('integer:: iunit')
        self.fw90('character(len = *) :: filename')
        self.fw90('open (newunit = iunit, file = trim(filename))')
        # Each variable is dumped as its name followed by its value(s).
        for VarName,VarDoc in self.VarDic.items():
            self.fw90('write(iunit,*) "{}"'.format(VarName))
            if VarDoc['Dimension'] is None:
                self.fw90('write(iunit,*) {}'.format(VarName))
            else:
                if 'integer' in VarDoc['Type']:
                    self.fw90('call WriteArrayInteger({},size({}),iunit)'.format(VarName,VarName))
                elif 'real' in VarDoc['Type'] or 'double' in VarDoc['Type']:
                    self.fw90('call WriteArrayReal({},size({}),iunit)'.format(VarName,VarName))
                else:
                    raise ValueError('Unknown type')
        self.fw90('close(iunit)')
        self.fw90('end subroutine DebugHelper')
        self.ffile.close()
#%%
#Dic File is generated by the script UEDGEFortranParser.py
ListVariable=DicFile['convert']['convsr_vo']['AssignedNonLocalVars']+DicFile['convert']['convsr_aux']['AssignedNonLocalVars']+DicFile['odepandf']['pandf']['AssignedNonLocalVars']
dbg=WriteDebugRoutine('DebugHelper.F90',ListVariable,Doc)
#%%
def CompareDump(FileName1,FileName2):
    Dic1=ReadDumpFile(FileName1)
    Dic2=ReadDumpFile(FileName2)
    VarCheck={}
    for Var in Dic1.keys():
        VarCheck[Var]=True
        if len(Dic1[Var])!=len(Dic2[Var]):
            print(Var)
            VarCheck[Var]=False
            #raise ValueError('dics of different length')
            continue
        isfirst=True
        for i,(L1,L2) in enumerate(zip(Dic1[Var],Dic2[Var])):
            if L1!=L2:
                VarCheck[Var]=False
                if isfirst:
                    # Report only the first differing index per variable.
                    print(Var,i)
                    isfirst=False
    return VarCheck

def ReadDumpFile(FileName):
    file = open(FileName, 'r')
    Lines = file.readlines()
    file.close()
    Dic={}
    for L in Lines:
        L=L.rstrip().strip()
        # A line that does not parse as a number starts a new variable.
        try:
            Lf=float(L)
            isnumeric=True
        except ValueError:
            isnumeric=False
        if not isnumeric:
            VarName=L
            Dic[VarName]=[]
        else:
            Dic[VarName].append(float(L))
    return Dic

FileName1='/home/jguterl/Dropbox/python/UEDGERunDir/dumpregular.txt'
FileName2='/home/jguterl/Dropbox/python/UEDGERunDir/dumpomp.txt'
VarCheck=CompareDump(FileName1,FileName2)
for V,B in VarCheck.items():
    if not B:
        print(V)
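For anyone who wants to try the dump comparison without a UEDGE build, here is a small self-contained sketch of the same idea. It assumes the dump format implied by the generator above (a variable name on one line, followed by one numeric value per line); the helper names, file names, and sample values are made up for illustration.

```python
import os
import tempfile


def read_dump(path):
    """Parse 'name line followed by numeric lines' into {name: [floats]}."""
    data, name = {}, None
    with open(path) as fh:
        for raw in fh:
            token = raw.strip()
            if not token:
                continue
            try:
                value = float(token)
            except ValueError:
                name = token  # a non-numeric line starts a new variable
                data[name] = []
            else:
                data[name].append(value)
    return data


def first_mismatch(a, b):
    """Map each variable in `a` to its first differing index, or None."""
    result = {}
    for name, vals in a.items():
        other = b.get(name, [])
        if len(vals) != len(other):
            result[name] = min(len(vals), len(other))
            continue
        result[name] = next(
            (i for i, (x, y) in enumerate(zip(vals, other)) if x != y), None
        )
    return result


with tempfile.TemporaryDirectory() as tmp:
    p1 = os.path.join(tmp, "dump_a.txt")
    p2 = os.path.join(tmp, "dump_b.txt")
    with open(p1, "w") as fh:
        fh.write("ne\n 1.0\n 2.0\nte\n 3.0\n")
    with open(p2, "w") as fh:
        fh.write("ne\n 1.0\n 2.5\nte\n 3.0\n")
    report = first_mismatch(read_dump(p1), read_dump(p2))

print(report)
```

Because NaN never compares equal to itself, a NaN in either dump is reported as a mismatch, which is convenient when comparing an optimized build against an unoptimized one.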
Sean, I think a Zoom session is the place to start. If you are finding a NaN for one of the yldot’s, the components that go into yldot are available to look at from the parser, and one of those must also contain a NaN, so we can drill down to identify the specific source, hopefully. Please send me the source files in the subdirectories bbb and svr. If these are in a tar or zip file, you must change the extension to something like .tax or .zix before sending it to me so that it will make it through the LLNL mail filter; tar and zip files with their usual extensions are not allowed.
I can do a zoom session after 3 PDT today or after 1:30 PDT tomorrow.
-Tom