I've been encountering a problem where certain cases "hang". The text output stops, the code continues running, but nothing else happens (for example, the intermediate hdf5 savefiles do not get updated). Here is an example of the output (I'm using rundt.py):

$ python2 runcase.py
 Read file "gridue  " with runid:  FREEGS     09/01/2020        # 0  0ms

File attributes:
('     written on: ', 'Mon Apr 13 17:51:10 2020')
('        by code: ', 'UEDGE')
('    physics tag: ', array(['$Name:  $'], dtype='|S80'))
 UEDGE $Name:  $
*** For isimpon=2, set afracs, not afrac ***
 Read file "gridue  " with runid:  FREEGS     09/01/2020        # 0  0ms

  Updating Jacobian, npe =                      1
 iter=    0 fnrm=     0.2571840422417168     nfe=      1


 nksol ---  iterm = 1.
            maxnorm(sf*f(u)) .le. ftol, where maxnorm() is
            the maximum norm function.  u is probably an
            approximate root of f.
*---------------------------------------------------------*
Need to take initial step with Jacobian; trying to do here
*---------------------------------------------------------*
*** For isimpon=2, set afracs, not afrac ***
 Read file "gridue  " with runid:  FREEGS     09/01/2020        # 0  0ms

  Updating Jacobian, npe =                      1
 iter=    0 fnrm=     0.2571840422417166     nfe=      1


 nksol ---  iterm = 1.
            maxnorm(sf*f(u)) .le. ftol, where maxnorm() is
            the maximum norm function.  u is probably an
            approximate root of f.
initial fnrm =2.5718E-01
--------------------------------------------------------------------
--------------------------------------------------------------------

*** Number time-step changes = 1 New time-step = 1.0000E-10
rundt time elapsed: 0:00:08
--------------------------------------------------------------------
*** For isimpon=2, set afracs, not afrac ***
 iter=    0 fnrm=     0.2571840422417166     nfe=      1
(stays like this forever)
^Z

It is also possible to get the code to continue by compiling without the -Ofast flag in Makefile.Forthon. In this case, you start seeing fnrm= nan but the output continues until dtreal goes below dtkill and the code quits by itself.

I examined the case with -Ofast which hangs. I found that the pandf subroutine was being called many times. By adding some print statements and going up the call stack, I found this infinite loop inside subroutine model in nksol.m:

 10   continue
      ipcur = 0
      write(STDOUT,*) 'sbb if pthrsh=',pthrsh,'gt onept5=',onept5
      write(STDOUT,*) '       and ipflg=',ipflg,'ne 0'
      if ( (pthrsh .gt. onept5) .and. (ipflg .ne. 0) ) then
        ier = 0
        write(STDOUT,*) 'sbb call psetnk'
        call pset (n, u, savf, su, sf, x, f, wm(locwmp), iwm(locimp),
     *             ier)
        npe = npe + 1
        ipcur = 1
        nnipset = nni
        if (ier .ne. 0) then
          iersl = 8
          return
          endif
        endif
c-----------------------------------------------------------------------
c     load x with -f(u).
c-----------------------------------------------------------------------
      do 100 i = 1,n
 100    x(i) = -savf(i)
c-----------------------------------------------------------------------
c     call solpk to solve j*x = -f using the appropriate krylov
c     algorithm.
c-----------------------------------------------------------------------
      call solpk (n,wm,lenwm,iwm,leniwm,u,savf,x,su,sf,f,jac,psol)
      write(STDOUT,*) 'sbb after solpk iersl=', iersl,'ipflg=',ipflg,
     * 'ipcur=',ipcur 
      if (iersl .lt. 0) then
c nonrecoverable error from psol.  set iersl and return.
         iersl = 9
         return
         endif
      if ( (iersl .gt. 0) .and. (ipflg .ne. 0) ) then
        if (ipcur .eq. 0) go to 10
        endif

I also put print statements around the pandf1 calls in oderhs.m. In the output further down, pandf1 messages followed by a line number (shifted by the additional lines I inserted into the file) correspond to other pandf1 calls in the jac_calc subroutine.

c ... Beginning of execution for call rhsdpk (by daspk), check constraints
      entry rhsdpk (neq, t, yl, yldot, ifail)
      
      if (icflag .gt. 0 .and. t .gt. 0.) then     
         if (icflag .eq. 2) rlxl = rlx
         do 6 i = 1, neq     
            ylchng(i) = yl(i) - ylprevc(i)
 6       continue
         call cnstrt (neq,ylprevc,ylchng,icnstr,tau,rlxl,ifail,ivar) 
         if (ifail .ne. 0) then
            call remark ('***Constraint failure in DASPK, dt reduced***')
            write (*,*) 'variable index = ',ivar,'   time = ',t
            goto 20
         endif
      else
         ifail = 0
      endif
      call scopy (neq, yl, 1, ylprevc, 1)  #put yl into ylprevc 

 8    tloc = t
      write(STDOUT,*) 'sbb pandf1 -1 -1 goto'
      go to 10

c ... Beginning of execution for call rhsnk (by nksol).
      entry rhsnk (neq, yl, yldot)
      tloc = 0.

c ... Calculate right-hand sides for interior and boundary points.
ccc 10   call convsr_vo (-1,-1, yl)  # test new convsr placement
ccc      call convsr_aux (-1,-1, yl) # test new convsr placement
      write(STDOUT,*) 'sbb pandf1 -1 -1 sequential'
 10   call pandf1 (-1, -1, 0, neq, tloc, yl, yldot)

 20   continue
      return
      end

This produces the following output:

$ python2 runcase.py
 Read file "gridue  " with runid:  FREEGS     09/01/2020        # 0  0ms                       

File attributes:
('     written on: ', 'Mon Apr 13 17:51:10 2020')
('        by code: ', 'UEDGE')
('    physics tag: ', array(['$Name:  $'], dtype='|S80'))
 UEDGE $Name:  $                                                                       
*** For isimpon=2, set afracs, not afrac ***
 Read file "gridue  " with runid:  FREEGS     09/01/2020        # 0  0ms                       

 sbb pandf1 -1 -1 sequential
  Updating Jacobian, npe =                      1
 sbb pandf1 8449
 sbb pandf1 8518
 sbb pandf1 8449
 sbb pandf1 8518
 sbb pandf1 8449
 sbb pandf1 8518
(more pandf1 messages...)
 sbb pandf1 8449
 sbb pandf1 8518
 sbb pandf1 8524
 sbb pandf1 -1 -1 sequential
 sbb nksol
 sbb icntnu=                    0
 sbb pandf1 -1 -1 sequential
 iter=    0 fnrm=     0.2571840410398497     nfe=      1


 nksol ---  iterm = 1.
            maxnorm(sf*f(u)) .le. ftol, where maxnorm() is
            the maximum norm function.  u is probably an
            approximate root of f.
 sbb ffun
 sbb pandf1 -1 -1 goto
*---------------------------------------------------------*
Need to take initial step with Jacobian; trying to do here
*---------------------------------------------------------*
*** For isimpon=2, set afracs, not afrac ***
 Read file "gridue  " with runid:  FREEGS     09/01/2020        # 0  0ms                       

 sbb pandf1 -1 -1 sequential
  Updating Jacobian, npe =                      1
 sbb pandf1 8449
 sbb pandf1 8518
 sbb pandf1 8449
 sbb pandf1 8518
 sbb pandf1 8449
 sbb pandf1 8518
(more pandf1 messages...)
 sbb pandf1 8518
 sbb pandf1 8449
 sbb pandf1 8518
 sbb pandf1 8449
 sbb pandf1 8518
 sbb pandf1 8449
 sbb pandf1 8518
 sbb pandf1 8524
 sbb pandf1 -1 -1 sequential
 sbb nksol
 sbb icntnu=                    0
 sbb pandf1 -1 -1 sequential
 iter=    0 fnrm=     0.2571840410398498     nfe=      1


 nksol ---  iterm = 1.
            maxnorm(sf*f(u)) .le. ftol, where maxnorm() is
            the maximum norm function.  u is probably an
            approximate root of f.
 sbb ffun
 sbb pandf1 -1 -1 goto
initial fnrm =2.5718E-01
--------------------------------------------------------------------
--------------------------------------------------------------------
 
*** Number time-step changes = 1 New time-step = 1.0000E-10
rundt time elapsed: 0:00:10
--------------------------------------------------------------------
*** For isimpon=2, set afracs, not afrac ***
 sbb pandf1 -1 -1 sequential
 sbb nksol
 sbb icntnu=                    1
 sbb pandf1 -1 -1 sequential
 iter=    0 fnrm=     0.2571840410398498     nfe=      1
 sbb call model                     1
 sbb model
 sbb if pthrsh=   0.0000000000000000      gt onept5=   1.5000000000000000     
        and ipflg=                    1 ne 0
 sbb solpk
 sbb spimgr
 sbb atv
 sbb call f
 sbb pandf1 -1 -1 sequential
 sbb after solpk iersl=                    1 ipflg=                    1 ipcur=                    0
 sbb if pthrsh=   0.0000000000000000      gt onept5=   1.5000000000000000     
        and ipflg=                    1 ne 0
 sbb solpk
 sbb spimgr
 sbb atv
 sbb call f
 sbb pandf1 -1 -1 sequential
 sbb after solpk iersl=                    1 ipflg=                    1 ipcur=                    0
 sbb if pthrsh=   0.0000000000000000      gt onept5=   1.5000000000000000     
        and ipflg=                    1 ne 0

(and so on until ctrl-Z)

Because pthrsh is always 0, the branch that calls pset is skipped and ipcur never gets set to anything other than 0. As long as solpk returns iersl > 0, a nonzero ipcur is required in order to stop looping.

pthrsh is set to 0 if icntnu != 0. icntnu indicates if this is a continuation call to nksol that makes use of old values.
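
The retry logic can be modelled in a few lines to see why it cannot terminate. This is a minimal sketch based only on the excerpt of subroutine model shown above: it leaves out the pset failure path (iersl = 8) and everything after the loop, `solpk_iersl` is a hypothetical stand-in for the value of iersl after the solpk call, and `max_passes` is an artificial cap added so the sketch itself terminates.

```python
def model_loop(pthrsh, ipflg, solpk_iersl, max_passes=5):
    """Mimic the 'go to 10' retry logic; return (passes, exited)."""
    onept5 = 1.5
    passes = 0
    while passes < max_passes:
        passes += 1
        ipcur = 0
        if pthrsh > onept5 and ipflg != 0:
            # The real code calls pset here to rebuild the preconditioner.
            ipcur = 1
        iersl = solpk_iersl(ipcur)  # hypothetical stand-in for solpk
        if iersl < 0:
            return passes, True     # nonrecoverable error: routine returns
        if iersl > 0 and ipflg != 0 and ipcur == 0:
            continue                # this is the 'go to 10'
        return passes, True         # execution continues past the loop
    return passes, False            # cap reached: the real code would spin forever


def always_breakdown(ipcur):
    """Pretend solpk reports a Krylov breakdown (iersl = 1) on every call."""
    return 1


hanging = model_loop(pthrsh=0.0, ipflg=1, solpk_iersl=always_breakdown)
fresh_precond = model_loop(pthrsh=2.0, ipflg=1, solpk_iersl=always_breakdown)
print(hanging, fresh_precond)
```

With pthrsh = 0 the model never leaves the loop on its own, while with pthrsh = 2 a single pass is enough because ipcur becomes 1 before solpk is called.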

Another interesting thing is that the arguments supplied to pandf1 change around the time of the hang:

 pandf1 xc=49  yc=33
 pandf1 xc=49  yc=33
 pandf1 xc=49  yc=33
 pandf1 xc=49  yc=33
 pandf1 xc=49  yc=33
 pandf1 xc=49  yc=33
 pandf1 xc=49  yc=33
 pandf1 xc=-1  yc=-1
 pandf1 xc=-1  yc=-1
 iter=    0 fnrm=     0.2571840410398498     nfe=      1


 nksol ---  iterm = 1.
            maxnorm(sf*f(u)) .le. ftol, where maxnorm() is
            the maximum norm function.  u is probably an
            approximate root of f.
 pandf1 xc=-1  yc=-1
 pandf1 xc=-1  yc=-1
initial fnrm =2.5718E-01
--------------------------------------------------------------------
--------------------------------------------------------------------
 
*** Number time-step changes = 1 New time-step = 1.0000E-10
rundt time elapsed: 0:00:09
--------------------------------------------------------------------
*** For isimpon=2, set afracs, not afrac ***
 pandf1 xc=-1  yc=-1
 pandf1 xc=-1  yc=-1
 iter=    0 fnrm=     0.2571840410398498     nfe=      1
 pandf1 xc=-1  yc=-1
 pandf1 xc=-1  yc=-1
 pandf1 xc=-1  yc=-1
 pandf1 xc=-1  yc=-1
 pandf1 xc=-1  yc=-1
 pandf1 xc=-1  yc=-1
 pandf1 xc=-1  yc=-1
 pandf1 xc=-1  yc=-1
 pandf1 xc=-1  yc=-1
(and so on until ctrl-Z)

-1 is not an invalid argument, but apparently means "full RHS evaluation" rather than "poloidal/radial index of perturbed variable for Jacobian calc".

Also tested a case which does not hang (even with -Ofast) and found that it also has a long period of pandf(-1, -1, ...) calls where the following form repeats many times but eventually the code moves on:

 sbb atv
 sbb call f
 sbb pandf1 -1 -1 sequential
 sbb pandf1 xc=                   -1  yc=                   -1

From the absence of certain print statements (compare to end of the 4th code block in this post), we can see that these calls are being made from a different loop, supporting the claim that the loop identified above is the one that needs fixing.

Also ran the hanging case for several minutes to make sure that it was entirely pandf(-1, -1, ...) calls and it didn't start doing other things.

Also checked out Jerome Guterl's pandf issue but this seems to be a problem inside pandf, not outside, as we have here.

Apr 15 '20 23:04 sballin

Noticed that in the above output, iersl is set to 1 after subroutine solpk finishes, indicating that "the krylov solver suffered a breakdown, and so the solution x is undefined."
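
The same condition can be checked mechanically from a saved log instead of by eye. This is a minimal sketch, assuming the `sbb after solpk` print format shown in the output above; the function names `solpk_results` and `looks_stuck` and the `min_repeats` threshold are hypothetical and not part of UEDGE.

```python
import re

# Matches the debug line printed right after the solpk call.
SOLPK_LINE = re.compile(
    r"after solpk iersl=\s*(-?\d+)\s+ipflg=\s*(-?\d+)\s+ipcur=\s*(-?\d+)"
)


def solpk_results(log_text):
    """Return one (iersl, ipflg, ipcur) tuple per solpk return in the log."""
    return [tuple(int(g) for g in m.groups())
            for m in SOLPK_LINE.finditer(log_text)]


def looks_stuck(results, min_repeats=3):
    """True if the last min_repeats returns all trigger the 'go to 10' retry."""
    tail = results[-min_repeats:]
    return len(tail) == min_repeats and all(
        iersl > 0 and ipflg != 0 and ipcur == 0
        for iersl, ipflg, ipcur in tail
    )


# Excerpt in the format of the hanging run, repeated three times.
stuck_line = (" sbb after solpk iersl=                    1"
              " ipflg=                    1 ipcur=                    0\n")
hang_log = " sbb solpk\n".join([stuck_line] * 3)
# A final successful return (iersl = 0) ends the retry sequence.
ok_line = (" sbb after solpk iersl=                    0"
           " ipflg=                    1 ipcur=                    0\n")

print(looks_stuck(solpk_results(hang_log)))
print(looks_stuck(solpk_results(hang_log + ok_line)))
```

A long tail of identical (1, 1, 0) results is the signature of the hang; a trailing result with iersl = 0 means the loop was left normally.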

When I compile without -Ofast, solpk runs once and finishes with iersl 0, indicating that no trouble occurred, and we get out of the loop successfully:

 sbb pandf1 8524
 sbb pandf1 xc=                   49  yc=                   33
 sbb pandf1 -1 -1 sequential
 sbb pandf1 xc=                   -1  yc=                   -1
 sbb nksol
 sbb icntnu=                    0
 sbb set pthrsh = two 795
 sbb pandf1 -1 -1 sequential
 sbb pandf1 xc=                   -1  yc=                   -1
 iter=    0 fnrm=     0.2571840410397648     nfe=      1


 nksol ---  iterm = 1.
            maxnorm(sf*f(u)) .le. ftol, where maxnorm() is
            the maximum norm function.  u is probably an
            approximate root of f.
 sbb ffun
 sbb pandf1 -1 -1 goto
 sbb pandf1 xc=                   -1  yc=                   -1
 sbb pandf1 xc=                   -1  yc=                   -1
initial fnrm =2.5718E-01
--------------------------------------------------------------------
--------------------------------------------------------------------
 
*** Number time-step changes = 1 New time-step = 1.0000E-10
rundt time elapsed: 0:00:23
--------------------------------------------------------------------
*** For isimpon=2, set afracs, not afrac ***
 sbb pandf1 -1 -1 sequential
 sbb pandf1 xc=                   -1  yc=                   -1
 sbb nksol
 sbb icntnu=                    1
 sbb set pthrsh = zero 799
 sbb pandf1 -1 -1 sequential
 sbb pandf1 xc=                   -1  yc=                   -1
 iter=    0 fnrm=     0.2571840410397648     nfe=      1
 sbb call model                     1
 sbb model
 sbb if pthrsh=   0.0000000000000000      gt onept5=   1.5000000000000000     
        and ipflg=                    1 ne 0
 sbb solpk
 sbb spimgr
 sbb atv
 sbb call f
 sbb pandf1 -1 -1 sequential
 sbb pandf1 xc=                   -1  yc=                   -1
 sbb atv
 sbb call f
 sbb pandf1 -1 -1 sequential
 sbb pandf1 xc=                   -1  yc=                   -1
(same messages repeating...)
 sbb atv
 sbb call f
 sbb pandf1 -1 -1 sequential
 sbb pandf1 xc=                   -1  yc=                   -1
 sbb atv
 sbb call f
 sbb pandf1 -1 -1 sequential
 sbb pandf1 xc=                   -1  yc=                   -1
 sbb after solpk iersl=                    0 ipflg=                    1 ipcur=                    0
 sbb pandf1 -1 -1 sequential
 sbb pandf1 xc=                   -1  yc=                   -1
 sbb set pthrsh = two 1241
 iter=    1 fnrm=                        NaN nfe=    102
 sbb call model                     2
 sbb model
 sbb if pthrsh=   2.0000000000000000      gt onept5=   1.5000000000000000     
        and ipflg=                    1 ne 0
 sbb call psetnk

Apr 17 '20 02:04 sballin

Replace -Ofast by -O3 -fstack-arrays and see what happens. -Ofast enables unsafe memory data racing.

Apr 28 '20 22:04 jguterl

Just checked and found that -O3 -fstack-arrays has the same behavior as -Ofast in this case.

Apr 29 '20 02:04 sballin

Did you check that you don't have any NaN while evaluating the rhs? You can put a loop iv=1 to neq with if isnan(yldot(iv)) stop

Apr 29 '20 03:04 jguterl

(Just keeping this issue up to date.) I put the following code at the end of subroutine pandf1 in oderhs.m:

do iv=1,neq
  if (isnan(yldot(iv))) then
    stop
  endif
enddo

and found that the code stopped at this spot even with -O3 -fstack-arrays.
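
The Fortran check stops at the first NaN but does not say which equation produced it. A Python-side analogue that reports the offending indices could look like the sketch below. It assumes a copy of yldot is available as a plain sequence of floats (how to obtain it from a running case is not shown here), and `nan_indices` is a hypothetical helper name.

```python
import math


def nan_indices(values):
    """Return the 1-based (Fortran-style) indices of NaN entries in values."""
    return [i for i, v in enumerate(values, start=1) if math.isnan(v)]


# Made-up numbers standing in for yldot, only to exercise the helper.
yldot_copy = [0.25, float("nan"), -1.5, float("nan")]
bad = nan_indices(yldot_copy)
print(bad)
```

The returned indices correspond to iv in the Fortran loop, so they can be used to look up which variable and grid cell each bad entry belongs to.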

May 02 '20 03:05 sballin

Maxim, Bill, and Sean, I know there was some effort to get a comparable case on singe.llnl.gov so I could look at it, but I haven’t heard anything for several days. What is the status here?

I am assuming that Sean is running some variant of UEDGE V7.08.04. Some weeks ago, Roman Smirnov pointed out a couple of bugs that have been corrected in the CVS version but not yet uploaded to GitHub, as I am working on one additional update before releasing it. But the bugs Roman found can simply be fixed in any version. They are as follows:

In bbb/odesetup.m:

odesetup.m-c... Construct second intermediate velocity grid (xvnrmnx,yvnrmnx)
odesetup.m- do ir = 1, 3*nxpt
odesetup.m: call grdintpy(ixsto(ir),ixendo(i),ixst(ir),ixend(ir),

For the last line, make the change ixendo(i) --> ixendo(ir)

In bbb/oderhs.m:

oderhs.m- 255 continue
oderhs.m- do igsp = 1, ngsp
oderhs.m- nbg2dot(igsp) = 0.
oderhs.m: if(isngonxy(ix,iy,ifld) == 1) then

For the last line, make the change isngonxy(ix,iy,ifld) --> isngonxy(ix,iy,igsp)

There was also a problem with cases that solve for the potential (isphion=1), which has been fixed, but I don’t think that Sean is evolving the potential equation.

My understanding is that the problems Sean finds appear when some form of compiler optimization is used, but go away when a debuggable (-g) version is used. But this may be incorrect.

Please let me know where this all stands for Sean’s cases, and I am glad to participate in a call if that is a good way to make progress.

-Tom

May 05 '20 17:05 trognlien

Maybe we could get you a temporary PSFC account? We have totalview. I am also happy to debug over Zoom as I have it all set up.

I am using UEDGE version 7.0.8.4.14 and not evolving the potential equation. It appears to me that bugs happen both with -Ofast and without, but they manifest differently. I don't think -g prevents optimization or otherwise affects the code.

I tried Roman's fixes with and without -Ofast, and the output/hanging behavior was the same.

May 05 '20 18:05 sballin

Sean,

Check at the bottom of odepandf.m in the folder src/bbb of my uedge fork; there are some routines for debugging purposes. You can also print out the Jacobian. Also, here is a subroutine generator for debugging purposes:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Wed Mar 25 22:04:48 2020

@author: jguterl
"""
from uedge import *
import re  # used by fw90; imported explicitly rather than relying on uedge
#%%
class WriteDebugRoutine():
    def __init__(self,FileName,ListVariable,Doc):
        # Remove duplicate variable names, then sort them
        self.ListVariable=self.ListUse=list(dict.fromkeys(ListVariable))
        self.ListVariable.sort()
        self.Doc=Doc
        self.FileName=FileName
        self.ListUse=[]
        self.VarDic={}
        self.GetVarDoc()
        self.GetListGrp()
        self.WriteFortranSubroutine()

    def SetFfile(self):
        """
        Set the ffile attribute, which is the fortran file object.
        If the attribute hasn't been created, then open the file with write status.
        If it has, and the file is closed, then open it with append status.
        """
        if 'ffile' in self.__dict__:
            status = 'a'
        else:
            status = 'w'
        if status == 'w' or (status == 'a' and self.ffile.closed):
            self.ffile = open(self.FileName, status)

    def fw90(self, text, noreturn=0):
        i = 0
        while len(text[i:]) > 132 and text[i:].find('&') == -1:
            # --- If the line is too long, then break it up, adding line
            # --- continuation marks in between any variable names.
            # --- This is the same as \W, but also skips %, since PG compilers
            # --- don't seem to like a line continuation mark just before a %.
            ss = re.search('[^a-zA-Z0-9_%]', text[i+130::-1])
            assert ss is not None, "Forthon can't find a place to break up this line:\n" + text
            text = text[:i+130-ss.start()] + '&\n' + text[i+130-ss.start():]
            i += 130 - ss.start() + 1
        if noreturn:
            self.ffile.write(' '+text)
        else:
            self.ffile.write(' '+text + '\n')

    def GetVarDoc(self):
        # Each variable must be documented in exactly one group
        for VarName in self.ListVariable:
            VarDoc=self.Doc.GetVarInfo(VarName)
            if len(VarDoc)<1:
                raise ValueError('Cannot find variable {}'.format(VarName))
            elif len(VarDoc)>1:
                raise ValueError('Found variable {} in two groups'.format(VarName))
            else:
                self.VarDic[VarName]=VarDoc[0]

    def GetListGrp(self):
        # Collect the sorted, unique list of groups to "Use"
        for VarName,VarDoc in self.VarDic.items():
            self.ListUse.append(VarDoc['Group'])
        self.ListUse=list(dict.fromkeys(self.ListUse))
        self.ListUse.sort()

    def WriteFortranSubroutine(self):
        self.SetFfile()
        self.fw90('subroutine WriteArrayReal(array,s,iu)')
        self.fw90('implicit none')
        self.fw90('real:: array(*)')
        self.fw90('integer:: i,s,iu')
        self.fw90('do i=1,s')
        self.fw90('write(iu,*) array(i)')
        self.fw90('enddo')
        self.fw90('end subroutine WriteArrayReal')
        self.fw90('subroutine WriteArrayInteger(array,s,iu)')
        self.fw90('implicit none')
        self.fw90('integer:: array(*)')
        self.fw90('integer:: i,s,iu')
        self.fw90('do i=1,s')
        self.fw90('write(iu,*) array(i)')
        self.fw90('enddo')
        self.fw90('end subroutine WriteArrayInteger')

        self.fw90('subroutine DebugHelper(FileName)')
        self.fw90('')
        for UseGrp in self.ListUse:
            self.fw90('Use {} '.format(UseGrp))
        self.fw90('implicit none')
        self.fw90('integer:: iunit')
        self.fw90('character(len = *) ::  filename')
        self.fw90('open (newunit = iunit, file = trim(filename))')
        for VarName,VarDoc in self.VarDic.items():
            self.fw90('write(iunit,*) "{}"'.format(VarName))
            if VarDoc['Dimension'] is None:
                self.fw90('write(iunit,*) {}'.format(VarName))
            else:
                if 'integer' in VarDoc['Type']:
                    self.fw90('call WriteArrayInteger({},size({}),iunit)'.format(VarName,VarName))
                elif 'real' in VarDoc['Type'] or 'double' in VarDoc['Type']:
                    self.fw90('call WriteArrayReal({},size({}),iunit)'.format(VarName,VarName))
                else:
                    raise ValueError('Unknown type')
        self.fw90('close(iunit)')
        self.fw90('end subroutine DebugHelper')
        self.ffile.close()

#%%
# DicFile is generated by the script UEDGEFortranParser.py
ListVariable=DicFile['convert']['convsr_vo']['AssignedNonLocalVars']+DicFile['convert']['convsr_aux']['AssignedNonLocalVars']+DicFile['odepandf']['pandf']['AssignedNonLocalVars']
dbg=WriteDebugRoutine('DebugHelper.F90',ListVariable,Doc)
#%%
def CompareDump(FileName1,FileName2):
    Dic1=ReadDumpFile(FileName1)
    Dic2=ReadDumpFile(FileName2)
    VarCheck={}
    for Var in Dic1.keys():
        VarCheck[Var]=True
        if len(Dic1[Var])!=len(Dic2[Var]):
            print(Var)
            VarCheck[Var]=False
            #raise ValueError('dics of different length')
            continue
        isfirst=True
        for i,(L1,L2) in enumerate(zip(Dic1[Var],Dic2[Var])):
            if L1!=L2:
                VarCheck[Var]=False
                if isfirst:
                    # Only report the first differing index of each variable
                    print(Var,i)
                    isfirst=False
    return VarCheck

def ReadDumpFile(FileName):
    file = open(FileName, 'r')
    Lines = file.readlines()
    file.close()
    Dic={}
    for L in Lines:
        L=L.strip()
        # A line that parses as a number is data; anything else is a variable name
        try:
            Lf=float(L)
            isnumeric=True
        except ValueError:
            isnumeric=False
        if not isnumeric:
            VarName=L
            Dic[VarName]=[]
        else:
            Dic[VarName].append(Lf)
    return Dic

FileName1='/home/jguterl/Dropbox/python/UEDGERunDir/dumpregular.txt'
FileName2='/home/jguterl/Dropbox/python/UEDGERunDir/dumpomp.txt'
VarCheck=CompareDump(FileName1,FileName2)
for V,B in VarCheck.items():
    if not B:
        print(V)
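
The dump format read by ReadDumpFile is a variable-name line followed by one number per line. The comparison can be tried without UEDGE on two small hand-written dumps. This is a minimal, self-contained sketch rather than the script above: `read_dump` and `first_differences` are hypothetical compact equivalents, the variable names `ni` and `te` and all values are made up for illustration, and unlike ReadDumpFile it skips blank lines.

```python
import os
import tempfile


def read_dump(path):
    """Parse a dump: a non-numeric line names a variable, numeric lines are its values."""
    data, name = {}, None
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            if not line:
                continue
            try:
                value = float(line)
            except ValueError:
                name = line
                data[name] = []
            else:
                data[name].append(value)
    return data


def first_differences(dump_a, dump_b):
    """Map each differing variable to its first mismatching index (None if lengths differ)."""
    diffs = {}
    for name, a in dump_a.items():
        b = dump_b.get(name, [])
        if len(a) != len(b):
            diffs[name] = None
            continue
        for i, (x, y) in enumerate(zip(a, b)):
            if x != y:
                diffs[name] = i
                break
    return diffs


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    # Two dumps that differ only in the second value of "te".
    for tag, te1 in (("a", 2.5), ("b", 2.75)):
        path = os.path.join(tmp, "dump_%s.txt" % tag)
        with open(path, "w") as fh:
            fh.write(" ni\n   1.0\n   2.0\n te\n   3.0\n   %r\n" % te1)
        paths.append(path)
    result = first_differences(read_dump(paths[0]), read_dump(paths[1]))
print(result)  # {'te': 1}
```

Because the values are compared with `!=`, an entry that is NaN in both dumps is also reported as a difference, which is convenient when hunting for the first NaN.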

May 05 '20 19:05 jguterl

Sean, I think a Zoom session is the place to start. If you are finding a NaN for one of the yldot’s, the components that go into yldot are available to look at from the parser, and one of those must also have a NaN, so we can hopefully drill down to identify the specific source. Please send me the source files in the subdirectories bbb and svr. If these are in a tar or zip file, you must change the extension to something like .tax or .zix before sending it to me so that it will make it through the LLNL mail filter; unknown files with .tar and .zip extensions are not allowed.

I can do a zoom session after 3 PDT today or after 1:30 PDT tomorrow.

-Tom

May 05 '20 19:05 trognlien