Not enough error checking for adding new restart variables
Brief summary of bug
I added a new restart variable and used a dim1name of "patch" instead of "pft". The run then aborts inside PIO with an error that does not name the variable or the bad dimension.
General bug information
CTSM version you are using: branch_tags/dustemisdev.n05_ctsm5.1.dev166-5-gf48830977
Does this bug cause significantly incorrect results in the model's science? No
Configurations affected: When adding new variables to the restart file
Details of bug
This is an example of where poor error messaging makes it hard to find problems in the code. I found the problem by running the case in DDT, and recognized the issue when the failure came up in the define step rather than the write step. Before that, I thought it might have been caused by bad data in the array being written, or by the interpinic_flag.
Important details of your setup / configuration so we can reproduce the bug
call restartvar(ncid=ncid, flag=flag, varname='OBU', xtype=ncd_double, &
     dim1name='patch', &
     long_name='Monin-Obukhov length', units='m', &
     interpinic_flag='skip', readvar=readvar, data=this%obu_patch)
Important output or errors that show the problem
The cesm.log does point to the error, but it is buried under so much other output that it's hard to see.
/glade/work/erik/ctsm_worktrees/dust_dev/share/src/shr_file_mod.F90 912 This routine is depricated - use shr_log_setLogUnit instead -12
/glade/work/erik/ctsm_worktrees/dust_dev/share/src/shr_file_mod.F90 912 This routine is depricated - use shr_log_setLogUnit instead -13
/glade/work/erik/ctsm_worktrees/dust_dev/share/src/shr_file_mod.F90 912 This routine is depricated - use shr_log_setLogUnit instead -12
/glade/work/erik/ctsm_worktrees/dust_dev/share/src/shr_file_mod.F90 912 This routine is depricated - use shr_log_setLogUnit instead -13
Abort with message NetCDF: Invalid dimension ID or name in file /glade/derecho/scratch/jedwards/tmp/spack-stage/spack-stage-parallelio-2.6.2-x3vfh2bjkpjsumev4h7myd7wf3jvvjub/spack-src/src/clib/pio_nc.c at line 812
Obtained 10 stack frames.
/glade/u/apps/cseg/derecho/23.09/spack/opt/spack/linux-sles15-x86_64_v3/gcc-12.2.0/parallelio-2.6.2-x3vfh2bjkpjsumev4h7myd7wf3jvvjub/lib/libpioc.so(print_trace+0x32) [0x14c46a7a228c]
/glade/u/apps/cseg/derecho/23.09/spack/opt/spack/linux-sles15-x86_64_v3/gcc-12.2.0/parallelio-2.6.2-x3vfh2bjkpjsumev4h7myd7wf3jvvjub/lib/libpioc.so(piodie+0x77) [0x14c46a7a2399]
/glade/u/apps/cseg/derecho/23.09/spack/opt/spack/linux-sles15-x86_64_v3/gcc-12.2.0/parallelio-2.6.2-x3vfh2bjkpjsumev4h7myd7wf3jvvjub/lib/libpioc.so(check_netcdf2+0x242) [0x14c46a7a272d]
/glade/u/apps/cseg/derecho/23.09/spack/opt/spack/linux-sles15-x86_64_v3/gcc-12.2.0/parallelio-2.6.2-x3vfh2bjkpjsumev4h7myd7wf3jvvjub/lib/libpioc.so(check_netcdf+0x34) [0x14c46a7a24e9]
/glade/u/apps/cseg/derecho/23.09/spack/opt/spack/linux-sles15-x86_64_v3/gcc-12.2.0/parallelio-2.6.2-x3vfh2bjkpjsumev4h7myd7wf3jvvjub/lib/libpioc.so(PIOc_inq_dimid+0x3a0) [0x14c46a7c3801]
/glade/u/apps/cseg/derecho/23.09/spack/opt/spack/linux-sles15-x86_64_v3/gcc-12.2.0/parallelio-2.6.2-x3vfh2bjkpjsumev4h7myd7wf3jvvjub/lib/libpiof.so(__pio_nf_MOD_inq_dimid_id+0xb1) [0x14c46aa138cc]
/glade/u/apps/cseg/derecho/23.09/spack/opt/spack/linux-sles15-x86_64_v3/gcc-12.2.0/parallelio-2.6.2-x3vfh2bjkpjsumev4h7myd7wf3jvvjub/lib/libpiof.so(__pio_nf_MOD_inq_dimid_desc+0x3d) [0x14c46aa13994]
/glade/derecho/scratch/erik/ERS_D_Mmpi-serial_Ld5.1x1_brazil.I2000Clm50FatesRs.derecho_gnu.clm-FatesCold.20240624_131200_uzttv8/bld/cesm.exe() [0x5af763]
/glade/derecho/scratch/erik/ERS_D_Mmpi-serial_Ld5.1x1_brazil.I2000Clm50FatesRs.derecho_gnu.clm-FatesCold.20240624_131200_uzttv8/bld/cesm.exe() [0x5af871]
/glade/derecho/scratch/erik/ERS_D_Mmpi-serial_Ld5.1x1_brazil.I2000Clm50FatesRs.derecho_gnu.clm-FatesCold.20240624_131200_uzttv8/bld/cesm.exe() [0x71c18c]
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
#0 0x14c4616efd4f in ???
at /usr/src/debug/glibc-2.31-150300.41.1.x86_64/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
#1 0x14c4616efcbb in __GI_raise
at ../sysdeps/unix/sysv/linux/raise.c:51
#2 0x14c4616f1354 in __GI_abort
at /usr/src/debug/glibc-2.31-150300.41.1.x86_64/stdlib/abort.c:79
#3 0x14c46a7a239d in piodie
at /glade/derecho/scratch/jedwards/tmp/spack-stage/spack-stage-parallelio-2.6.2-x3vfh2bjkpjsumev4h7myd7wf3jvvjub/spack-src/src/clib/pioc_support.c:561
#4 0x14c46a7a272c in check_netcdf2
at /glade/derecho/scratch/jedwards/tmp/spack-stage/spack-stage-parallelio-2.6.2-x3vfh2bjkpjsumev4h7myd7wf3jvvjub/spack-src/src/clib/pioc_support.c:683
#5 0x14c46a7a24e8 in check_netcdf
at /glade/derecho/scratch/jedwards/tmp/spack-stage/spack-stage-parallelio-2.6.2-x3vfh2bjkpjsumev4h7myd7wf3jvvjub/spack-src/src/clib/pioc_support.c:632
#6 0x14c46a7c3800 in PIOc_inq_dimid
at /glade/derecho/scratch/jedwards/tmp/spack-stage/spack-stage-parallelio-2.6.2-x3vfh2bjkpjsumev4h7myd7wf3jvvjub/spack-src/src/clib/pio_nc.c:812
#7 0x14c46aa138cb in __pio_nf_MOD_inq_dimid_id
at /glade/derecho/scratch/jedwards/tmp/spack-stage/spack-stage-parallelio-2.6.2-x3vfh2bjkpjsumev4h7myd7wf3jvvjub/spack-src/src/flib/pio_nf.F90:519
#8 0x14c46aa13993 in __pio_nf_MOD_inq_dimid_desc
at /glade/derecho/scratch/jedwards/tmp/spack-stage/spack-stage-parallelio-2.6.2-x3vfh2bjkpjsumev4h7myd7wf3jvvjub/spack-src/src/flib/pio_nf.F90:448
#9 0x5af762 in __ncdio_pio_MOD_ncd_inqdid
at /glade/work/erik/ctsm_worktrees/dust_dev/src/main/ncdio_pio.F90.in:469
#10 0x5af870 in __ncdio_pio_MOD_ncd_defvar_bygrid
at /glade/work/erik/ctsm_worktrees/dust_dev/src/main/ncdio_pio.F90.in:1257
#11 0x71c18b in __restutilmod_MOD_restartvar_1d_double
at /glade/work/erik/ctsm_worktrees/dust_dev/src/utils/restUtilMod.F90.in:325
#12 0xa3cf54 in __frictionvelocitymod_MOD_restart
at /glade/work/erik/ctsm_worktrees/dust_dev/src/biogeophys/FrictionVelocityMod.F90:443
This is in the same vein as #1913 and #144.
Fixing this would just mean adding the dimexist option to the ncd_inqdid calls and checking it, so that a missing dimension gives a clear error message instead of an abort inside PIO.
This is something that should be done on b4b-dev. It's also the type of thing that simple I/O testing would help with, so the functional test framework would be a good place to test it.
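To illustrate the dimexist check suggested above, here is a minimal standalone sketch. It does not use the real CTSM or PIO routines. The module dimcheck_sketch, the routines mock_inqdid and define_var, and the known_dims table are all hypothetical stand-ins invented for this example. The only thing taken from this issue is the idea of an optional dimexist argument that the caller checks before defining the variable.

```fortran
! Sketch only: mock_inqdid stands in for ncd_inqdid and known_dims for the
! dimensions defined on the restart file. All names here are hypothetical.
module dimcheck_sketch
  implicit none
  character(len=8), parameter :: known_dims(3) = &
       [character(len=8) :: 'column', 'pft', 'landunit']
contains

  ! Look up a dimension by name. If dimexist is present, report whether the
  ! dimension was found instead of aborting.
  subroutine mock_inqdid(name, dimid, dimexist)
    character(len=*), intent(in)  :: name
    integer,          intent(out) :: dimid
    logical, optional, intent(out) :: dimexist
    integer :: i

    dimid = -1
    do i = 1, size(known_dims)
       if (trim(known_dims(i)) == trim(name)) dimid = i
    end do

    if (present(dimexist)) then
       dimexist = (dimid > 0)
    else if (dimid < 0) then
       ! Mimics the current behavior: abort with no variable context
       error stop 'Invalid dimension ID or name'
    end if
  end subroutine mock_inqdid

  ! Define-step stand-in: check the dimension first and report which
  ! variable and which dimension name caused the problem.
  subroutine define_var(varname, dim1name, ok)
    character(len=*), intent(in)  :: varname, dim1name
    logical,          intent(out) :: ok
    integer :: dimid
    logical :: dimexist

    call mock_inqdid(dim1name, dimid, dimexist=dimexist)
    if (.not. dimexist) then
       write(*,'(a)') 'ERROR: dimension "'//trim(dim1name)// &
            '" requested for variable '//trim(varname)// &
            ' is not defined on the file'
       ok = .false.
       return
    end if
    ok = .true.
  end subroutine define_var

end module dimcheck_sketch

program demo
  use dimcheck_sketch
  implicit none
  logical :: ok

  call define_var('OBU', 'patch', ok)   ! bad dimension name: reports an error
  write(*,*) 'patch ok? ', ok
  call define_var('OBU', 'pft', ok)     ! valid dimension name
  write(*,*) 'pft ok?   ', ok
end program demo
```

In the real code the failed check would presumably end in the model's usual abort call rather than returning a flag. The point of the sketch is only that the message can name both the variable and the offending dimension.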