Problems with "S" compsets -- is it a valid case to run?
I wanted to get CTSM to work with SATM and had problems, so I tried the "S" compset to see if it would work. It doesn't: I ran into a simple problem with buildnml, and then a problem on submission.
It looks like stub components are configured so that they are marked as "invalid", and CESM is set up to run without a mediator when there is only one "valid" component. But if "CPL" is the only valid component, that's a contradiction.
Here's the buildnml problem I ran into.
The failure is at the end of the traceback: coupling_times doesn't have an entry for cpl_cpl_dt.
./create_test SMS_D_Ln1.f10_f10_mg37.S.derecho_intel -r .
Testnames: ['SMS_D_Ln1.f10_f10_mg37.S.derecho_intel']
Using project from .cesm_proj: P93300606
create_test will do up to 1 tasks simultaneously
create_test will use up to 160 cores simultaneously
Creating test directory /glade/derecho/scratch/erik/cesm3_0_alpha07a/cime/scripts/SMS_D_Ln1.f10_f10_mg37.S.derecho_intel.20250701_104712_kdjq6k
RUNNING TESTS:
SMS_D_Ln1.f10_f10_mg37.S.derecho_intel
Starting CREATE_NEWCASE for test SMS_D_Ln1.f10_f10_mg37.S.derecho_intel with 1 procs
Finished CREATE_NEWCASE for test SMS_D_Ln1.f10_f10_mg37.S.derecho_intel in 2.141925 seconds (PASS)
Starting XML for test SMS_D_Ln1.f10_f10_mg37.S.derecho_intel with 1 procs
Finished XML for test SMS_D_Ln1.f10_f10_mg37.S.derecho_intel in 0.605212 seconds (PASS)
Starting SETUP for test SMS_D_Ln1.f10_f10_mg37.S.derecho_intel with 1 procs
Finished SETUP for test SMS_D_Ln1.f10_f10_mg37.S.derecho_intel in 2.237583 seconds (PASS)
Starting SHAREDLIB_BUILD for test SMS_D_Ln1.f10_f10_mg37.S.derecho_intel with 1 procs
Finished SHAREDLIB_BUILD for test SMS_D_Ln1.f10_f10_mg37.S.derecho_intel in 2.026433 seconds (FAIL). [COMPLETED 1 of 1]
Case dir: /glade/derecho/scratch/erik/cesm3_0_alpha07a/cime/scripts/SMS_D_Ln1.f10_f10_mg37.S.derecho_intel.20250701_104712_kdjq6k
Errors were:
Building test for SMS in directory /glade/derecho/scratch/erik/cesm3_0_alpha07a/cime/scripts/SMS_D_Ln1.f10_f10_mg37.S.derecho_intel.20250701_104712_kdjq6k
Traceback (most recent call last):
File
.
.
.
File "/glade/derecho/scratch/erik/cesm3_0_alpha07a/components/cmeps/cime_config/buildnml", line 506, in _create_drv_namelists
_create_runseq(case, coupling_times, valid_comps)
~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/glade/derecho/scratch/erik/cesm3_0_alpha07a/components/cmeps/cime_config/buildnml", line 583, in _create_runseq
dtime = coupling_times[valid_comps[0].lower() + "_cpl_dt"]
~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'cpl_cpl_dt'
Waiting for tests to finish
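To make the failure concrete, here is a minimal standalone sketch of the lookup that fails. The helper name `pick_runseq_dtime` and the dictionary values are hypothetical and are not the actual CMEPS buildnml code; only the `<comp>_cpl_dt` key pattern is taken from the traceback above.

```python
def pick_runseq_dtime(coupling_times, valid_comps):
    """Return the coupling interval of the first valid component.

    Hypothetical helper: mirrors the `<comp>_cpl_dt` lookup from the
    traceback, but raises a descriptive error instead of a bare KeyError.
    """
    key = valid_comps[0].lower() + "_cpl_dt"
    if key not in coupling_times:
        raise ValueError(
            f"no coupling interval '{key}' for component '{valid_comps[0]}'; "
            f"known keys: {sorted(coupling_times)}"
        )
    return coupling_times[key]


# Illustrative values only: "CPL" is the sole valid component,
# as seems to happen when every other component is a stub.
coupling_times = {"atm_cpl_dt": 1800, "lnd_cpl_dt": 1800}
valid_comps = ["CPL"]

try:
    pick_runseq_dtime(coupling_times, valid_comps)
except ValueError as err:
    print(err)
```

A check like this would not make the case runnable; it would only turn the KeyError into a message that names the missing key.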
I got around that by arbitrarily taking dtime from SATM. But then it fails at submit, because minncpl is 0 and check_case hits a divide by zero here:
File "/glade/derecho/scratch/erik/cesm3_0_alpha07a/cime/CIME/case/case_submit.py", line 174, in _submit
case.check_case(skip_pnl=skip_pnl, chksum=chksum)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/glade/derecho/scratch/erik/cesm3_0_alpha07a/cime/CIME/case/case_submit.py", line 351, in check_case
timestep = 86400 / minncpl
~~~~~~^~~~~~~~~
ZeroDivisionError: division by zero
I can get past that by setting maxcomp = "ATM" and setting minncpl and maxncpl to the ATM values.
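As a rough sketch of a cleaner failure mode for the submit step, the division could be guarded so that a zero minncpl produces a readable error. The function `coupling_timestep` is hypothetical and is not part of CIME; the idea that minncpl stays at 0 because no non-stub component supplies a value is my assumption.

```python
SECONDS_PER_DAY = 86400


def coupling_timestep(minncpl):
    """Return the timestep in seconds for `minncpl` coupling intervals per day.

    Hypothetical guard around the `86400 / minncpl` expression from the
    traceback: a zero (or missing) value raises a descriptive error.
    """
    if not minncpl:
        raise ValueError(
            "minncpl is 0: no component supplied a coupling frequency "
            "(are all components stubs?)"
        )
    return SECONDS_PER_DAY / minncpl


# Illustrative values only.
print(coupling_timestep(48))  # 48 coupling intervals per day
try:
    coupling_timestep(0)
except ValueError as err:
    print(err)
```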
It then fails at runtime with a segfault:
dec2206.hsn.de.hpc.ucar.edu 125: forrtl: severe (174): SIGSEGV, segmentation fault occurred
dec2206.hsn.de.hpc.ucar.edu 125: Image PC Routine Line Source
dec2206.hsn.de.hpc.ucar.edu 125: libpthread-2.31.s 000014A52A75C8C0 Unknown Unknown Unknown
dec2206.hsn.de.hpc.ucar.edu 125: libmpi_intel.so.1 000014A5283993E1 Unknown Unknown Unknown
dec2206.hsn.de.hpc.ucar.edu 125: libmpi_intel.so.1 000014A5283997B8 Unknown Unknown Unknown
dec2206.hsn.de.hpc.ucar.edu 125: libmpi_intel.so.1 000014A52820BCBE Unknown Unknown Unknown
dec2206.hsn.de.hpc.ucar.edu 125: libmpi_intel.so.1 000014A526C8FE68 MPI_Abort Unknown Unknown
dec2206.hsn.de.hpc.ucar.edu 125: libesmf.so 000014A53251FCD2 abort 904 ESMCI_VMKernel.C
dec2206.hsn.de.hpc.ucar.edu 125: libesmf.so 000014A532519FE3 abort 3721 ESMCI_VM.C
dec2206.hsn.de.hpc.ucar.edu 125: libesmf.so 000014A532545E61 c_esmc_vmabort_ 1252 ESMCI_VM_F.C
dec2206.hsn.de.hpc.ucar.edu 125: libesmf.so 000014A5337B4A7B c_esmc_vmabort_.t 0 ESMF_VM.F90
dec2206.hsn.de.hpc.ucar.edu 125: libesmf.so 000014A5337AB39C esmf_vmabort 9525 ESMF_VM.F90
dec2206.hsn.de.hpc.ucar.edu 125: libesmf.so 000014A53337DDCC esmf_finalize 1712 ESMF_Init.F90
dec2206.hsn.de.hpc.ucar.edu 125: cesm.exe 000000000044702A MAIN__ 136 esmApp.F90
dec2206.hsn.de.hpc.ucar.edu 125: cesm.exe 000000000041FC5D Unknown Unknown Unknown
dec2206.hsn.de.hpc.ucar.edu 125: libc-2.31.so 000014A5262C929D __libc_start_main Unknown Unknown
dec2206.hsn.de.hpc.ucar.edu 125: cesm.exe 000000000041FB8A Unknown Unknown Unknown
Line 136 of esmApp.F90 is the last line of this excerpt:
!-----------------------------------------------------------------------------
! Call Initialize for the earth system ensemble Component
!-----------------------------------------------------------------------------
call ESMF_GridCompInitialize(ensemble_driver_comp, userRc=urc, rc=rc)
if (ESMF_LogFoundError(rcToCheck=rc, msg=ESMF_LOGERR_PASSTHRU, &
line=__LINE__, &
file=__FILE__)) &
call ESMF_Finalize(endflag=ESMF_END_ABORT)
if (ESMF_LogFoundError(rcToCheck=urc, msg=ESMF_LOGERR_PASSTHRU, &
line=__LINE__, &
file=__FILE__)) &
call ESMF_Finalize(endflag=ESMF_END_ABORT)
This might mean that an "S" compset is just an invalid case that we shouldn't run. If so, we should remove it from the compsets and also trap for it in the scripting to mark it as an invalid compset.
However, an I compset with SATM fails at this same point, and that is a case I do want to get working for testing.
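If the "S" compset does turn out to be invalid, trapping it in the scripting could look roughly like the sketch below. Both function names and the component dictionaries are hypothetical and are not existing CIME code; the only convention assumed is that a stub model is named "s" plus the lower-case component class (for example "satm" for ATM).

```python
def non_stub_components(comp_by_class):
    """Return the component classes whose model is not a stub.

    Assumes a stub is named "s" + class name in lower case
    (e.g. "satm" for class "ATM"); "CPL" is never counted.
    """
    return [
        cls
        for cls, name in comp_by_class.items()
        if cls != "CPL" and name.lower() != "s" + cls.lower()
    ]


def check_compset_is_runnable(comp_by_class):
    """Raise ValueError if every component other than CPL is a stub."""
    if not non_stub_components(comp_by_class):
        raise ValueError("compset contains only stub components; nothing to run")


# Illustrative component sets only.
s_compset = {"ATM": "satm", "LND": "slnd", "OCN": "socn", "CPL": "cmeps"}
i_with_satm = {"ATM": "satm", "LND": "clm", "OCN": "socn", "CPL": "cmeps"}

check_compset_is_runnable(i_with_satm)  # passes: LND is active
try:
    check_compset_is_runnable(s_compset)
except ValueError as err:
    print(err)
```

A check of this kind would reject the all-stub case early while still letting a compset with SATM and one active component through.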
Using the CMEPS mediator has eliminated the need for Stub components, so there is no longer an SATM.