FAIL MKSURFDATAESMF_P128x1.f10_f10_mg37.I1850Clm50BgcCrop.derecho_intel NLCOMP

Open slevis-lmwg opened this issue 4 months ago • 10 comments

Originally posted by @samsrabin in https://github.com/ESCOMP/CTSM/issues/3422#issuecomment-3230045515

@jedwards4b I think this is related to the problem you alerted me to with cime6.1.112.

Generating a MKSURFDATAESMF_P128x1.f10_f10_mg37.I1850Clm50BgcCrop.derecho_intel baseline at ctsm5.3.069 gives CaseDocs/lnd_in with fsurdat pointing at a file generated during that test (ending in cYYMMDD.nc, where YYMMDD is today's date). Reverting CIME to cime6.1.111 changes it so that the baseline namelist file is saved earlier: CaseDocs/lnd_in has fsurdat pointing at the default file, ending in c240908.nc.

So cime6.1.112 causes the baseline namelist files to be saved later in the process. I think this is actually correct for this particular test, because the test does actually run with the file it generates. But it looks like the namelist comparison is still happening before the lnd_in namelist has been updated, so it still has the default c240908.nc file.
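To illustrate the mismatch described above, here is a minimal sketch of a namelist comparison. It is not CIME's actual NLCOMP code: `parse_namelist`, `diff_namelists`, and the `surfdata_example_*` file names are hypothetical, and only the `cYYMMDD` / `c240908` suffixes come from the description above.

```python
import re


def parse_namelist(text):
    """Collect `key = value` pairs from simple Fortran-namelist text."""
    settings = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\w+)\s*=\s*(.+?)\s*,?\s*$", line)
        if match:
            settings[match.group(1)] = match.group(2).strip("'\"")
    return settings


def diff_namelists(baseline, current):
    """Return {key: (baseline_value, current_value)} for every differing key."""
    keys = sorted(set(baseline) | set(current))
    return {
        key: (baseline.get(key), current.get(key))
        for key in keys
        if baseline.get(key) != current.get(key)
    }


# Baseline saved late: fsurdat already points at the file generated by the test.
# (Hypothetical file name; YYMMDD stands in for the generation date.)
baseline_lnd_in = """&clm_inparm
 fsurdat = 'surfdata_example_cYYMMDD.nc'
/"""

# Namelist seen by the comparison: still the default file.
compared_lnd_in = """&clm_inparm
 fsurdat = 'surfdata_example_c240908.nc'
/"""

differences = diff_namelists(
    parse_namelist(baseline_lnd_in), parse_namelist(compared_lnd_in)
)
print(differences)
```

In this sketch the only difference reported is `fsurdat`, which is the kind of difference that would make NLCOMP fail while the run itself is fine.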

Fixing this test would require a CIME update that makes it so the namelist comparison happens after the lnd_in namelist has been updated—ideally, immediately before the baseline is saved. (We would then end up with a different problem, like what we see for the SUBSETDATA tests here, but that's easily fixable.)

Like I said, I think the baseline namelist files being saved later in the process is correct for this particular test. I don't know if that also applies for the E3SM test @jgfouca highlighted as having a related (?) problem.

slevis-lmwg avatar Aug 28 '25 18:08 slevis-lmwg

I labeled "bfb" and "testing" because the test fails ONLY in NLCOMP.

slevis-lmwg avatar Aug 28 '25 18:08 slevis-lmwg

The NLCOMP message appears in this post.

slevis-lmwg avatar Aug 28 '25 18:08 slevis-lmwg

It looks like the problem is that you can specify the namelist filename -- but NOT the surface dataset or log file name.

gen_mksurfdata_namelist.py has this code:

    prefix = f"surfdata_{res}_{ssp_rcp_name}_{start_year}_{num_pft}pfts_c{time_stamp}."

    if args.namelist_fname is None:
        nlfname = f"{prefix}namelist"
    else:
        nlfname = args.namelist_fname

    fsurdat = f"{prefix}nc"
    fsurlog = f"{prefix}log"

One way to handle it would be to have "prefix" be the thing that's entered in the command line. And then you'd have something like this:

    if args.namelist_prefix is not None:
        prefix = args.namelist_prefix
    else:
        prefix = f"surfdata_{res}_{ssp_rcp_name}_{start_year}_{num_pft}pfts_c{time_stamp}."
        
    nlfname = f"{prefix}namelist"
    fsurdat = f"{prefix}nc"
    fsurlog = f"{prefix}log"

And then the same prefix would be used for all three files. It does mean that the namelist file will have an assumed ".namelist" ending on it -- but I think that would be OK.

I think this approach for the prefix would be the best, but I can imagine other options as well....
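A self-contained sketch of the prefix idea, in case it helps the discussion. The `--namelist-prefix` option and the `build_file_names` helper are hypothetical (they do not exist in gen_mksurfdata_namelist.py), and the resolution, scenario, year, and PFT values in the usage lines are made up for illustration.

```python
import argparse


def build_file_names(args, res, ssp_rcp_name, start_year, num_pft, time_stamp):
    """Derive the namelist, surface dataset, and log file names from one prefix."""
    if args.namelist_prefix is not None:
        # User-supplied prefix; it must include the trailing "." itself.
        prefix = args.namelist_prefix
    else:
        # Default prefix, built the same way as in the current code.
        prefix = f"surfdata_{res}_{ssp_rcp_name}_{start_year}_{num_pft}pfts_c{time_stamp}."
    return f"{prefix}namelist", f"{prefix}nc", f"{prefix}log"


parser = argparse.ArgumentParser()
# Hypothetical option standing in for the existing namelist file name argument.
parser.add_argument("--namelist-prefix", dest="namelist_prefix", default=None)

# With an explicit prefix, all three files share it.
custom = build_file_names(
    parser.parse_args(["--namelist-prefix", "my_test."]),
    "10x15", "hist", 1850, 78, "250829",
)

# Without it, the default time-stamped prefix is used for all three.
default = build_file_names(
    parser.parse_args([]), "10x15", "hist", 1850, 78, "250829"
)
print(custom)
print(default)
```

The design choice here is that the namelist name can no longer be set independently of the other two, which is the trade-off mentioned above about the assumed ".namelist" ending.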

ekluzek avatar Aug 29 '25 22:08 ekluzek

@ekluzek , can you check to see if https://github.com/ESMCI/cime/pull/4846 fixes this?

jgfouca avatar Aug 29 '25 23:08 jgfouca

@samsrabin and @slevis-lmwg, I think you showed that the CIME change in ESMCI/cime#4846 fixes the subset data issue, right? Does it also fix it for mksurfdataesmf?

ekluzek avatar Aug 29 '25 23:08 ekluzek

Check in progress.

samsrabin avatar Aug 29 '25 23:08 samsrabin

@ekluzek's proposed prefix change is not necessary; the CIME PR makes it so the baseline namelist is saved earlier (i.e., at the same point where it was saved before cime6.1.112), so it will now always be saved with the default fsurdat (not the one generated while the gen_mksurfdata scripts run).

samsrabin avatar Aug 30 '25 00:08 samsrabin

Hmmm, oddly we see this passing in what will become ctsm5.4.002, while it failed in the previous tag. The changes in ctsm5.4.002 are unrelated, so we don't have an explanation for why it's now passing.

ekluzek avatar Dec 09 '25 19:12 ekluzek

I find the same when I compare post-5.4.n02.ctsm5.4.000 (passes) versus post-5.4.n01.alpha-ctsm5.4.CMIP7.21.ctsm5.3.085 (fails) in PR #3643, where the changes are also unrelated.

slevis-lmwg avatar Dec 09 '25 19:12 slevis-lmwg

I considered removing the failure from the expected ones in PR #3643, but given the current inability to attribute this behavior, I will not.

slevis-lmwg avatar Dec 09 '25 19:12 slevis-lmwg