FAIL LII2* COMPARE_base_no_interp due to fields VEGWP[LN,PD] starting some time after ctsm5.3.059

Open slevis-lmwg opened this issue 1 month ago • 2 comments

ncview helps show the diffs in VEGWPLN (diffs in VEGWPPD are more subtle), with file 1 (use_init_interp = .true.) on the left and file 2 (no_interp) on the right:

file 1 =
/glade/derecho/scratch/slevis/LII2FINIDATAREAS_D_P256x2_Ld1.f09_t232.I1850Clm60BgcCrop.derecho_intel.clm-default.C.20251118_162221_1yxdza/run/LII2FINIDATAREAS_D_P256x2_Ld1.f09_t232.I1850Clm60BgcCrop.derecho_intel.clm-default.C.20251118_162221_1yxdza.clm2.h0a.0001-01-02-00000.nc.base

file 2 =
/glade/derecho/scratch/slevis/LII2FINIDATAREAS_D_P256x2_Ld1.f09_t232.I1850Clm60BgcCrop.derecho_intel.clm-default.C.20251118_162221_1yxdza/run/LII2FINIDATAREAS_D_P256x2_Ld1.f09_t232.I1850Clm60BgcCrop.derecho_intel.clm-default.C.20251118_162221_1yxdza.clm2.h0a.0001-01-02-00000.nc.no_interp
[Image: ncview comparison of VEGWPLN, file 1 on the left and file 2 on the right]

Originally posted by @slevis-lmwg in #3252

HISTORY

  • After ctsm5.3.059, LII2* started failing in RUN due to new finidat vs. old fsurdat (2025/6/24).
  • The failure also occurred on the 5.4 branch in reverse, i.e. due to old finidat vs. new fsurdat as documented in #3252 (2025/6/12).
  • The tests still fail in RUN on master (ctsm5.3.085) while the 5.4 branch remains in development. The tests stopped failing in RUN on the 5.4 branch when we updated to a consistent finidat/fsurdat pair, but proceeded to fail as expected in COMPARE_base_no_interp as explained in https://github.com/ESCOMP/CTSM/issues/3252#issuecomment-3192993166.
  • We expected to fix the COMPARE_base_no_interp failure just before finalizing ctsm5.4 by following the instructions in the linked comment (just above). Instead we came across the problem with VEGWP[LN,PD] shown with ncview above.

WHAT ELSE WE KNOW

  • In discussing this with @olyson and @wwieder, Keith and I decided I should remove the two fields from restart. This enabled the LII2 tests to pass as explained in https://github.com/ESCOMP/CTSM/issues/3252#issuecomment-3554194101.
  • However, restart tests (e.g. ERS) now fail, suggesting that the two fields do need to be on restart for some reason.
  • A lead (or a red herring?): Keith found that local-noon (LN) radiation fields do not go to restart, but they also get reset to spval differently than VEGWPLN.
  • In another conversation about this, @ekluzek wondered whether the VEGWP[LN,PD] and the local-noon radiation fields should both be in restart, so as to restart correctly in sub-daily restarts.
  • A detail that may implicate me in the source of the problem: I brought the h0a/h0i split to master in ctsm5.3.062, which may have broken these two fields while the LII2 tests were already labeled as expected failures.
  • Keith tells me that the VEGWP[LN,PD] code was introduced by @djk2120, so I'm pinging Daniel here to keep you in the loop.

CURRENT PATH FORWARD

I got consensus from @olyson, @ekluzek, and @wwieder to proceed with

  • putting the VEGWP[LN,PD] fields back on restart
  • removing them from history, and
  • opening this issue

so as to proceed with ctsm5.4 work and resolve this at a later time.

slevis-lmwg avatar Nov 20 '25 23:11 slevis-lmwg

I can explain the cutting off at the top.

I was rethinking the bit about needing these types of fields on restart. Since they are only set when near_local_noon or predawn and then set to spval, you could reproduce them on restart by doing the same check. But you also shouldn't need to, as they should be valid after the first time step is run.

However, the bit about this only being done during daytime matters, because it means the field won't get set to its previous value at restart: you never reach the code where it's set. So it keeps the spval that was set for it with the hist_addfld call. The data where it was night at restart will therefore retain the spval and be blanked out, as we see in the map.
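
To make that mechanism concrete, here is a toy sketch in Python. It is not CTSM code. The names `SPVAL`, `update_daytime_only`, and `run`, and the value -1.5, are all made up for illustration, and the real Photosynthesis logic is more involved. It only shows how a field that is assigned solely in a daytime branch can differ between a continuous run and a run restarted at night without that field on the restart file.

```python
SPVAL = 1.0e36  # placeholder for the model's "missing value" flag (assumed value)


def update_daytime_only(field, is_daytime, value):
    """Assign the field only when it is daytime; do nothing at night."""
    if is_daytime:
        field = value
    return field


def run(steps, initial):
    """Apply a sequence of (is_daytime, value) steps starting from `initial`."""
    field = initial
    for is_daytime, value in steps:
        field = update_daytime_only(field, is_daytime, value)
    return field


day_step = (True, -1.5)     # hypothetical daytime value
night_step = (False, None)  # nothing is computed at night

# Continuous run: the daytime value is carried through the night.
continuous = run([day_step, night_step], initial=SPVAL)

# Restart at night, field not on the restart file: it starts from SPVAL
# and the nighttime step never overwrites it.
restarted = run([night_step], initial=SPVAL)

print("continuous:", continuous, "restarted:", restarted)
```

In this toy the continuous run ends the night holding the stale daytime value while the restarted run holds `SPVAL`, which is the kind of mismatch a restart comparison would flag.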

The underlying problem is that the logic in Photosynthesis is complicated enough, and often only executed during the day, that even if you have

   if ( <something> ) then
      <do this>
   else
      <do that>
   end if

neither branch may get executed, even though it's easy to think the above would cover everything. There are places in Photosynthesis where some variables are explicitly set to zero for nighttime, and this code should have something similar. I assume that it should be spval for nighttime? @djk2120, what do you think these should be set to during nighttime?

ekluzek avatar Nov 21 '25 18:11 ekluzek

@djk2120 confirmed with me that we should change this so that they are set to spval for nighttime. Doing this should resolve the restart issue even when these fields are output.
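
As a rough illustration of why an explicit nighttime assignment helps, here is a toy Python sketch. It is not the CTSM change itself. `SPVAL`, `update_with_night_reset`, and `run` are hypothetical names, and the value -1.5 is arbitrary. The point is that once the nighttime branch sets the field, its value no longer depends on what it held before the step, so a continuous run and a run restarted at night agree.

```python
SPVAL = 1.0e36  # placeholder for the model's "missing value" flag (assumed value)


def update_with_night_reset(field, is_daytime, value):
    """Assign the field in the daytime and explicitly reset it at night."""
    if is_daytime:
        field = value
    else:
        field = SPVAL  # explicit nighttime assignment
    return field


def run(steps, initial):
    """Apply a sequence of (is_daytime, value) steps starting from `initial`."""
    field = initial
    for is_daytime, value in steps:
        field = update_with_night_reset(field, is_daytime, value)
    return field


day_step = (True, -1.5)     # hypothetical daytime value
night_step = (False, None)  # nothing is computed at night

continuous = run([day_step, night_step], initial=SPVAL)
restarted = run([night_step], initial=SPVAL)

print("runs agree at night:", continuous == restarted)
```

Because the nighttime result is independent of the prior state, the same answer comes out whether or not the field was carried on the restart file.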

ekluzek avatar Dec 11 '25 19:12 ekluzek