CTSM

Problems when asking for history variable at a higher subgrid level than available (add graceful error checking)

Open: ekluzek opened this issue on Nov 23 '22 • 9 comments

Brief summary of bug

If you request a history variable at a higher (i.e., finer) subgrid level than the one it is available at, the resulting error message is not helpful. The example output below comes from Jon Wolfe.

General bug information

CTSM version you are using: unknown, but the problem is present in the latest CTSM tag, ctsm5.1.dev114

Does this bug cause significantly incorrect results in the model's science? No

Configurations affected: BGC-Crop

Details of bug

This report is specifically about what happens when you request CROPPROD1C at the PFT level. CROPPROD1C is a gridcell-level variable, so it is only available with gridcell-level averaging. However, the problem is general and applies to other variables as well.

Important details of your setup / configuration so we can reproduce the bug

 hist_dov2xy = .true.,.false.,.false.
 hist_fincl2 = 'CROPPROD1C'
 hist_fincl3 = 'CROPPROD1C'
 hist_nhtfrq = 0,0,0
 hist_type1d_pertape = ' ','PFTS','GRID'

Important output or errors that show the problem

The following is presumably what you see when running in DEBUG mode. Without DEBUG mode there would still be a problem, but it would likely be much messier.

The cesm log complains about this:

forrtl: severe (408): fort: (2): Subscript #1 of the array HBUF has value 15735 which is greater than the upper bound of 1970

Image              PC                Routine            Line        Source
cesm.exe           0000000004EF7F66  Unknown               Unknown  Unknown
cesm.exe           0000000000BC2B54  histfilemod_mp_hi        4318  histFileMod.F90
cesm.exe           0000000000D6A165  restfilemod_mp_re         124  restFileMod.F90
cesm.exe           00000000009051CF  clm_driver_mp_clm        1365  clm_driver.F90
cesm.exe           000000000089807F  lnd_comp_mct_mp_l         457  lnd_comp_mct.F90
cesm.exe           0000000000476A85  component_mod_mp_         737  component_mod.F90
cesm.exe           000000000043E2B1  cime_comp_mod_mp_        2626  cime_comp_mod.F90
cesm.exe           000000000045E5CC  MAIN__                    133  cime_driver.F90
cesm.exe           0000000000416A12  Unknown               Unknown  Unknown
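The error is consistent with a history buffer allocated for one subgrid level being indexed with an index from a finer level. The Python sketch below is a hypothetical illustration, not CTSM code: `store_value` and `SubgridLevelError` are invented names, the size 1970 is taken from the log, and treating HBUF as a gridcell-level buffer is an assumption. It shows the kind of explicit bounds check that would replace the raw subscript error with a message pointing at the likely cause. In the Fortran model, an overrun like this is only caught when runtime bounds checking is enabled, which appears to be the case in DEBUG builds.

```python
class SubgridLevelError(Exception):
    """Raised when an index does not fit the buffer's subgrid level (hypothetical)."""


def store_value(hbuf, index, value, buffer_level, index_level):
    """Store value at a 1-based (Fortran-style) index after checking bounds.

    Without an explicit check like this, a Fortran build without runtime
    bounds checking would silently access memory outside the buffer.
    """
    if not 1 <= index <= len(hbuf):
        raise SubgridLevelError(
            f"index {index} ({index_level} level) is outside a buffer of size "
            f"{len(hbuf)} ({buffer_level} level); was this field requested at "
            f"a finer subgrid level than it is defined on?"
        )
    hbuf[index - 1] = value


# Assumed: a buffer sized like the HBUF in the log above (gridcell level).
hbuf = [0.0] * 1970
store_value(hbuf, 10, 1.5, "gridcell", "gridcell")  # fits: no error

try:
    # A patch-level index, as in the failing case from the log.
    store_value(hbuf, 15735, 1.5, "gridcell", "patch")
except SubgridLevelError as err:
    print(err)
```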

Definition of done:

  • [ ] Figure out how we can recognize this in the code
  • [ ] Add error checking so that the model gracefully fails in this case
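As a starting point for both items, here is a minimal Python sketch of the comparison such a check would make. It is hypothetical: the real check would live in the Fortran history code, and `check_requested_level`, `LEVELS` and `TYPE1D_TO_LEVEL` are invented names. It assumes the CTSM subgrid hierarchy ordered from coarsest to finest (gridcell, landunit, column, patch), that a field's native level is known when the field is registered (presumably from which pointer argument, such as `ptr_gcell`, `ptr_col` or `ptr_patch`, is passed to `hist_addfld1d`), and that a blank `hist_type1d_pertape` entry means output at the field's own level.

```python
# Subgrid levels ordered from coarsest to finest (assumed CTSM hierarchy).
LEVELS = ("gridcell", "landunit", "column", "patch")

# hist_type1d_pertape values mapped to subgrid levels.
TYPE1D_TO_LEVEL = {"GRID": "gridcell", "LAND": "landunit",
                   "COLS": "column", "PFTS": "patch"}


def check_requested_level(field, native_level, type1d):
    """Raise a clear error if type1d asks for a finer level than native_level."""
    key = type1d.strip()
    if key == "":
        return  # blank: assumed to mean "output at the field's own level"
    if key not in TYPE1D_TO_LEVEL:
        raise ValueError(f"unknown hist_type1d_pertape value '{type1d}'")
    requested = TYPE1D_TO_LEVEL[key]
    # Averaging up to a coarser level is fine; disaggregating down is not.
    if LEVELS.index(requested) > LEVELS.index(native_level):
        raise ValueError(
            f"{field} is a {native_level}-level variable and cannot be output "
            f"at the {requested} level (hist_type1d_pertape = '{type1d}'); "
            f"request it at the {native_level} level or coarser instead"
        )


check_requested_level("CROPPROD1C", "gridcell", "GRID")  # allowed
try:
    check_requested_level("CROPPROD1C", "gridcell", "PFTS")  # the reported case
except ValueError as err:
    print(err)
```

In the model, raising here would correspond to calling the model's abort routine with this message as soon as the field is added to the tape, rather than failing later when the buffer is accessed.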

ekluzek, Nov 23 '22

Thanks @ekluzek, that's a perfect description, and I appreciate you opening an issue about the error message.

jonbob, Nov 28 '22

I actually had some issues with QIRRIG_FROM_SURFACE that turned out to be caused by requesting PFT-level output when only column-level output was available. In my case, however, no error message was triggered. This happened with compset I2000Clm50BgcCrop, tag ctsm5.1.dev115.

user_nl_clm:

hist_fincl2 = 'QIRRIG_DEMAND', 'QIRRIG_DRIP', 'QIRRIG_FROM_GW_CONFINED', 'QIRRIG_FROM_GW_UNCONFINED', 'QIRRIG_FROM_SURFACE', 'QIRRIG_SPRINKLER'
hist_type1d_pertape(2) = 'PFTS'
hist_dov2xy(2) = .false.
hist_nhtfrq(2) = 0
hist_mfilt(2) = 1

samsrabin, Jan 10 '23

Thanks for reporting that, @samsrabin. It's a bit disconcerting that it doesn't show an error, although maybe that's because it was a case with DEBUG set to false?

ekluzek, Jan 10 '23

Oh yeah, could be. DEBUG is indeed false.

samsrabin, Jan 10 '23

Do you have some output that shows what the problem looks like in the DEBUG false case? Does it die when writing the data, or does it take a while? It would be good to show as much as you can so that others can help diagnose the same problem.

ekluzek, Jan 10 '23

It actually writes out without any error. I only noticed because my postprocessing script had a sanity check that triggered on QIRRIG_FROM_SURFACE > QIRRIG_DEMAND. The check only triggered for the PFT-level output, not for the landunit- or gridcell-level output.
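A check like this is cheap to add to postprocessing. The following is a minimal sketch, assuming both variables have already been read from the history file into equal-length sequences at the same subgrid level (reading the file is not shown); `find_violations`, the tolerance value and the sample numbers are hypothetical.

```python
def find_violations(irrig_from_surface, irrig_demand, tol=1e-12):
    """Return indices where QIRRIG_FROM_SURFACE exceeds QIRRIG_DEMAND by more than tol."""
    if len(irrig_from_surface) != len(irrig_demand):
        raise ValueError("inputs must have the same length (same subgrid level)")
    return [i for i, (surface, demand) in enumerate(zip(irrig_from_surface, irrig_demand))
            if surface > demand + tol]


# Made-up sample values for illustration only.
surface = [0.0, 1.2e-5, 3.0e-5]
demand = [0.0, 2.0e-5, 1.0e-5]
bad = find_violations(surface, demand)
if bad:
    print(f"QIRRIG_FROM_SURFACE > QIRRIG_DEMAND at indices {bad}")
```

The small tolerance keeps round-off from triggering the check; it only catches inconsistencies between the two variables, so a field that is wrong in a way that still satisfies the inequality would pass unnoticed.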

samsrabin, Jan 10 '23

Yeah, the thing is that it's going to be using garbage data internally, so you can't trust that variable on the history file. Some of the data will be valid, and other data will be pulled from whatever it can find in memory. It shouldn't disrupt anything else, but that variable will be wrong. Possibly only the data at the end is wrong, so if you can recognize that part, you might be OK.

ekluzek, Jan 10 '23

Right. My thought is that this should probably just throw a clear error, regardless of the DEBUG setting, rather than potentially allowing bad data to be written. Ideally, this error would be thrown during case.submit.
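A pre-run check along these lines could work from the namelist settings alone, provided it knows each field's native level. The sketch below is hypothetical and is not an existing CIME or CTSM tool: `check_tapes`, `RANK` and `NATIVE_LEVEL` are invented names, and the native levels listed are only the two mentioned in this thread. It assumes that `hist_type1d_pertape` only matters for tapes with `hist_dov2xy = .false.`, and that a blank entry means native-level output.

```python
# hist_type1d_pertape values ranked from coarsest to finest.
RANK = {"GRID": 0, "LAND": 1, "COLS": 2, "PFTS": 3}

# Hypothetical table of native levels; a real check would need this metadata
# from the model's field definitions rather than a hand-maintained list.
NATIVE_LEVEL = {"CROPPROD1C": "GRID", "QIRRIG_FROM_SURFACE": "COLS"}


def check_tapes(fincl, dov2xy, type1d):
    """Return error messages for fields requested finer than their native level.

    fincl, dov2xy and type1d are per-tape lists (tape 1 first), mirroring
    hist_fincl1..N, hist_dov2xy and hist_type1d_pertape.
    """
    errors = []
    for tape, (fields, gridded, t1d) in enumerate(zip(fincl, dov2xy, type1d), start=1):
        requested = t1d.strip()
        if gridded or not requested:
            continue  # gridded or native-level output: nothing to check
        for field in fields:
            native = NATIVE_LEVEL.get(field)  # unknown fields are skipped
            if native is not None and RANK[requested] > RANK[native]:
                errors.append(f"tape {tape}: {field} is {native}-level but "
                              f"hist_type1d_pertape = '{t1d}'")
    return errors


# Settings from the original report: tape 2 (PFTS) should be flagged.
for msg in check_tapes([[], ["CROPPROD1C"], ["CROPPROD1C"]],
                       [True, False, False],
                       [" ", "PFTS", "GRID"]):
    print(msg)
```

Applied to the settings from the original report, only tape 2 is flagged; tape 3 requests the same field at its native gridcell level and passes.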

samsrabin, Jan 10 '23

A user reported garbage values caused by this issue; see https://bb.cgd.ucar.edu/cesm/threads/mismatch-between-pft-and-grid-level-output.8572/ (answered by @olyson).

billsacks, Aug 30 '23