DART
Implementing CLM-DART within CESM2.3
Use case
Successfully run CLM compsets within DART using CESM2.3. Currently CLM-DART is vetted using release-cesm2.2.0, but the goal here is to anticipate software changes required for cesm2.3 tags.
Is your feature request related to a problem?
Related to challenges of implementing DART within CESM2.3, as reported by the CMCC group. Specifically, CMCC reports a shape mismatch error when running fill_inflation_restart or filter. This could be related to structural differences between the clm_restart.nc, clm_history.nc and clm_vector_history.nc files.
There are related issues, including the implementation of NUOPC within CESM2.3 and the impact on the CAM6-Reanalysis files. See #474 and #463.
Problem Description
New dimensions in cesm2.3.0 include levmaxurbgrnd = 25, mxsowings = 1 and mxharvests = 2. The levmaxurbgrnd dimension replaces the levgrnd dimension for some variables.
There are also many differences in the restart file variables (history and vector_history only show numeric differences), but most of these are diagnostic variables that should not affect how model_mod functions. A list of differences is provided for reference.
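A comparison like the one described above can be sketched as a small dimension diff. This is only an illustration: `diff_dimensions` is a hypothetical helper, the dimension tables are typed in by hand (for example from `ncdump -h` output), and the levgrnd length used for both files is a placeholder rather than a value taken from the actual restart files. Only the three new cesm2.3.0 dimensions come from this issue.

```python
from typing import Dict, Optional, Tuple

# Dimension name -> length; None marks an unlimited dimension.
Dims = Dict[str, Optional[int]]


def diff_dimensions(old: Dims, new: Dims) -> Tuple[Dims, Dims, Dict[str, tuple]]:
    """Return (added, removed, changed) dimensions between two files."""
    added = {name: new[name] for name in new.keys() - old.keys()}
    removed = {name: old[name] for name in old.keys() - new.keys()}
    changed = {
        name: (old[name], new[name])
        for name in old.keys() & new.keys()
        if old[name] != new[name]
    }
    return added, removed, changed


# Placeholder table for a cesm2.2.0 restart file (levgrnd length is assumed).
cesm22_dims: Dims = {"levgrnd": 25}

# cesm2.3.0 adds the three dimensions listed in this issue.
cesm23_dims: Dims = {
    "levgrnd": 25,        # placeholder, assumed unchanged
    "levmaxurbgrnd": 25,
    "mxsowings": 1,
    "mxharvests": 2,
}

added, removed, changed = diff_dimensions(cesm22_dims, cesm23_dims)
for name in sorted(added):
    print(f"new dimension: {name} = {added[name]}")
```

Any dimension that shows up as added or changed is a candidate for closer inspection of the variables that use it.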
I was not able to reproduce the error (all tests ran as expected) using mct driver implementations of cesm2.3.0 with the tests listed below. Note: cesm2.3.0_beta08 was the branch-off point for CMCC development. DART v10.7.0 was used for testing; CMCC was using DART v10.5.3, but this older tag includes all important changes for CLM.
- cesm2.3.0_beta12 output tested against fill_inflation_restart and model_mod_check (1-4)
- cesm2.3.0_beta08 output tested against fill_inflation_restart and model_mod_check (1-4)
- cesm2.3.0_beta12 output tested against CLM Tutorial setup for 1 assimilation time step
- cesm2.3.0_beta08 output tested against CLM Tutorial setup for 1 assimilation time step
Side note: the cesm2.3.0 version required methane to be turned on (use_lch4 = TRUE) if use_nitrif_denitrif = TRUE. This caused a mass balance failure in the subsequent model integration after the initial DART update. Something to consider for cesm2.3.0 SourceMods.
Hi Brett, this could be a bug in how DART reads the state, e.g. if some variables have the unlimited dimension and some don't, this would cause the counts length = 0 (instead of 1).
@hkershaw-brown So that possibility has been on my radar. In addition to the new dimensions I mentioned above, there is a 'cohort' dimension within the clm_restart.nc file which has an explicit value for the cesm2.2.0 version, but for cesm2.3.0 the cohort dimension is declared UNLIMITED (0 currently). If that's the problem, I don't understand why my tests didn't fail -- the DART state I tested includes even more variables than what Gustavo was testing.
Hi Brett, I think what is going on here:
Assumptions in direct_netcdf_mod.f90:
- If there is an unlimited dimension, every variable has the unlimited dimension.
- The unlimited dimension is time, and we are interested in the latest time slice in a file.
In the cesm 2.3 clm, there is an unlimited dimension cohorts which is not used by any of the state variables (and, kind of strangely, not used at all).
So on this line:
https://github.com/NCAR/DART/blob/1b76f3afa5978469d6a119710bb638c1c077ae20/assimilation_code/modules/io/direct_netcdf_mod.f90#L865
num_dims - 1 = 0, which gives an empty 1:0 array range.
Gustavo's compiler is picking up this 1:0 and erroring out.
Intel 19.1.1.21 (running on Cheyenne) just merrily continues, but the counts are incorrect: the count is set to 1 rather than the length of the dimension.
For fill_inflation_restart it does not matter if the state is read in incorrectly, since we just fill the output with the mean and sd values (e.g. 1.0, 0.6). Side note: we don't need to be calling read_state in fill_inflation_restart.
For filter, which does need to read the state, I think the read does not happen correctly (the count is not the size of the variable).
On write, only 'time' (lower case) can be the unlimited dimension. This is another assumption in DART, and a weakness in how we deal with unlimited dimensions (https://github.com/NCAR/DART/issues/359#issuecomment-1187818595). For DART-created files, DART won't add an unlimited dimension unless its name is 'time'. So creating and writing a file with fill_inflation_restart gets you a file with the correct info.
All the assumptions above are a bit of a hack.
I put a fix on https://github.com/NCAR/DART/tree/fix-unlimited_dim-read which checks whether the variable has the unlimited dimension before adjusting the counts for reading and writing. It is a narrow solution (it really just fixes the case where there is an unused unlimited dimension).
I'm not sure if the cohorts dimension will be used in the state. If it is, then improving the state read/write to cope with various (and multiple) unlimited dimensions moves up the priority list.
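To make the count problem concrete, here is a minimal Python model of the logic described in this comment. It is a sketch, not the DART Fortran code: `read_counts_old` and `read_counts_fixed` are hypothetical names, the dimension lengths are made up, and dimensions are listed in Fortran order (fastest-varying first, record dimension last).

```python
from typing import Dict, List, Optional, Sequence


def read_counts_old(var_dims: Sequence[str],
                    dim_len: Dict[str, int],
                    unlimited_dim: Optional[str]) -> List[int]:
    """Model of the old assumption: if the file has an unlimited dimension,
    every variable is assumed to use it as its last dimension."""
    sizes = [dim_len[d] for d in var_dims]
    if unlimited_dim is None:
        return sizes
    counts = [1] * len(sizes)
    n = len(sizes) - 1            # "num_dims - 1"
    counts[:n] = sizes[:n]        # for a 1-D variable n == 0: nothing is copied
    return counts


def read_counts_fixed(var_dims: Sequence[str],
                      dim_len: Dict[str, int],
                      unlimited_dim: Optional[str]) -> List[int]:
    """Model of the narrow fix: only collapse the unlimited dimension to a
    single slice when this particular variable actually uses it."""
    return [1 if d == unlimited_dim else dim_len[d] for d in var_dims]


# Hypothetical file: 'cohort' is unlimited with length 0 and unused by the state.
dim_len = {"column": 5, "levgrnd": 3, "cohort": 0}

print(read_counts_old(("column",), dim_len, "cohort"))    # [1]  count stays 1
print(read_counts_fixed(("column",), dim_len, "cohort"))  # [5]  full dimension length
```

In this model the old logic silently reads a single element of a 1-D variable, which matches the behaviour described for the Intel build, while a stricter compiler or runtime check could instead stop at the empty 1:0 range.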
Let me know if this makes sense or not.
Thanks for looking at this @hkershaw-brown. I took another look and have a few responses and questions.
First, I was not able to recreate the dimension mismatch error during my initial tests because, as you suspected, the inflation_restart step simply uses the restart files as a template to write the 1 and 0.6 values, and does not read anything -- so this works OK. When performing the filter update step, however, the code doesn't error out, but the posterior restart files contain 'garbage' values -- in this case the entire domain is filled with zeros.
I redid the test by merging in your updates for direct_netcdf_mod.f90 from branch https://github.com/NCAR/DART/tree/fix-unlimited_dim-read. I reran the CLM tutorial test with cesm2.3 output, and I got the expected behavior -- the posterior restart files gave realistic values, and the increments were localized around the synthetic observations. So this looks like it addresses the immediate issue.
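Since the failure mode here was silent (no error, just zeros in the posterior), a quick check on the update can help catch it. The following is a rough sketch under stated assumptions: `summarize_update` is a hypothetical helper, and the prior and posterior values are assumed to have already been read from the restart files into flat lists of floats.

```python
from typing import Dict, Sequence, Union


def summarize_update(prior: Sequence[float],
                     posterior: Sequence[float]) -> Dict[str, Union[bool, int, float]]:
    """Summarize a filter update for one state variable."""
    if len(prior) != len(posterior):
        raise ValueError("prior and posterior must have the same length")
    increments = [post - pri for pri, post in zip(prior, posterior)]
    return {
        # True when every posterior value is zero (the 'garbage' case above).
        "posterior_all_zero": all(v == 0.0 for v in posterior),
        # Number of elements the assimilation actually changed.
        "n_changed": sum(1 for d in increments if d != 0.0),
        # Largest absolute increment; 0.0 for an empty variable.
        "max_abs_increment": max((abs(d) for d in increments), default=0.0),
    }


# Made-up values: a localized update changes only one of three elements.
print(summarize_update([1.0, 2.0, 3.0], [1.0, 2.5, 3.0]))
```

An all-zero posterior with a non-zero prior, or an update that changes every element when only a few localized observations were assimilated, would both be reasons to look at how the state was read.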
A couple of questions -- should I have been able to detect this dimension mismatch with my Intel compiler settings? It seems the ability to catch this may be compiler specific, and not necessarily something that can be captured by a more stringent set of compiler options/flags. Is that your understanding? Also, it would seem that this fix is specific to the DART code, and the CMCC compiler will still fail unless the CLM dimensional values are changed, including the offending cohort dimension.
It's not exactly clear to me what the role of the cohort dimension is in CLM -- I need to take a closer look at the documentation. Perhaps it's only used in certain compset configurations (CLM-FATES?) or is currently being used as a placeholder for further code development. We may need to get feedback from the CGD SEs to get more perspective on this.
Hi @braczka and @hkershaw-brown, thanks for the answers. I have also applied the updates for direct_netcdf_mod.f90 from branch https://github.com/NCAR/DART/tree/fix-unlimited_dim-read and was able to pass fill_inflation_restart without the counting issue. The code is now able to enter the filter; however, it crashes in another inflation-related routine. I am not sure whether the inflation files were written correctly or whether it is a completely new issue. Please find below an extract of the log. For simplicity I am assimilating LAI only. Cheers!
PE 504: create_and_open_state_output Creating output file preassim_priorinf_
After computing prior observation values TIME: 2023/07/27 13:17:35
PE 0: filter trace: After computing prior observation values
PE 0: filter trace: Before preassim state space output
Before preassim state space output TIME: 2023/07/27 13:17:35
mean_d01.nc
PE 0: create_and_open_state_output Creating output file preassim_member_0001_d
01.nc
PE 288: create_and_open_state_output Creating output file preassim_member_00
05_d01.nc
PE 144: create_and_open_state_output Creating output file preassim_member_00
PE 432: create_and_open_state_output Creating output file preassim_sd_d01.nc
03_d01.nc
PE 576: create_and_open_state_output Creating output file preassim_priorinf_
PE 72: create_and_open_state_output Creating output file preassim_member_00
sd_d01.nc
02_d01.nc
PE 216: create_and_open_state_output Creating output file preassim_member_00
PE 360: create_and_open_state_output Creating output file preassim_mean_d01.
04_d01.nc
nc
PE 0: create_and_open_state_output Creating output file preassim_member_0001_d
02.nc
PE 72: create_and_open_state_output Creating output file preassim_member_00
02_d02.nc
PE 144: create_and_open_state_output Creating output file preassim_member_00
03_d02.nc
PE 216: create_and_open_state_output Creating output file preassim_member_00
04_d02.nc
PE 288: create_and_open_state_output Creating output file preassim_member_00
05_d02.nc
PE 360: create_and_open_state_output Creating output file preassim_mean_d02.
nc
PE 432: create_and_open_state_output Creating output file preassim_sd_d02.nc
PE 504: create_and_open_state_output Creating output file preassim_priorinf_
mean_d02.nc
PE 576: create_and_open_state_output Creating output file preassim_priorinf_
sd_d02.nc
PE 0: create_and_open_state_output Creating output file preassim_member_0001_d
03.nc
PE 72: create_and_open_state_output Creating output file preassim_member_00
02_d03.nc
PE 144: create_and_open_state_output Creating output file preassim_member_00
03_d03.nc
PE 216: create_and_open_state_output Creating output file preassim_member_00
04_d03.nc
PE 288: create_and_open_state_output Creating output file preassim_member_00
05_d03.nc
PE 360: create_and_open_state_output Creating output file preassim_mean_d03.
nc
PE 432: create_and_open_state_output Creating output file preassim_sd_d03.nc
PE 504: create_and_open_state_output Creating output file preassim_priorinf_
mean_d03.nc
PE 576: create_and_open_state_output Creating output file preassim_priorinf_
sd_d03.nc
After preassim state space output TIME: 2023/07/27 13:17:43
PE 0: filter trace: After preassim state space output
PE 0: filter trace: Before observation space diagnostics
PE 0: filter trace: After observation space diagnostics
PE 0: filter: Ready to assimilate up to 249723 observations
PE 0: filter trace: Before observation assimilation
Before observation assimilation TIME: 2023/07/27 13:17:44
PE 0: locations_mod Location module statistics:
PE 0: locations_mod Total boxes (nlon * nlat): 2556
PE 0: locations_mod Total items to put in boxes: 11215
PE 0: locations_mod Percent boxes with 1+ items: 42.18
PE 0: locations_mod Average #items per non-empty box: 10.40
PE 0: locations_mod Largest #items in one box: 63
PE 0: locations_mod Location module statistics:
PE 0: locations_mod Total boxes (nlon * nlat): 2556
PE 0: locations_mod Total items to put in boxes: 347
PE 0: locations_mod Percent boxes with 1+ items: 11.42
PE 0: locations_mod Average #items per non-empty box: 1.19
PE 0: locations_mod Largest #items in one box: 3
PE 0: comp_cov_factor: Standard Gaspari Cohn localization selected
Processing observation 1000 of 249723 TIME: 2023/07/27 13:17:45
Processing observation 2000 of 249723 TIME: 2023/07/27 13:17:45
Processing observation 3000 of 249723 TIME: 2023/07/27 13:17:45
Processing observation 4000 of 249723 TIME: 2023/07/27 13:17:45
Processing observation 5000 of 249723 TIME: 2023/07/27 13:17:45
Processing observation 6000 of 249723 TIME: 2023/07/27 13:17:45
Processing observation 7000 of 249723 TIME: 2023/07/27 13:17:46
Processing observation 8000 of 249723 TIME: 2023/07/27 13:17:46
Processing observation 9000 of 249723 TIME: 2023/07/27 13:17:46
Processing observation 10000 of 249723 TIME: 2023/07/27 13:17:47
Processing observation 11000 of 249723 TIME: 2023/07/27 13:17:47
Processing observation 12000 of 249723 TIME: 2023/07/27 13:17:47
Processing observation 13000 of 249723 TIME: 2023/07/27 13:17:47
Processing observation 14000 of 249723 TIME: 2023/07/27 13:17:48
Processing observation 15000 of 249723 TIME: 2023/07/27 13:17:48
Processing observation 16000 of 249723 TIME: 2023/07/27 13:17:48
Processing observation 17000 of 249723 TIME: 2023/07/27 13:17:48
Processing observation 18000 of 249723 TIME: 2023/07/27 13:17:49
Processing observation 19000 of 249723 TIME: 2023/07/27 13:17:49
Processing observation 20000 of 249723 TIME: 2023/07/27 13:17:49
Processing observation 21000 of 249723 TIME: 2023/07/27 13:17:49
Processing observation 22000 of 249723 TIME: 2023/07/27 13:17:50
Processing observation 23000 of 249723 TIME: 2023/07/27 13:17:50
Processing observation 24000 of 249723 TIME: 2023/07/27 13:17:50
Processing observation 25000 of 249723 TIME: 2023/07/27 13:17:50
Processing observation 26000 of 249723 TIME: 2023/07/27 13:17:50
Processing observation 27000 of 249723 TIME: 2023/07/27 13:17:51
Processing observation 28000 of 249723 TIME: 2023/07/27 13:17:51
Processing observation 29000 of 249723 TIME: 2023/07/27 13:17:51
Processing observation 30000 of 249723 TIME: 2023/07/27 13:17:51
Processing observation 31000 of 249723 TIME: 2023/07/27 13:17:51
Processing observation 32000 of 249723 TIME: 2023/07/27 13:17:52
Processing observation 33000 of 249723 TIME: 2023/07/27 13:17:52
Processing observation 34000 of 249723 TIME: 2023/07/27 13:17:52
Processing observation 35000 of 249723 TIME: 2023/07/27 13:17:52
Processing observation 36000 of 249723 TIME: 2023/07/27 13:17:52
Processing observation 37000 of 249723 TIME: 2023/07/27 13:17:52
Processing observation 38000 of 249723 TIME: 2023/07/27 13:17:52
Processing observation 39000 of 249723 TIME: 2023/07/27 13:17:53
Processing observation 40000 of 249723 TIME: 2023/07/27 13:17:53
Processing observation 41000 of 249723 TIME: 2023/07/27 13:17:53
Processing observation 42000 of 249723 TIME: 2023/07/27 13:17:53
Processing observation 43000 of 249723 TIME: 2023/07/27 13:17:53
Processing observation 44000 of 249723 TIME: 2023/07/27 13:17:54
Processing observation 45000 of 249723 TIME: 2023/07/27 13:17:54
Processing observation 46000 of 249723 TIME: 2023/07/27 13:17:54
Processing observation 47000 of 249723 TIME: 2023/07/27 13:17:54
Processing observation 48000 of 249723 TIME: 2023/07/27 13:17:54
Processing observation 49000 of 249723 TIME: 2023/07/27 13:17:54
Processing observation 50000 of 249723 TIME: 2023/07/27 13:17:54
Processing observation 51000 of 249723 TIME: 2023/07/27 13:17:54
Processing observation 52000 of 249723 TIME: 2023/07/27 13:17:55
Processing observation 53000 of 249723 TIME: 2023/07/27 13:17:55
Processing observation 54000 of 249723 TIME: 2023/07/27 13:17:56
Processing observation 55000 of 249723 TIME: 2023/07/27 13:17:56
Processing observation 56000 of 249723 TIME: 2023/07/27 13:17:57
Processing observation 57000 of 249723 TIME: 2023/07/27 13:17:57
Processing observation 58000 of 249723 TIME: 2023/07/27 13:17:58
Processing observation 59000 of 249723 TIME: 2023/07/27 13:17:58
Processing observation 60000 of 249723 TIME: 2023/07/27 13:17:59
Processing observation 61000 of 249723 TIME: 2023/07/27 13:18:00
Processing observation 62000 of 249723 TIME: 2023/07/27 13:18:01
Processing observation 63000 of 249723 TIME: 2023/07/27 13:18:02
Processing observation 64000 of 249723 TIME: 2023/07/27 13:18:02
Processing observation 65000 of 249723 TIME: 2023/07/27 13:18:03
Processing observation 66000 of 249723 TIME: 2023/07/27 13:18:04
Processing observation 67000 of 249723 TIME: 2023/07/27 13:18:05
Processing observation 68000 of 249723 TIME: 2023/07/27 13:18:06
Processing observation 69000 of 249723 TIME: 2023/07/27 13:18:07
Processing observation 70000 of 249723 TIME: 2023/07/27 13:18:08
Processing observation 71000 of 249723 TIME: 2023/07/27 13:18:09
Processing observation 72000 of 249723 TIME: 2023/07/27 13:18:10
Processing observation 73000 of 249723 TIME: 2023/07/27 13:18:11
Processing observation 74000 of 249723 TIME: 2023/07/27 13:18:12
Processing observation 75000 of 249723 TIME: 2023/07/27 13:18:12
Processing observation 76000 of 249723 TIME: 2023/07/27 13:18:13
Processing observation 77000 of 249723 TIME: 2023/07/27 13:18:14
Processing observation 78000 of 249723 TIME: 2023/07/27 13:18:15
Processing observation 79000 of 249723 TIME: 2023/07/27 13:18:16
Processing observation 80000 of 249723 TIME: 2023/07/27 13:18:16
Processing observation 81000 of 249723 TIME: 2023/07/27 13:18:17
Processing observation 82000 of 249723 TIME: 2023/07/27 13:18:18
Processing observation 83000 of 249723 TIME: 2023/07/27 13:18:19
Processing observation 84000 of 249723 TIME: 2023/07/27 13:18:19
Processing observation 85000 of 249723 TIME: 2023/07/27 13:18:20
Processing observation 86000 of 249723 TIME: 2023/07/27 13:18:21
Processing observation 87000 of 249723 TIME: 2023/07/27 13:18:22
Processing observation 88000 of 249723 TIME: 2023/07/27 13:18:23
Processing observation 89000 of 249723 TIME: 2023/07/27 13:18:23
Processing observation 90000 of 249723 TIME: 2023/07/27 13:18:24
Processing observation 91000 of 249723 TIME: 2023/07/27 13:18:25
Processing observation 92000 of 249723 TIME: 2023/07/27 13:18:26
Processing observation 93000 of 249723 TIME: 2023/07/27 13:18:26
Processing observation 94000 of 249723 TIME: 2023/07/27 13:18:27
Processing observation 95000 of 249723 TIME: 2023/07/27 13:18:28
Processing observation 96000 of 249723 TIME: 2023/07/27 13:18:29
Processing observation 97000 of 249723 TIME: 2023/07/27 13:18:29
Processing observation 98000 of 249723 TIME: 2023/07/27 13:18:30
Processing observation 99000 of 249723 TIME: 2023/07/27 13:18:31
Processing observation 100000 of 249723 TIME: 2023/07/27 13:18:32
Processing observation 101000 of 249723 TIME: 2023/07/27 13:18:33
Processing observation 102000 of 249723 TIME: 2023/07/27 13:18:34
Processing observation 103000 of 249723 TIME: 2023/07/27 13:18:35
Processing observation 104000 of 249723 TIME: 2023/07/27 13:18:35
Processing observation 105000 of 249723 TIME: 2023/07/27 13:18:36
Processing observation 106000 of 249723 TIME: 2023/07/27 13:18:37
Processing observation 107000 of 249723 TIME: 2023/07/27 13:18:38
Processing observation 108000 of 249723 TIME: 2023/07/27 13:18:39
Processing observation 109000 of 249723 TIME: 2023/07/27 13:18:40
Processing observation 110000 of 249723 TIME: 2023/07/27 13:18:41
Processing observation 111000 of 249723 TIME: 2023/07/27 13:18:41
Processing observation 112000 of 249723 TIME: 2023/07/27 13:18:42
Processing observation 113000 of 249723 TIME: 2023/07/27 13:18:43
Processing observation 114000 of 249723 TIME: 2023/07/27 13:18:44
Processing observation 115000 of 249723 TIME: 2023/07/27 13:18:45
Processing observation 116000 of 249723 TIME: 2023/07/27 13:18:46
Processing observation 117000 of 249723 TIME: 2023/07/27 13:18:47
Processing observation 118000 of 249723 TIME: 2023/07/27 13:18:48
Processing observation 119000 of 249723 TIME: 2023/07/27 13:18:49
Processing observation 120000 of 249723 TIME: 2023/07/27 13:18:50
Processing observation 121000 of 249723 TIME: 2023/07/27 13:18:51
Processing observation 122000 of 249723 TIME: 2023/07/27 13:18:52
Processing observation 123000 of 249723 TIME: 2023/07/27 13:18:53
Processing observation 124000 of 249723 TIME: 2023/07/27 13:18:54
Processing observation 125000 of 249723 TIME: 2023/07/27 13:18:55
Processing observation 126000 of 249723 TIME: 2023/07/27 13:18:56
Processing observation 127000 of 249723 TIME: 2023/07/27 13:18:57
Processing observation 128000 of 249723 TIME: 2023/07/27 13:18:58
Processing observation 129000 of 249723 TIME: 2023/07/27 13:18:59
Processing observation 130000 of 249723 TIME: 2023/07/27 13:19:00
Processing observation 131000 of 249723 TIME: 2023/07/27 13:19:01
Processing observation 132000 of 249723 TIME: 2023/07/27 13:19:01
Processing observation 133000 of 249723 TIME: 2023/07/27 13:19:02
Processing observation 134000 of 249723 TIME: 2023/07/27 13:19:02
[n109:1529319:0:1529319] Caught signal 8 (Floating point exception: floating-point overflow)
==== backtrace (tid:1529319) ====
0 0x0000000000012ce0 __funlockfile() :0
1 0x000000000041f49c __libm_pow_e7() ???:0
2 0x00000000005a4696 adaptive_inflate_mod_mp_enh_compute_new_density_() /work/csp/lg07622/spreads/land/DART/assimilation_code/modules/assimilation/adaptive_inflate_mod.f90:1051
3 0x00000000005a3ca1 adaptive_inflate_mod_mp_bayes_cov_inflate_() /work/csp/lg07622/spreads/land/DART/assimilation_code/modules/assimilation/adaptive_inflate_mod.f90:932
4 0x00000000005a2cfd adaptive_inflate_mod_mp_update_inflation_() /work/csp/lg07622/spreads/land/DART/assimilation_code/modules/assimilation/adaptive_inflate_mod.f90:637
5 0x00000000005a342e adaptive_inflate_mod_mp_update_varying_state_space_inflation_() /work/csp/lg07622/spreads/land/DART/assimilation_code/modules/assimilation/adaptive_inflate_mod.f90:755
6 0x00000000005474c5 assim_tools_mod_mp_filter_assim_() /work/csp/lg07622/spreads/land/DART/assimilation_code/modules/assimilation/assim_tools_mod.f90:716
7 0x0000000000516826 filter_mod_mp_filter_main_() /work/csp/lg07622/spreads/land/DART/assimilation_code/modules/assimilation/filter_mod.f90:885
8 0x000000000050ee77 MAIN__() /work/csp/lg07622/spreads/land/DART/assimilation_code/programs/filter/filter.f90:20
9 0x000000000040e262 main() ???:0
10 0x000000000003acf3 __libc_start_main() ???:0
11 0x000000000040e16e _start() ???:0
=================================
forrtl: error (75): floating point exception
Image PC Routine Line Source
filter 000000000080C64B Unknown Unknown Unknown
libpthread-2.28.s 0000150399F82CE0 Unknown Unknown Unknown
libhdf5.so.200.1. 000015039D51E49C Unknown Unknown Unknown
filter 00000000005A4696 adaptive_inflate_ 1051 adaptive_inflate_mod.f90
filter 00000000005A3CA1 adaptive_inflate_ 932 adaptive_inflate_mod.f90
filter 00000000005A2CFD adaptive_inflate_ 637 adaptive_inflate_mod.f90
filter 00000000005A342E adaptive_inflate_ 755 adaptive_inflate_mod.f90
filter 00000000005474C5 assim_tools_mod_m 716 assim_tools_mod.f90
filter 0000000000516826 filter_mod_mp_fil 885 filter_mod.f90
filter 000000000050EE77 MAIN__ 20 filter.f90
filter 000000000040E262 Unknown Unknown Unknown
libc-2.28.so 0000150399863CF3 __libc_start_main Unknown Unknown
filter 000000000040E16E Unknown Unknown Unknown
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 1529319 RUNNING AT n109-ibj
= EXIT CODE: 134
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:[email protected]] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:878): assert (!closed) failed
[proxy:0:[email protected]] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:[email protected]] main (pm/pmiserv/pmip.c:200): demux engine error waiting for event
[proxy:0:[email protected]] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:878): assert (!closed) failed
[proxy:0:[email protected]] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:[email protected]] main (pm/pmiserv/pmip.c:200): demux engine error waiting for event
[proxy:0:[email protected]] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:878): assert (!closed) failed
[proxy:0:[email protected]] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:878): assert (!closed) failed
[proxy:0:[email protected]] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:[email protected]] main (pm/pmiserv/pmip.c:200): demux engine error waiting for event
[proxy:0:[email protected]] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:878): assert (!closed) failed
[proxy:0:[email protected]] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:[email protected]] main (pm/pmiserv/pmip.c:200): demux engine error waiting for event
[proxy:0:[email protected]] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:878): assert (!closed) failed
[proxy:0:[email protected]] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:[email protected]] main (pm/pmiserv/pmip.c:200): demux engine error waiting for event
[proxy:0:[email protected]] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:878): assert (!closed) failed
[proxy:0:[email protected]] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:[email protected]] main (pm/pmiserv/pmip.c:200): demux engine error waiting for event
[proxy:0:[email protected]] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:878): assert (!closed) failed
[proxy:0:[email protected]] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:[email protected]] main (pm/pmiserv/pmip.c:200): demux engine error waiting for event
[proxy:0:[email protected]] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:878): assert (!closed) failed
[proxy:0:[email protected]] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:[email protected]] main (pm/pmiserv/pmip.c:200): demux engine error waiting for event
[proxy:0:[email protected]] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:[email protected]] main (pm/pmiserv/pmip.c:200): demux engine error waiting for event
[[email protected]] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:75): one of the processes terminated badly; aborting
[[email protected]] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:22): launcher returned error waiting for completion
[[email protected]] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:215): launcher returned error waiting for completion
[[email protected]] main (ui/mpich/mpiexec.c:336): process manager error waiting for completion
Thu Jul 27 13:19:06 CEST 2023 -- END FILTER
unlink: cannot unlink 'dart_posterior.nc': No such file or directory
'dart_posterior.nc' -> 'clm2_0001.r.2011-01-09-00000.nc'
'clm_restart.nc' -> 'clm5_gswp.clm2_0001.r.2011-01-09-00000.nc'
ERROR: dart_to_clm failed for clm5_gswp.clm2_0001.r.2011-01-09-00000.nc
Hi @tavicoaz -- thank you for the feedback. It is not immediately clear to me yet whether the error is related to the inflation file or specific to the cesm2.3 formatting. In the interest of keeping this issue uncluttered, could you post the same question to the DART help email (DART(at)ucar.edu)? We can better address it there.
Update on this: after discussions with @tavicoaz, this is not an immediate issue. Since the previous comment, troubleshooting has switched to cesm2.2. We may revisit this later on when cesm2.3 becomes a priority, so we should keep this issue open for now.
Note: the fix for IO for NetCDF files when only some variables have the unlimited dimension (previously on branch fix-unlimited_dim-read) is in the main branch of DART as of v11.8.