DART freezes before computing prior observation values
Hi everyone,
I am using WACCM+DART and can successfully run the model to assimilate MLS temperature. However, sometimes DART gets stuck at the "before computing prior observation values" stage, as indicated in the da.log file. Other times, the run completes without issues.
I am not sure what's causing this inconsistency. Has anyone else encountered this problem and can offer some advice?
Error Message
Sat Jul 27 23:09:09 CST 2024 -- BEGIN CAM_ASSIMILATE
valid time of model is 2012 12 5 21600 (seconds)
valid time of model is 2012 12 5 6 (hours)
/usr/bin/ls: cannot access ../Hide*: No such file or directory
most recent log is cesm.log.14394469.240727-230022
oldest log is cesm.log.14394469.240727-230022
entire log list is cesm.log.14394469.240727-230022
Sat Jul 27 23:09:15 CST 2024 -- BEGIN COPY BLOCK
Sat Jul 27 23:09:15 CST 2024 -- END COPY BLOCK
stages_except_output = {}
stages_all = {,output}
OBS_FILE = /data2/share/elzd_2024_000125/xhw/downloadData/MLS/2013/Level2/DART_seq/201212_6H_CESM/obs_seq.2012-12-05-21600
inf_flavor(1) = 5, using namelist values.
Posterior Inflation not requested for this assimilation.
Sat Jul 27 23:09:15 CST 2024 -- BEGIN FILTER
srun: ROUTE: split_hostlist: hl=a3106n05,a3401n[10,13,16],a3405n01,a3408n13,b2104r4n[3,6],b3202r2n[3,5,7],b3202r3n1,b3306r8n[1,3],b3309r3n[5,7] tree_width 0
srun: ROUTE: split_hostlist: hl=b3309r3n6 tree_width 0
srun: ROUTE: split_hostlist: hl=a3106n06,a3401n09 tree_width 0
srun: ROUTE: split_hostlist: hl=b3202r3n2 tree_width 0
srun: ROUTE: split_hostlist: hl=b3202r2n4 tree_width 0
srun: ROUTE: split_hostlist: hl=b3202r2n6 tree_width 0
srun: ROUTE: split_hostlist: hl=a3405n[02-03] tree_width 0
srun: ROUTE: split_hostlist: hl=b2104r4n7,b3202r2n2 tree_width 0
srun: ROUTE: split_hostlist: hl=b2104r4n[4-5] tree_width 0
srun: ROUTE: split_hostlist: hl=a3401n[14-15] tree_width 0
srun: ROUTE: split_hostlist: hl=b3306r8n2 tree_width 0
srun: ROUTE: split_hostlist: hl=b3202r2n8 tree_width 0
srun: ROUTE: split_hostlist: hl=b3306r8n4 tree_width 0
srun: ROUTE: split_hostlist: hl=b3309r3n8 tree_width 0
srun: ROUTE: split_hostlist: hl=a3401n[11-12] tree_width 0
srun: ROUTE: split_hostlist: hl=a3401n[17-18] tree_width 0
srun: ROUTE: split_hostlist: hl=a3408n[14-15] tree_width 0
--------------------------------------
Starting ... at YYYY MM DD HH MM SS =
2024 7 27 23 9 19
Program Filter
--------------------------------------
set_nml_output Echo NML values to log file only
PE 0: initialize_mpi_utilities: Running with 2560 MPI processes.
Assimilate_these_obs_types:
AURAMLS_TEMPERATURE
Evaluate_these_obs_types:
none
Use the precomputed Prior Forward Operators for these obs types:
none
PE 0: location_mod: using code with optimized cutoffs
PE 0: location_mod: Including vertical separation when computing distances:
PE 0: location_mod: # pascals ~ 1 horiz radian: 20000.00000
PE 0: location_mod: # meters ~ 1 horiz radian: 10000.00000
PE 0: location_mod: # model levels ~ 1 horiz radian: 20.00000
PE 0: location_mod: # scale heights ~ 1 horiz radian: 1.50000
PE 0: location_mod: Using table-lookup approximation for distance computations
PE 0: init_discard_high_obs Discarding observations higher than model level 5
PE 0: init_discard_high_obs ... which is equivalent to pressure level 0.44041E-02 Pascals
PE 0: init_discard_high_obs ... which is equivalent to height 114178.49250 meters
PE 0: init_discard_high_obs ... which is equivalent to scale height 16.93814
PE 0: quality_control_mod: Will reject obs with Data QC larger than 3
PE 0: quality_control_mod: Will reject obs values more than 3.000000 sigma from mean
PE 0: init_algorithm_info_mod: No QCF table file listed in namelist, using default values for all QTYs
PE 0: assim_tools_init: The cutoff namelist value is 0.150000
PE 0: assim_tools_init: ... cutoff is the localization half-width parameter,
PE 0: assim_tools_init: ... so the effective localization radius is 0.300000
PE 0: assim_tools_init: Using Sampling Error Correction
PE 0: assim_tools_init: Replicating a copy of the ensemble mean on every task
PE 0: assim_tools_init: ... uses more memory per task but may run faster if doing vertical
PE 0: assim_tools_init: ... coordinate conversion; controlled by namelist item "distribute_mean"
PE 0: assim_tools_init: Doing vertical localization, vertical coordinate conversion may be required
PE 0: assim_tools_init: ... Converting all state vector verticals to localization coordinate first.
PE 0: assim_tools_init: ... Converting all observation verticals to localization coordinate first.
PE 0: filter trace: Filter start
Filter start TIME: 2024/07/27 23:09:21
PE 0: filter_main: running with an ensemble size of 5
PE 0: filter trace: Before initializing inflation
PE 0: filter_main: Prior inflation damping of 0.900000 will be used
PE 0: filter trace: After initializing inflation
PE 0: parse_stages_to_write: filter will write stage : output
PE 0: filter trace: Before setting up space for observations
Before setting up space for observations TIME: 2024/07/27 23:09:21
After setting up space for observations TIME: 2024/07/27 23:09:21
PE 0: filter trace: After setting up space for observations
PE 0: filter trace: Before setting up space for ensembles
PE 0: filter_main: running with distributed state; model states stay distributed across all tasks for the entire run
PE 0: filter trace: After setting up space for ensembles
PE 0: filter trace: Before reading in ensemble restart files
Before reading in ensemble restart files TIME: 2024/07/27 23:09:21
PE 0: Prior inflation: deterministic, deflation permitted, enhanced time-adaptive, time-rate adaptive, spatially-varying, state-space
PE 0: Prior inflation: inf mean from namelist, value: 1.000
PE 0: Prior inflation: inf stddev from namelist, value: 0.600
PE 0: Prior inflation: inf stddev max change: 1.050
PE 0: Posterior inflation: None
PE 0: filter_main: Reading in initial condition/restart data for all ensemble members from file(s)
After reading in ensemble restart files TIME: 2024/07/27 23:09:28
PE 0: filter trace: After reading in ensemble restart files
PE 0: filter trace: Before initializing output files
Before initializing output files TIME: 2024/07/27 23:09:28
After initializing output files TIME: 2024/07/27 23:09:28
PE 0: filter trace: After initializing output files
PE 0: filter trace: Before trimming obs seq if start/stop time specified
PE 0: filter trace: After trimming obs seq if start/stop time specified
PE 0: filter trace: Top of main advance time loop
PE 0:
PE 0: filter: Main assimilation loop, starting iteration 0
PE 0: filter trace: Before move_ahead checks time of data and next obs
PE 0: shortest_time_between_assimilations: assimilation period is 0 days 21600 seconds
PE 0: move_ahead Current model data time is: day= 150453 sec= 21600
PE 0: move_ahead Current assimilation window starts at: day= 150453 sec= 10801
PE 0: move_ahead Next available observation time is: day= 150453 sec= 10824
PE 0: move_ahead Current assimilation window ends at: day= 150453 sec= 32400
PE 0: shortest_time_between_assimilations: assimilation period is 0 days 21600 seconds
PE 0: move_ahead Next available observation time is: day= 150453 sec= 10824
PE 0: move_ahead Within current assimilation window, model does not need advance.
PE 0: move_ahead Next assimilation window contains up to 35670 observations
PE 0: filter trace: After move_ahead checks time of data and next obs
PE 0: filter: Model does not need to run; data already at required time
PE 0: filter trace: Before setup for next group of observations
PE 0: filter trace: Number of observations to be assimilated 35670
filter trace: Time of first observation in window day=150453, sec=10824
filter trace: Time of last observation in window day=150453, sec=32384
PE 0: filter trace: After setup for next group of observations
PE 0: filter trace: Before prior inflation damping and prep
PE 0: filter trace: After prior inflation damping and prep
PE 0: filter trace: Before computing prior observation values
Before computing prior observation values TIME: 2024/07/27 23:09:28
Which model(s) are you working with?
WACCM
Version of DART
Which version of DART are you using? v11.5.1
Have you modified the DART code?
No
Build information
Please describe:
- The machine you are running on (e.g. windows laptop, NSF NCAR supercomputer Derecho): cluster
- The compiler you are using (e.g. gnu, intel): Intel
Hello, thanks for reaching out. Could you please share with us the contents of the input.nml file you are using? @664787022
As an initial suggestion, I would try running with more ensemble members (50, say); an ensemble of 5 is very small and has caused this issue in the past.
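For reference, a minimal sketch of the change in input.nml, assuming the standard DART &filter_nml namelist (the other entries in your file would stay as they are; the value 50 is just illustrative):

    &filter_nml
       ens_size = 50,   ! illustrative value; your log shows an ensemble size of 5
       ...
    /

Keep in mind that in the CESM+DART setup the number of CESM instances and the set of initial/restart files have to match ens_size, so increasing the ensemble is a change to the case setup as well as to input.nml.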
Are you using a different number of MPI processes across your runs, or are you consistently running on 2560?
I would also recommend sending this to [email protected] for support. That is where we handle user support requests, as opposed to GitHub issues, which are more for explicit bugs or feature requests.
Hi @664787022, closing this issue since we have not heard from you. Please email [email protected] if you are still having problems. Cheers, Helen