DART
bug: fix_bound_violations = .true. seems to be required for ifort
Describe the bug
- List the steps someone needs to take to reproduce the bug.
/glade/derecho/scratch/hkershaw/DART/Bugs/bgunn_qceff/DART/models/lorenz_96_tracer_advection/work
Following https://github.com/NCAR/DART/blob/l96_tracer_tests/models/lorenz_96_tracer_advection/work/TESTS/TEST_DRIVER.csh
Reported by Ben Gunn (thanks @Benjamin-Gunn!): https://github.com/Benjamin-Gunn/DART/blob/l96_tracer_tests/models/lorenz_96_tracer_advection/work/TESTS/TEST_DRIVER.csh
qceff_table_filename = 'one_below_qceff_table.csv'
&filter_nml
   inf_flavor = 5, 5,

&model_nml
   model_size = 120,
   forcing = 8.0,
   delta_t = 0.05,
   mean_velocity = 0.0,
   pert_velocity_multiplier = 5.0,
   diffusion_coef = 0.0,
   e_folding = 0.25,
   sink_rate = 0.1,
   source_rate = 100.0,
   point_tracer_source_rate = 5.0,
   positive_tracer = .false.,
   bound_above_is_one = .true.,
   time_step_days = 0,
   time_step_seconds = 3600,
/
- What was the expected outcome? I did not expect fix_bound_violations = .true. to be required so often.
- What actually happened?
Failures for "Ensemble member greater than upper bound first check" at various pe counts.
You can set:
&probit_transform_nml
   fix_bound_violations = .true.
/
However, you still get different answers across MPI task counts.
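#!/bin/bash
# Record one state_variable_mean value per run so that runs can be compared.
module load nco
rm -f one_var_temp.nc
# Extract a single location (index 1) from the filter output.
ncrcat -d location,1,1 filter_output.nc one_var_temp.nc
# Dump state_variable_mean and append the third-from-last line of the dump to test_output.
ncks -V -C -v state_variable_mean one_var_temp.nc | tail -3 | head -1 >> test_output
rm -f one_var_temp.nc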
Varying PE count: 7.95979093017264 ; 8.02126025256388 ; 8.55748257662756 ;
Varying PE count with -fp-model precise: 8.62082489125036 ; 8.62082489125036 ; 8.62082489125036 ;
I am not sure how much difference is acceptable when varying the PE count. Note: I cannot reproduce the bounds violations with -fp-model precise.
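To put a number on how far apart the runs are, the values recorded for each PE count can be reduced to a single relative spread. The following is a minimal sketch, not part of DART: the helper name max_rel_diff is hypothetical, and it takes the values as arguments rather than parsing test_output, since the exact layout of that file is not shown here.

```bash
#!/bin/bash
# Hypothetical helper (not part of DART): print the largest relative
# difference among the numeric values given as arguments.
max_rel_diff() {
  printf '%s\n' "$@" | awk '
    NR == 1 { min = $1 + 0; max = $1 + 0 }
    {
      v = $1 + 0
      if (v < min) min = v
      if (v > max) max = v
    }
    END {
      # Scale by the largest magnitude seen; report 0 if all values are 0.
      scale = (max < 0) ? -max : max
      amin = (min < 0) ? -min : min
      if (amin > scale) scale = amin
      if (scale == 0) printf "%.3e\n", 0
      else printf "%.3e\n", (max - min) / scale
    }'
}

# Values from the runs with varying PE count.
max_rel_diff 7.95979093017264 8.02126025256388 8.55748257662756
# Values from the runs built with the precise floating-point model.
max_rel_diff 8.62082489125036 8.62082489125036 8.62082489125036
```

A spread near machine epsilon (about 2e-16 for double precision) would point to rounding-level differences, while a spread of several percent suggests the runs have diverged well beyond that.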
Todo @hkershaw intel/2024.0.2, ifx, vs gfortran
Error Message
3 MPI tasks (also happens with 8, 7 (without post_inf), and 40 (without post_inf)):
PE 0: comp_cov_factor: Standard Gaspari Cohn localization selected
ERROR FROM:
source : bnrh_distribution_mod.f90
routine: bnrh_cdf_initialized
message: Ensemble member greater than upper bound first check(see code) 1.00000000000000 1.00000000000000
MPICH ERROR [Rank 0] [job id e35a8d7d-258f-45c5-8d80-ba05433b0be5] [Tue Aug 6 12:24:05 2024] [dec0508] - Abort(99) (rank 0 in comm 496): application called MPI_Abort(comm=0x84000002, 99) - process 0
ERROR FROM:
source : bnrh_distribution_mod.f90
routine: bnrh_cdf_initialized
message: Ensemble member greater than upper bound first check(see code) 1.00000000000000 1.00000000000000
MPICH ERROR [Rank 1] [job id e35a8d7d-258f-45c5-8d80-ba05433b0be5] [Tue Aug 6 12:24:05 2024] [dec0508] - Abort(99) (rank 1 in comm 496): application called MPI_Abort(comm=0x84000001, 99) - process 1
ERROR FROM:
source : bnrh_distribution_mod.f90
routine: bnrh_cdf_initialized
message: Ensemble member greater than upper bound first check(see code) 1.00000000000000 1.00000000000000
MPICH ERROR [Rank 2] [job id e35a8d7d-258f-45c5-8d80-ba05433b0be5] [Tue Aug 6 12:24:05 2024] [dec0508] - Abort(99) (rank 2 in comm 496): application called MPI_Abort(comm=0x84000001, 99) - process 2
Here is the code: https://github.com/NCAR/DART/blob/75cf8dc9c566221f624ffd4d5eeba9fde5a1757c/assimilation_code/modules/assimilation/bnrh_distribution_mod.f90#L292-L300
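The error message prints the ensemble member and the bound as the same 1.00000000000000, which is consistent with a violation smaller than the 14 decimal places shown. The sketch below is an illustration only, not DART code: it assumes the member exceeds the bound by one unit in the last place, which the log does not confirm, and the function name show_ulp_violation is hypothetical.

```bash
#!/bin/bash
# Illustration only (not DART code): a double one ulp above 1.0 prints as
# 1.00000000000000 at 14 decimal places but still compares greater than 1.0.
show_ulp_violation() {
  awk 'BEGIN {
    bound = 1.0
    x = bound + 2^-52            # assumed size of the violation: one ulp
    printf "%.14f %.14f\n", x, bound   # same precision as the error message
    printf "%.17g\n", x                # enough digits to see the difference
    print (x > bound) ? "greater than bound" : "within bound"
  }'
}

show_ulp_violation
```

If the real violation is of this size, printing the two values with 17 significant digits in the error message would make it visible.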
Which model(s) are you working with?
lorenz_96_tracer_advection.
/glade/derecho/scratch/hkershaw/DART/Bugs/bgunn_qceff/DART/models/lorenz_96_tracer_advection/work
Version of DART
v11.5.1
Have you modified the DART code?
No
Build information
Please describe:
- Derecho
- ifort (IFORT) 2021.10.0 20230609