E3SM
E3SM copied to clipboard
Updates to the Redi isopycnal mixing scheme
this PR adds numerous changes to Redi isopycnal mixing Many changes have been made to improve the readability, physics, and capabilities of the redi iopycnal mixing scheme 1. Redi slopes are now removed from the gm file and computed on its own to reduce the size of the gm file 2. surface limiting of Redi now ensures surface horizontal mixing (in the upper portion of the boundary layer) and is full Redi at the BLD base 3. Redi now has a N2 dependent option that is separate from GM, this required making Redi a 3d variable and allows for future changes to the distribution of Redi horizontally and vertically 4. there is a fix in the isopycnal slope calculation for steeply sloping coordinate surfaces (e.g. ice shelf cavities)
[NML] Options are included to ensure this PR is BFB with current master
@jonbob here is the PR we discussed over slack. Hopefully it is basically good to go. Thanks for taking it on
further discussion and testing is in mirror Ocean Discussion PR https://github.com/E3SM-Ocean-Discussion/E3SM/pull/117
@vanroekel, could you fix the PR title? It's still based on the branch name.
Thanks for catching that @xylar - Done
testing shows non-BFB baseline results for: SMS_D_Ld1.ne30pg2_r05_IcoswISC30E3r5.WCYCL1850.chrysalis_intel.allactive-wcprod
@irenavankova All this testing is fantastic, thanks! Can you clarify the difference between experiments A and C?
@cbegeman, sorry there was a typo in the table, which I corrected now. Not all changes were introduced as config or namelist options, so the reference case "A" was run with a code base prior to any changes (https://github.com/irenavankova/E3SM/tree/lvk_redi_ref) and all the rest of the cases with the newer code base (https://github.com/irenavankova/E3SM/tree/lvk_redi_orig) and the changes were tested progressively. The namelist option N2_dependent in "A" is actually equivalent to "N2_original" in "B", so what "B" tests is a non optional change/update in Redi so that mixing is horizontal and not isopycnal in the mixed layer, but nothing in the N2 formulation changes between "A" and "B". "C" then tests the switch to new formulation of "N2_dependent". Because there was almost no change between N2_original and the new N2_dependent ("B" and "C"), the namelist option to keep N2_original was removed so it is no longer present in this PR.
With the two fixes above I was able to compile and pass the nightly test suite on perlmutter with gnu (debug and optimized). I was able to compile with intel (debug and optimized) on chrysalis and pass the nightly test suite, except that there is a library missing for ocean_global_ocean_Icos240_mesh so those did not run (unrelated to this PR). It did pass ocean_global_ocean_IcoswISC240_WOA23_performance_test.
Based on the in-depth commentary by @irenavankova and simulations presented above and at https://github.com/E3SM-Ocean-Discussion/E3SM/pull/117, which I reviewed previously, I think this is ready to merge once those two bugs are fixed.
Thanks @irenavankova for the in-depth testing!
I believe I've addressed the comments, but would appreciate another look.
@jonbob this will require a rerun of the namelist registry scripts.
I built an E3SM g-case with intel and debug and it built.
testing shows non-BFB baseline results for: SMS_D_Ld1.ne30pg2_r05_IcoswISC30E3r5.WCYCL1850.chrysalis_intel.allactive-wcprod
I am able to reproduce the non-BFB match in standalone using gnu optimized on perlmutter. All compass cases for ocean/global_ocean/Icos240/WOA23/ run successfully but fail comparison between this PR head f662035aa7 and the master branchpoint 8270a9ae21. Those all have redi on and these settings
config_use_Redi = .true.
config_Redi_closure = 'constant'
config_redi_transition_layer_bld_fraction = 0.8
config_Redi_use_new_sfc_taper = .false.
config_Redi_use_cavity_fix = .false.
config_Redi_constant_kappa = 600.0
config_redi_spatially_variable_min_kappa = 100.0
config_redi_spatially_variable_max_kappa = 1000.0
config_Redi_maximum_slope = 0.3
config_Redi_use_slope_taper = .true.
config_Redi_use_surface_taper = .true.
config_Redi_limit_term1 = .true.
config_Redi_use_quasi_monotone_limiter = .true.
config_Redi_quasi_monotone_safety_factor = 0.9
config_Redi_min_layers_diag_terms = 0
config_Redi_horizontal_taper = 'ramp'
config_Redi_horizontal_ramp_min = 20e3
config_Redi_horizontal_ramp_max = 30e3
The maximum difference at the first timestep is 1e-13 or 1e-14 for all prognostic fields, so this looks like an order of operations change to me.
If I change to
config_Redi_use_slope_taper = .false.
config_Redi_use_surface_taper = .false.
the non-BFB difference still occurs.
Using the test ocean/global_ocean/Icos240/WOA23/performance_test/forward I these small non-BFB differences appear with redi on but all options turned off:
&Redi_isopycnal_mixing
config_use_Redi = .true.
config_Redi_closure = 'constant'
config_Redi_constant_kappa = 600.0
config_Redi_maximum_slope = 0.3
config_Redi_use_slope_taper = .false.
config_Redi_use_surface_taper = .false.
config_Redi_limit_term1 = .false.
config_Redi_use_quasi_monotone_limiter = .false.
config_Redi_quasi_monotone_safety_factor = 0.9
config_Redi_min_layers_diag_terms = 0
config_Redi_horizontal_taper = 'ramp'
config_Redi_horizontal_ramp_min = 20e3
config_Redi_horizontal_ramp_max = 30e3
I verified that with redi turned off completely (config_use_Redi = .false.) that there indeed is a BFB match with master.
The non-BFB behavior disappears when I change from gnu optimized to gnu debug! If this is the same behavior in the coupled test, I think the non-BFB failure can be approved for the PR.
Specifically, I retested ocean/global_ocean/Icos240/WOA23/performance_test/forward with the original namelist above as well as the test with all the internal redi flags off on the previous comment. In both cases, there are 1e-13 max mismatches with gnu optimized between this branch and the master branchpoint. Comparisons are BFB with gnu debug for both.
I tested with gnu optimized and config_Redi_horizontal_taper = 'none' and this PR and master still have non-BFB match, so these standalone results may not be catching the same difference as the E3SM test.
Test merge passes:
- ERS.ne4pg2_oQU480.WCYCL1850NS.chrysalis_intel
- SMS_D_Ld1.ne30pg2_r05_IcoswISC30E3r5.WCYCL1850.chrysalis_intel.allactive-wcprod
- ERP_Ld3.ne30pg2_r05_IcoswISC30E3r5.WCYCL1850.chrysalis_intel.allactive-pioroot1
- ERS_Ld5.T62_oQU120.CMPASO-NYF.chrysalis_intel
- ERS_Ld5.TL319_oQU240wLI_ais8to30.GMPAS-JRA1p5-DIB-PISMF-DIS.chrysalis_intel.mpaso-ocn_glcshelf
- ERS_Ld5.TL319_oQU240wLI_ais8to30.GMPAS-JRA1p5-DIB-PISMF-SIS.chrysalis_intel.mpaso-ocn_glcshelf
- ERS_Ld5.T62_oQU240wLI.GMPAS-DIB-IAF-DISMF.chrysalis_intel.C.20250327_114302_nmarmq
- ERS_Ld5.T62_oQU240wLI.GMPAS-DIB-IAF-PISMF.chrysalis_intel
- PEM_Ln5.T62_oQU240wLI.GMPAS-DIB-IAF-DISMF.chrysalis_intel
- PEM_Ln5.T62_oQU240wLI.GMPAS-DIB-IAF-PISMF.chrysalis_intel
- PET_Ln5.T62_oQU240wLI.GMPAS-DIB-IAF-DISMF.chrysalis_intel
- PET_Ln5.T62_oQU240wLI.GMPAS-DIB-IAF-PISMF.chrysalis_intel
with expected NML DIFFs
There was one baseline DIFF:
- SMS_D_Ld1.TL319_IcoswISC30E3r5.GMPAS-JRA1p5-DIB-PISMF.chrysalis_intel.mpaso-jra_1958
With the latest commit, the failing test now passes. Because it was likely a threading issue, I ran it multiple times to be sure it was fixed -- it has passed five of five runs.
Update: more testing needed.
Converting to draft until testing is finished.