E3SM icon indicating copy to clipboard operation
E3SM copied to clipboard

Updates to the Redi isopycnal mixing scheme

Open vanroekel opened this issue 8 months ago • 19 comments

this PR adds numerous changes to Redi isopycnal mixing Many changes have been made to improve the readability, physics, and capabilities of the redi iopycnal mixing scheme 1. Redi slopes are now removed from the gm file and computed on its own to reduce the size of the gm file 2. surface limiting of Redi now ensures surface horizontal mixing (in the upper portion of the boundary layer) and is full Redi at the BLD base 3. Redi now has a N2 dependent option that is separate from GM, this required making Redi a 3d variable and allows for future changes to the distribution of Redi horizontally and vertically 4. there is a fix in the isopycnal slope calculation for steeply sloping coordinate surfaces (e.g. ice shelf cavities)

[NML] Options are included to ensure this PR is BFB with current master

vanroekel avatar Mar 12 '25 17:03 vanroekel

@jonbob here is the PR we discussed over slack. Hopefully it is basically good to go. Thanks for taking it on

vanroekel avatar Mar 12 '25 17:03 vanroekel

further discussion and testing is in mirror Ocean Discussion PR https://github.com/E3SM-Ocean-Discussion/E3SM/pull/117

vanroekel avatar Mar 12 '25 17:03 vanroekel

@vanroekel, could you fix the PR title? It's still based on the branch name.

xylar avatar Mar 12 '25 18:03 xylar

Thanks for catching that @xylar - Done

vanroekel avatar Mar 12 '25 18:03 vanroekel

testing shows non-BFB baseline results for: SMS_D_Ld1.ne30pg2_r05_IcoswISC30E3r5.WCYCL1850.chrysalis_intel.allactive-wcprod

jonbob avatar Mar 17 '25 20:03 jonbob

@irenavankova All this testing is fantastic, thanks! Can you clarify the difference between experiments A and C?

cbegeman avatar Mar 18 '25 19:03 cbegeman

@cbegeman, sorry there was a typo in the table, which I corrected now. Not all changes were introduced as config or namelist options, so the reference case "A" was run with a code base prior to any changes (https://github.com/irenavankova/E3SM/tree/lvk_redi_ref) and all the rest of the cases with the newer code base (https://github.com/irenavankova/E3SM/tree/lvk_redi_orig) and the changes were tested progressively. The namelist option N2_dependent in "A" is actually equivalent to "N2_original" in "B", so what "B" tests is a non optional change/update in Redi so that mixing is horizontal and not isopycnal in the mixed layer, but nothing in the N2 formulation changes between "A" and "B". "C" then tests the switch to new formulation of "N2_dependent". Because there was almost no change between N2_original and the new N2_dependent ("B" and "C"), the namelist option to keep N2_original was removed so it is no longer present in this PR.

irenavankova avatar Mar 19 '25 04:03 irenavankova

With the two fixes above I was able to compile and pass the nightly test suite on perlmutter with gnu (debug and optimized). I was able to compile with intel (debug and optimized) on chrysalis and pass the nightly test suite, except that there is a library missing for ocean_global_ocean_Icos240_mesh so those did not run (unrelated to this PR). It did pass ocean_global_ocean_IcoswISC240_WOA23_performance_test.

Based on the in-depth commentary by @irenavankova and simulations presented above and at https://github.com/E3SM-Ocean-Discussion/E3SM/pull/117, which I reviewed previously, I think this is ready to merge once those two bugs are fixed.

mark-petersen avatar Mar 21 '25 16:03 mark-petersen

Thanks @irenavankova for the in-depth testing!

alicebarthel avatar Mar 22 '25 00:03 alicebarthel

I believe I've addressed the comments, but would appreciate another look.

@jonbob this will require a rerun of the namelist registry scripts.

vanroekel avatar Mar 23 '25 05:03 vanroekel

I built an E3SM g-case with intel and debug and it built.

vanroekel avatar Mar 23 '25 05:03 vanroekel

testing shows non-BFB baseline results for: SMS_D_Ld1.ne30pg2_r05_IcoswISC30E3r5.WCYCL1850.chrysalis_intel.allactive-wcprod

I am able to reproduce the non-BFB match in standalone using gnu optimized on perlmutter. All compass cases for ocean/global_ocean/Icos240/WOA23/ run successfully but fail comparison between this PR head f662035aa7 and the master branchpoint 8270a9ae21. Those all have redi on and these settings

    config_use_Redi = .true.
    config_Redi_closure = 'constant'
    config_redi_transition_layer_bld_fraction = 0.8
    config_Redi_use_new_sfc_taper = .false.
    config_Redi_use_cavity_fix = .false.
    config_Redi_constant_kappa = 600.0
    config_redi_spatially_variable_min_kappa = 100.0
    config_redi_spatially_variable_max_kappa = 1000.0
    config_Redi_maximum_slope = 0.3
    config_Redi_use_slope_taper = .true.
    config_Redi_use_surface_taper = .true.
    config_Redi_limit_term1 = .true.
    config_Redi_use_quasi_monotone_limiter = .true.
    config_Redi_quasi_monotone_safety_factor = 0.9
    config_Redi_min_layers_diag_terms = 0
    config_Redi_horizontal_taper = 'ramp'
    config_Redi_horizontal_ramp_min = 20e3
    config_Redi_horizontal_ramp_max = 30e3

The maximum difference at the first timestep is 1e-13 or 1e-14 for all prognostic fields, so this looks like an order of operations change to me.

If I change to

    config_Redi_use_slope_taper = .false.
    config_Redi_use_surface_taper = .false.

the non-BFB difference still occurs.

mark-petersen avatar Mar 25 '25 16:03 mark-petersen

Using the test ocean/global_ocean/Icos240/WOA23/performance_test/forward I these small non-BFB differences appear with redi on but all options turned off:

&Redi_isopycnal_mixing
    config_use_Redi = .true.
    config_Redi_closure = 'constant'
    config_Redi_constant_kappa = 600.0
    config_Redi_maximum_slope = 0.3
    config_Redi_use_slope_taper = .false.
    config_Redi_use_surface_taper = .false.
    config_Redi_limit_term1 = .false.
    config_Redi_use_quasi_monotone_limiter = .false.
    config_Redi_quasi_monotone_safety_factor = 0.9
    config_Redi_min_layers_diag_terms = 0
    config_Redi_horizontal_taper = 'ramp'
    config_Redi_horizontal_ramp_min = 20e3
    config_Redi_horizontal_ramp_max = 30e3

I verified that with redi turned off completely (config_use_Redi = .false.) that there indeed is a BFB match with master.

mark-petersen avatar Mar 25 '25 16:03 mark-petersen

The non-BFB behavior disappears when I change from gnu optimized to gnu debug! If this is the same behavior in the coupled test, I think the non-BFB failure can be approved for the PR.

Specifically, I retested ocean/global_ocean/Icos240/WOA23/performance_test/forward with the original namelist above as well as the test with all the internal redi flags off on the previous comment. In both cases, there are 1e-13 max mismatches with gnu optimized between this branch and the master branchpoint. Comparisons are BFB with gnu debug for both.

mark-petersen avatar Mar 25 '25 16:03 mark-petersen

I tested with gnu optimized and config_Redi_horizontal_taper = 'none' and this PR and master still have non-BFB match, so these standalone results may not be catching the same difference as the E3SM test.

mark-petersen avatar Mar 25 '25 17:03 mark-petersen

Test merge passes:

  • ERS.ne4pg2_oQU480.WCYCL1850NS.chrysalis_intel
  • SMS_D_Ld1.ne30pg2_r05_IcoswISC30E3r5.WCYCL1850.chrysalis_intel.allactive-wcprod
  • ERP_Ld3.ne30pg2_r05_IcoswISC30E3r5.WCYCL1850.chrysalis_intel.allactive-pioroot1
  • ERS_Ld5.T62_oQU120.CMPASO-NYF.chrysalis_intel
  • ERS_Ld5.TL319_oQU240wLI_ais8to30.GMPAS-JRA1p5-DIB-PISMF-DIS.chrysalis_intel.mpaso-ocn_glcshelf
  • ERS_Ld5.TL319_oQU240wLI_ais8to30.GMPAS-JRA1p5-DIB-PISMF-SIS.chrysalis_intel.mpaso-ocn_glcshelf
  • ERS_Ld5.T62_oQU240wLI.GMPAS-DIB-IAF-DISMF.chrysalis_intel.C.20250327_114302_nmarmq
  • ERS_Ld5.T62_oQU240wLI.GMPAS-DIB-IAF-PISMF.chrysalis_intel
  • PEM_Ln5.T62_oQU240wLI.GMPAS-DIB-IAF-DISMF.chrysalis_intel
  • PEM_Ln5.T62_oQU240wLI.GMPAS-DIB-IAF-PISMF.chrysalis_intel
  • PET_Ln5.T62_oQU240wLI.GMPAS-DIB-IAF-DISMF.chrysalis_intel
  • PET_Ln5.T62_oQU240wLI.GMPAS-DIB-IAF-PISMF.chrysalis_intel

with expected NML DIFFs

There was one baseline DIFF:

  • SMS_D_Ld1.TL319_IcoswISC30E3r5.GMPAS-JRA1p5-DIB-PISMF.chrysalis_intel.mpaso-jra_1958

jonbob avatar Mar 27 '25 17:03 jonbob

With the latest commit, the failing test now passes. Because it was likely a threading issue, I ran it multiple times to be sure it was fixed -- it has passed five of five runs.

jonbob avatar Mar 27 '25 22:03 jonbob

Update: more testing needed.

rljacob avatar Apr 10 '25 17:04 rljacob

Converting to draft until testing is finished.

rljacob avatar May 05 '25 18:05 rljacob