Negative runoff quick-fix

Open hydrotian opened this issue 2 months ago • 20 comments

A quick-fix to eliminate the negative runoff sent from ROF to OCN. Activated by setting redirect_negative_qgwl = .true. in user_nl_mosart. Two scenarios are considered:

Scenario A (net_global_qgwl ≥ 0):

  • Proportionally scales down positive qgwl cells
  • Zeros out negative qgwl cells
  • No outlet redistribution

Scenario B (net_global_qgwl < 0):

  • Zeros out all qgwl
  • Redistributes deficit to all outlets proportionally
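
The two scenarios can be sketched in a few lines of Python. This is a minimal illustration, not the MOSART implementation: the function name redirect_negative_qgwl_sketch, the argument outlet_discharge, and the choice to weight outlets by their discharge are assumptions. So is the scale factor net / positive_total, which is one way to scale the positive cells down so that the global total is preserved.

```python
def redirect_negative_qgwl_sketch(qgwl, outlet_discharge):
    """Return (new_qgwl, outlet_adjustment) for the two scenarios.

    qgwl: per-cell qgwl values (hypothetical flat list).
    outlet_discharge: per-outlet discharge, used here as the
        redistribution weights (an assumption, not from the PR).
    """
    net_global_qgwl = sum(qgwl)

    if net_global_qgwl >= 0.0:
        # Scenario A: zero the negative cells and scale the positive
        # cells down so their sum equals the original global net.
        positive_total = sum(q for q in qgwl if q > 0.0)
        scale = net_global_qgwl / positive_total if positive_total > 0.0 else 0.0
        new_qgwl = [q * scale if q > 0.0 else 0.0 for q in qgwl]
        outlet_adjustment = [0.0] * len(outlet_discharge)  # no redistribution
    else:
        # Scenario B: zero all qgwl and spread the (negative) deficit
        # over the outlets in proportion to their discharge.
        total_discharge = sum(outlet_discharge)
        if total_discharge <= 0.0:
            raise ValueError("outlet discharge must sum to a positive value")
        new_qgwl = [0.0] * len(qgwl)
        outlet_adjustment = [
            net_global_qgwl * d / total_discharge for d in outlet_discharge
        ]

    return new_qgwl, outlet_adjustment
```

For example, qgwl = [4.0, -1.0, 2.0, -2.0] has a net of 3.0, so Scenario A scales the positive cells by 0.5 and returns [2.0, 0.0, 1.0, 0.0] with no outlet adjustment.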

hydrotian avatar Oct 20 '25 10:10 hydrotian

@hydrotian Please can you provide a location of the coupled simulation with these changes for us to explore? Also can you provide diagnostics for this simulation? Finally, can you confirm that these changes pass SMS, PET, PEM and ERS tests in a B-case?

proteanplanet avatar Oct 20 '25 18:10 proteanplanet

@proteanplanet I don't have a coupled simulation done with this PR yet, but I plan to submit one following my previous Bluetip simulation. This PR passed the e3sm_land_developer test suite, which includes 50+ tests, on Compy, with some Namelist changes and Throughput changes. See the attached test results. test_results.txt

hydrotian avatar Oct 20 '25 18:10 hydrotian

Those test results don't have any PET or PEM tests. Try PET.ne4pg2_ne4pg2.I1850CNPRDCTCBCTOP and PEM.ne4pg2_ne4pg2.I1850CNPRDCTCBCTOP

rljacob avatar Oct 20 '25 21:10 rljacob

@rljacob The PET.ne4pg2_ne4pg2.I1850CNPRDCTCBCTOP simulation failed on Compy with the following error message:

 Opened existing file 
 /compyfs/inputdata/share/domains/domain.lnd.ne4pg2_oQU240.190321.nc          23
 lat/lon grid flag (isgrid2d) is  F
 ncd_inqvid: variable LANDMASK is not on dataset
 decompInit_lnd(): Number of clumps exceeds number of land grid cells
         320         211
 ENDRUN:
 ERROR in decompInitMod.F90 at line 183

It is strange as I did not modify the land model in this PR. Any ideas? Should I try it on Chrysalis instead?

hydrotian avatar Oct 20 '25 23:10 hydrotian

Yes try chrysalis. There may not be a good pelayout for that case on compy.

rljacob avatar Oct 20 '25 23:10 rljacob

I ran a PEM.ne30pg2_r05_IcoswISC30E3r5.WCYCL1850.chrysalis_intel test and it failed the comparison between the two runs. The PEM_Ln9.ne30pg2_r05_IcoswISC30E3r5.WCYCL1850.chrysalis_intel test that's in e3sm_integration passes, but since it only runs 9 steps, MOSART runs only once in that test.

jonbob avatar Oct 22 '25 18:10 jonbob

My PET.ne4pg2_ne4pg2.I1850CNPRDCTCBCTOP passed, but the PEM.ne4pg2_ne4pg2.I1850CNPRDCTCBCTOP failed on comparison as well, because the 2nd run couldn't complete. I increased the walltime to 2 hours (maximum for a debug queue on Chrysalis?) but the simulation appeared to stall at some point. Then I tested the baseline (https://github.com/E3SM-Project/E3SM/commit/64046ec75587d9fcd035f22553192665dd540f56) and it failed at the same point.

hydrotian avatar Oct 22 '25 18:10 hydrotian

Thanks @hydrotian -- I checked and both runs for my PEM test completed fine, just had different results. I'm running a similar PET test right now

jonbob avatar Oct 22 '25 18:10 jonbob

@jonbob Thanks. Could you share the cprnc.out report? I want to see which fields are different between the two runs.

hydrotian avatar Oct 22 '25 18:10 hydrotian

Sure, but after five days it ends up with 351 out of 507 fields different. It's at:

/lcrc/group/acme/ac.jwolfe/scratch/chrys/PEM.ne30pg2_r05_IcoswISC30E3r5.WCYCL1850.chrysalis_intel.20251022_120245_ruutak/PEM.ne30pg2_r05_IcoswISC30E3r5.WCYCL1850.chrysalis_intel.20251022_120245_ruutak.cpl.hi.0001-01-06-00000.nc.base.cprnc.out

jonbob avatar Oct 22 '25 18:10 jonbob

OK, the similar PET test (PET.ne30pg2_r05_IcoswISC30E3r5.WCYCL1850.chrysalis_intel) passed

jonbob avatar Oct 22 '25 19:10 jonbob

Thanks, @jonbob. Any insights about the PEM test failure? Would you mind running the same PEM test on the baseline master I branched from (https://github.com/E3SM-Project/E3SM/commit/64046ec75587d9fcd035f22553192665dd540f56)?

hydrotian avatar Oct 22 '25 19:10 hydrotian

No insights from the PEM test -- we would have to do one where we tried to catch the first field that gets different answers. @proteanplanet noticed that you have a routine for sort_outlets_by_discharge_desc but we couldn't see it getting called?

jonbob avatar Oct 22 '25 19:10 jonbob

Yes. That was from an earlier commit on this branch. I can clean it up.

hydrotian avatar Oct 22 '25 20:10 hydrotian

To get a better idea of when it diffs, change the river coupling frequency to match the other models. That might allow you to go back to a 9 nstep test. Also change the coupler history output to be every timestep.

rljacob avatar Oct 22 '25 21:10 rljacob

@hydrotian -- I set redirect_negative_qgwl = .false. in your branch and the PEM test passes

jonbob avatar Oct 22 '25 21:10 jonbob
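
A PEM test compares runs with different MPI task counts, so one general way a global quantity such as net_global_qgwl can break it is through summation order: floating-point addition is not associative, and partial sums taken over different decompositions can combine to different totals. The thread does not establish that this was the cause here. The Python sketch below only illustrates the effect; the function names and values are hypothetical.

```python
def naive_sum(values):
    """Left-to-right floating-point accumulation, with no compensation."""
    total = 0.0
    for v in values:
        total += v
    return total


def decomposed_sum(values, n_tasks):
    """Sum contiguous chunks (one per hypothetical task), then combine."""
    chunk = -(-len(values) // n_tasks)  # ceiling division
    partials = [
        naive_sum(values[i:i + chunk]) for i in range(0, len(values), chunk)
    ]
    return naive_sum(partials)


# Illustrative values only: large terms that cancel, plus small ones.
values = [1.0e16, 1.0, -1.0e16, 1.0]

one_task = decomposed_sum(values, 1)
two_tasks = decomposed_sum(values, 2)

# The same data summed under two decompositions gives different totals.
print(one_task, two_tasks, one_task == two_tasks)
```

If a sum like this feeds a branch such as net_global_qgwl ≥ 0, a small decomposition-dependent difference could in principle change downstream results, which is the kind of behavior a PEM comparison is designed to catch.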

The PEM test passes now. Both @jonbob and I confirmed that in our separate tests.

hydrotian avatar Nov 03 '25 21:11 hydrotian

Status: waiting for climate tests to see impact.

rljacob avatar Nov 20 '25 18:11 rljacob

A 10-year fully-coupled simulation with this quick-fix based on v3.LR.piControl has completed. It is confirmed that there's zero negative runoff passed to the ocean from the land. image

The monthly river discharge comparison at the river outlet (To Ocean) and the last gridcell before the outlet (Over Land) for the 'quick-fix' run shows that the amount of water redirected by this quick-fix is negligible relative to the river discharge: compare_outlet_upstream_diff_Amazon

The river discharge comparison between this simulation and the baseline at major river outlets shows that the negative runoff redirection did have some impact on the regional hydrology. river_dashboard_optimized

hydrotian avatar Dec 04 '25 22:12 hydrotian

I also tested this PR with the new flag on and it passed:

  • ERP_Ld3.ne30pg2_r05_IcoswISC30E3r5.WCYCL1850.chrysalis_intel.allactive-pioroot1

jonbob avatar Dec 10 '25 17:12 jonbob