E3SM icon indicating copy to clipboard operation
E3SM copied to clipboard

prep_rof_mrg using non-associated and un-allocated pointer O2R_RX

Open peterdschwartz opened this issue 2 years ago • 6 comments

While working on another issue, my debug runs would fail in prep_rof_mod with the below traceback, which has happened with all debug land tests that I've tried so far on chrysalis with Intel.

**SMS_D_Ld1.hcru_hcru.I1850CRUELMCN.chrysalis_intel**
 77: forrtl: severe (408): fort: (7): Attempt to use pointer O2R_RX when it is not associated with a target
 77:
 77: Image              PC                Routine            Line        Source
 77: libpnetcdf.so.3.0  0000155551014202  for_emit_diagnost     Unknown  Unknown
 77: e3sm.exe           00000000005E105F  prep_rof_mod_mp_p         478  prep_rof_mod.F90
 77: e3sm.exe           0000000000472343  cime_comp_mod_mp_        4404  cime_comp_mod.F90
 77: e3sm.exe           0000000000452604  cime_comp_mod_mp_        2873  cime_comp_mod.F90
 77: e3sm.exe           000000000047CD04  MAIN__                    153  cime_driver.F90
 77: e3sm.exe           000000000041AE22  Unknown               Unknown  Unknown
 77: libc-2.28.so       000015554E665493  __libc_start_main     Unknown  Unknown
 77: e3sm.exe           000000000041AD2E  Unknown               Unknown  Unknown

The problem seems to be that O2R_RX is only allocated in prep_rof_init if both ROF and OCN are both present but prep_rof_mrg is called even if OCN is not present. And my guess is that the fix is to add an if-statement to cime_run_rof_setup_send to check for OCN.

**from prep_rof_init**
if (rof_present .and. ocn_present) then

       call seq_comm_getData(CPLID, &
            mpicom=mpicom_CPLID, iamroot=iamroot_CPLID)

       lsize_r = mct_aVect_lsize(x2r_rx)

       o2x_ox => component_get_c2x_cx(ocn(1))
       lsize_o = mct_aVect_lsize(o2x_ox)

       allocate(o2racc_ox(num_inst_ocn))
       do eoi = 1,num_inst_ocn
          call mct_aVect_initSharedFields(o2x_ox, x2r_rx, o2racc_ox(eoi), lsize=lsize_o)
          call mct_aVect_zero(o2racc_ox(eoi))
       end do
       o2racc_ox_cnt = 0

       **allocate(o2r_rx(num_inst_rof))**

peterdschwartz avatar Sep 16 '22 15:09 peterdschwartz

@fdongyu @jonbob Tagging you since I think this was introduced from that PR.
Looking further at this, it's likely due to passing o2r_rx(eri) as an argument despite it not being allocated at all.

call prep_rof_merge(l2r_rx(eri), a2r_rx(eri), o2r_rx(eri), fractions_rx(efi), x2r_rx, cime_model)

ocn_rof_two_way is properly turned off so o2r_rx(eri) is not used, so a fix could be either to pass the index eri as an argument and use o2r_rx as a global variable in this module, an optional argument, or to make it a separate function that's called only if ocn_rof_two_way is turned on. Or something similar.

peterdschwartz avatar Sep 16 '22 16:09 peterdschwartz

@peterdschwartz , thanks for reporting this. I am trying to reproduce the issue on Compy. Could you please let me know how your debug land test is configured? I have all the land tests passed but maybe they are in the debug mode?

fdongyu avatar Sep 19 '22 06:09 fdongyu

Yes, it's for debug runs on chrysalis using intel compiler. I have implemented a fix for this issue in my test branch by passing o2r_rx as an optional argument only if ocn_rof_two_way is on, which has been working well enough so far.

diff --git a/driver-mct/main/prep_rof_mod.F90 b/driver-mct/main/prep_rof_mod.F90 index c17ec86f86..369f9d6a07 100644 --- a/driver-mct/main/prep_rof_mod.F90 +++ b/driver-mct/main/prep_rof_mod.F90

@@ -469,13 +469,17 @@ contains
-       call prep_rof_merge(l2r_rx(eri), a2r_rx(eri), o2r_rx(eri), fractions_rx(efi), x2r_rx, cime_model)
+       if(ocn_rof_two_way) then 
+         call prep_rof_merge(l2r_rx(eri), a2r_rx(eri), fractions_rx(efi), x2r_rx, cime_model, o2x_r=o2r_rx(eri))
+       else
+         call prep_rof_merge(l2r_rx(eri), a2r_rx(eri), fractions_rx(efi), x2r_rx, cime_model)
+       end if
+

-  subroutine prep_rof_merge(l2x_r, a2x_r, o2x_r, fractions_r, x2r_r, cime_model)
+  subroutine prep_rof_merge(l2x_r, a2x_r, fractions_r, x2r_r, cime_model,o2x_r)
 
     !-----------------------------------------------------------------------
     ! Description
@@ -492,10 +496,10 @@ contains
     ! Arguments
     type(mct_aVect),intent(in)    :: l2x_r
     type(mct_aVect),intent(in)    :: a2x_r
-    type(mct_aVect),intent(in)    :: o2x_r
     type(mct_aVect),intent(in)    :: fractions_r
     type(mct_aVect),intent(inout) :: x2r_r
     character(len=*)        , intent(in)    :: cime_model
+    type(mct_aVect),intent(in),optional  :: o2x_r

peterdschwartz avatar Sep 19 '22 11:09 peterdschwartz

@jonbob @fdongyu If there's no strong preference for how to implement a fix for this, I can go ahead and submit a bugfix PR with these changes.

peterdschwartz avatar Sep 19 '22 11:09 peterdschwartz

@peterdschwartz - that looks like the correct solution. Thanks for taking it on, and please feel free to either add me as a reviewer or assign me to the PR

jonbob avatar Sep 19 '22 15:09 jonbob

@peterdschwartz Thanks. The fix looks good to me. Please let me know if there is anything else I can do.

fdongyu avatar Sep 19 '22 16:09 fdongyu

As Gautam already noted, this is same issue as https://github.com/E3SM-Project/E3SM/issues/5221 and you found/fixed it first. I can close that one or maybe you could add "fixes #" to your PR.

ndkeen avatar Oct 13 '22 17:10 ndkeen

Sure I'll just link both issues in the PR

peterdschwartz avatar Oct 13 '22 20:10 peterdschwartz