CTSM

Add checkFlag=.true. to ESMF_FieldRegridStore calls throughout

Open. slevis-lmwg opened this issue 1 year ago.

Description of changes

Motivation explained in #1829

Specific notes

Contributors other than yourself, if any: @ekluzek

CTSM Issues Fixed (include github issue #): #1829, unless we decide that we need the same update in mksurfdata_map

Are answers expected to change (and if so in what way)? No

Any User Interface Changes (namelist or namelist defaults changes)? No

Testing performed, if any: Ran mksurfdata_esmf for --res=10x15 --start-date 1850 --end-date 1850 with and without checkFlag=.true. and got bit-for-bit identical answers in the generated fsurdat file.
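For reference, "bit-for-bit" means every value in every field has an identical binary representation. That is stricter than comparing with == (NaN never equals itself, while -0.0 == 0.0). A minimal plain-Python sketch of such a check follows. It assumes the fsurdat variables have already been read into dicts of flat float lists, for example with netCDF4 or xarray (not shown). The function bfb_differences and the variable names VAR_A, VAR_B, VAR_C are hypothetical and not part of mksurfdata_esmf.

```python
import struct
from typing import Dict, List, Sequence


def bits(values: Sequence[float]) -> bytes:
    """Pack values as little-endian IEEE-754 doubles so we compare raw bits."""
    return struct.pack(f"<{len(values)}d", *values)


def bfb_differences(a: Dict[str, Sequence[float]],
                    b: Dict[str, Sequence[float]]) -> List[str]:
    """Return names of variables missing from one side or differing in any bit."""
    diffs = []
    for name in sorted(set(a) | set(b)):
        if name not in a or name not in b:
            diffs.append(name)
        elif len(a[name]) != len(b[name]) or bits(a[name]) != bits(b[name]):
            diffs.append(name)
    return diffs


nan = float("nan")  # reuse one NaN object so both sides share its bit pattern
without_flag = {"VAR_A": [90.0, 100.0], "VAR_B": [1.0, nan]}
with_flag = {"VAR_A": [90.0, 100.0], "VAR_B": [1.0, nan]}
print(bfb_differences(without_flag, with_flag))  # []
print(bfb_differences({"VAR_C": [0.0]}, {"VAR_C": [-0.0]}))  # ['VAR_C']
```

An empty list corresponds to the "bit-for-bit identical" result reported above. Global attributes such as creation dates are deliberately left out, since they normally differ between any two runs.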

slevis-lmwg, Sep 07 '22

Thanks for putting this together. I think we want to evaluate the checkFlag with some of Adam's datasets that we saw problems with. I think to do that we'll need to update the ctsm5.2 branch to the latest. The checkflag only applies to certain versions of ESMF, and we want to make sure we are using a late enough version.

ekluzek, Sep 08 '22

> I think we want to evaluate the checkFlag with some of Adam's datasets that we saw problems with. I think to do that we'll need to update the ctsm5.2 branch to the latest. The checkflag only applies to certain versions of ESMF, and we want to make sure we are using a late enough version.

@adamrher @ekluzek my branch is now updated to the latest ctsm5.2, which includes dev111.

slevis-lmwg, Nov 01 '22

Let me try to check out your code and run it on a bad grid.

adamrher, Nov 01 '22

Thanks @adamrher that's exactly what we'd like you to do. It'll be good to see if it helps in any way. If it does we should keep it, but if not we might not bring this in. Thanks for volunteering to do that.

ekluzek, Nov 01 '22

Some notes from building and running mksurfdata_esmf:

  1. when gen_mksurfdata_build.sh fails, I had to manually remove tool_bld before running again. Is there a --clean option to clean up the failed build?
  2. had to load "ncar_pylib" to run ./gen_mksurfdata_namelist.py; I know CISL is trying to deprecate this method of loading python libraries, but I don't know what the replacement is supposed to be.

Now I'm lost. I'm trying to run an unsupported grid. I'm used to running regridbatch.sh at this point to create mapping weight files given a SCRIP file. I just assumed this new workflow would require an ESMF mesh file instead. But I don't know what I'm supposed to do next or where to supply the grid file. There are no instructions in the README for user-supplied grids.

adamrher, Nov 01 '22

> 1. when gen_mksurfdata_build.sh fails, I had to manually remove tool_bld before running again. Is there a --clean option to clean up the failed build?

Currently no. We rely on the error message instructing users to manually remove the /tool_bld directory.

> 2. had to load "ncar_pylib" to run ./gen_mksurfdata_namelist.py; I know CISL is trying to deprecate this method of loading python libraries, but I don't know what the replacement is supposed to be.

My understanding: from your /ctsm directory, run ./py_env_create. When it completes, follow the on-screen instructions, which I believe tell you to run conda activate ctsm_py.

> Now I'm lost. I'm trying to run an unsupported grid. I'm used to running regridbatch.sh at this point to create mapping weight files given a SCRIP file. I just assumed this new workflow would require an ESMF mesh file instead. But I don't know what I'm supposed to do next or where to supply the grid file. There are no instructions in the README for user-supplied grids.

True, we haven't added explicit instructions for this in the README, but ./gen_mksurfdata_namelist.py --help tells you to include the following options for a user defined grid:

--model-mesh FORCE_MODEL_MESH_FILE
--model-mesh-nx FORCE_MODEL_MESH_NX
--model-mesh-ny FORCE_MODEL_MESH_NY

If you don't have the mesh file, you can make one as follows:

module load nco  # on cheyenne, OR: module load tool/nco/4.7.5  # on izumi
ncks --rgr infer --rgr scrip=scrip.nc <lat_lon_file_with_user_defined_grid>.nc metadata.nc  # to generate scrip.nc
/glade/u/apps/ch/opt/esmf-netcdf/8.0.0/intel/19.0.5/bin/bing/Linux.intel.64.mpiuni.default/ESMF_Scrip2Unstruct scrip.nc FORCE_MODEL_MESH_FILE 0  # to generate FORCE_MODEL_MESH_FILE

On izumi the last command becomes: /project/esmf/PROGS/esmf/8.2.0/mpiuni//gfortran/9.3.0/bin/binO/Linux.gfortran.64.mpiuni.default/ESMF_Scrip2Unstruct scrip.nc FORCE_MODEL_MESH_FILE 0
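The two-step recipe above can be scripted: ncks writes a SCRIP file, then ESMF_Scrip2Unstruct converts it to an ESMF mesh. The minimal sketch below only rebuilds the two command lines exactly as given above. It defaults to a dry run so nothing executes unless asked. The helper names and file names are hypothetical, and the ESMF_Scrip2Unstruct path is machine specific (see the cheyenne and izumi paths above). As we understand ESMF's usage text, the trailing 0 is the dual-mesh flag (0 = do not generate the dual mesh). Load nco in your shell before starting Python, because module is a shell function and the subprocess only inherits the resulting PATH.

```python
import shlex
import subprocess
from typing import List


def mesh_commands(latlon_file: str, scrip_file: str, mesh_file: str,
                  scrip2unstruct: str) -> List[List[str]]:
    """Return the two command lines from the recipe above (hypothetical helper)."""
    return [
        # Step 1: infer the grid from the lat/lon file and write it as a SCRIP file.
        # metadata.nc is the ordinary ncks output file named in the original recipe.
        ["ncks", "--rgr", "infer", "--rgr", f"scrip={scrip_file}",
         latlon_file, "metadata.nc"],
        # Step 2: convert SCRIP to an ESMF unstructured mesh; "0" = no dual mesh.
        [scrip2unstruct, scrip_file, mesh_file, "0"],
    ]


def run_all(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Echo each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


cmds = mesh_commands("my_latlon_grid.nc", "scrip.nc", "my_grid_ESMFmesh.nc",
                     "/path/to/ESMF_Scrip2Unstruct")
run_all(cmds)  # dry run: prints the commands only
```

With check=True, a failure in step 1 stops the script before step 2 can run on a missing or partial scrip.nc.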

If you don't have a <lat_lon_file_with_user_defined_grid>.nc, I believe that the ncks command above allows you to enter lats and lons at the command line, but I don't have experience with that...
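Once scrip.nc exists, its grid_dims variable (viewable with ncdump -v grid_dims scrip.nc) is one plausible source for the --model-mesh-nx and --model-mesh-ny values. The sketch below assumes the SCRIP convention: grid_dims holds [nlon, nlat] for a logically rectangular grid, or [ncells] for an unstructured one. In the unstructured case it assumes ny = 1. The function name is hypothetical, and the mapping is our assumption rather than documented mksurfdata_esmf behavior.

```python
from typing import Sequence, Tuple


def mesh_nx_ny(grid_dims: Sequence[int]) -> Tuple[int, int]:
    """Map a SCRIP grid_dims array to (nx, ny) (hypothetical helper).

    Assumes grid_dims = [nlon, nlat] for a logically rectangular grid,
    or [ncells] for an unstructured grid, in which case ny is taken as 1.
    """
    dims = [int(d) for d in grid_dims]
    if len(dims) == 2:
        return dims[0], dims[1]
    if len(dims) == 1:
        return dims[0], 1
    raise ValueError(f"unexpected grid_rank {len(dims)}; expected 1 or 2")


print(mesh_nx_ny([288, 192]))  # (288, 192)
print(mesh_nx_ny([5000]))      # (5000, 1)
```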

slevis-lmwg, Nov 01 '22

So it worked ... maybe too well. It triggers the checkflag error right after I submit the job for the bad grid:

20221101 154324.803 ERROR            PET159 ~~~~~~~~~~~~~~~~~ Self Overlapping Grid or Mesh Detected ~~~~~~~~~~~~~~~~~
20221101 154324.803 ERROR            PET159
20221101 154324.803 ERROR            PET159   Significant self overlap (overlap area > 1e-15) detected in destination Mesh or Grid.
20221101 154324.803 ERROR            PET159   Maximum overlap area=5.6122e-06 occurs between elem id=153601 and elem id=153602
20221101 154324.803 ERROR            PET159
20221101 154324.803 ERROR            PET159 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
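When many PET log files are produced, it can help to pull the key numbers out of them automatically. Below is a minimal parser sketch. Its regular expressions are written only against the log lines quoted in this thread, so the message format may differ in other ESMF versions. The names Overlap and parse_overlap are hypothetical.

```python
import re
from typing import Iterable, NamedTuple, Optional


class Overlap(NamedTuple):
    area: float
    elem_a: int
    elem_b: int
    where: str  # "source", "destination", or "unknown"


_WHERE = re.compile(r"detected in (source|destination) Mesh or Grid")
_MAX = re.compile(r"Maximum overlap area=([0-9.eE+-]+) occurs between "
                  r"elem id=(\d+) and elem id=(\d+)")


def parse_overlap(lines: Iterable[str]) -> Optional[Overlap]:
    """Return the first reported maximum overlap, or None if there is none."""
    where = "unknown"
    for line in lines:
        m = _WHERE.search(line)
        if m:
            where = m.group(1)  # remember which side (source/destination) was flagged
        m = _MAX.search(line)
        if m:
            return Overlap(float(m.group(1)), int(m.group(2)), int(m.group(3)), where)
    return None


log_lines = [
    "20221101 154324.803 ERROR            PET159   Significant self overlap "
    "(overlap area > 1e-15) detected in destination Mesh or Grid.",
    "20221101 154324.803 ERROR            PET159   Maximum overlap area=5.6122e-06 "
    "occurs between elem id=153601 and elem id=153602",
]
print(parse_overlap(log_lines))
# Overlap(area=5.6122e-06, elem_a=153601, elem_b=153602, where='destination')
```

Recording whether the source or destination mesh was flagged matters here, since the same error text can point at either the model grid or an input dataset grid.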

I then tried our supported ARCTIC var-res grid, and it went on for a bit, but then also triggered the checkflag error (Maximum overlap area=2.46564e-08 occurs between elem id=125576276 and elem id=67945803).

I didn't expect this to occur. Can you tell me how to disable checkflag? With the old tools, the run would also abort on the bad grid for the same reason, though the error wasn't as clear since it isn't an official check, while the good grid I tested would pass fine. I'm wondering if I can reproduce that behavior with the new tools here, or whether both grids are now deemed bad, in which case I need to double-check some other things.

adamrher, Nov 01 '22

> I then tried our supported ARCTIC var-res grid, and it went on for a bit, but then also triggered the checkflag error (Maximum overlap area=2.46564e-08 occurs between elem id=125576276 and elem id=67945803).
>
> I didn't expect this to occur. Can you tell me how to disable checkflag? With the old tools, the run would also abort on the bad grid for the same reason, though the error wasn't as clear since it isn't an official check, while the good grid I tested would pass fine. I'm wondering if I can reproduce that behavior with the new tools here, or whether both grids are now deemed bad, in which case I need to double-check some other things.

@ekluzek I will let you answer here, but a thought that comes to mind: Maybe we can specify a checkflag tolerance.

@adamrher to run without checkflag, you should checkout the escomp/ctsm5.2.mksurfdata branch rather than my checkflag branch.

slevis-lmwg, Nov 01 '22

I was mistaken. The error for the run with the ARCTIC grid is a problem with the HYDRO1K grid file. The log file hangs at:

Attempting to make Topography statistics.....
 Input file is /glade/p/cesm/cseg/inputdata/lnd/clm2/rawdata/mksrf_topostats_1km-merge-10min_HYDRO1K-merge-nomask_simyr2000.c130402.nc
 Input mesh file is /glade/p/cesm/cseg/inputdata/lnd/clm2/mappingdata/grids/UGRID_1km-merge-10min_HYDRO1K-merge-nomask_cdf5_c130402.nc
mktopostats creating a routehandle

And then the model dumps with a bunch of ESMF log files:

20221101 164843.656 ERROR            PET282 ~~~~~~~~~~~~~~~~~ Self Overlapping Grid or Mesh Detected ~~~~~~~~~~~~~~~~~
20221101 164843.656 ERROR            PET282
20221101 164843.656 ERROR            PET282   Significant self overlap (overlap area > 1e-15) detected in source Mesh or Grid.
20221101 164843.656 ERROR            PET282   Maximum overlap area=2.46564e-08 occurs between elem id=125576276 and elem id=67945803
20221101 164843.656 ERROR            PET282
20221101 164843.656 ERROR            PET282 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It's a problem with the source grid, HYDRO1K, whereas for my bad grid it correctly flagged the destination grid. As Sam suggests, we could lower the tolerance to just below 1E-8 to see if it gets past the HYDRO1K issue.
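The overlap areas reported in this thread can be compared directly against candidate thresholds. The short check below copies the two maximum overlap areas from the errors above and applies the "overlap area > tolerance" rule stated in the ESMF log. The HYDRO1K overlap (2.46564e-08) is larger than 1e-8. Under that rule, a tolerance would need to exceed about 2.47e-8 before HYDRO1K stops being flagged, and at 3e-8 only the bad user grid remains flagged. Whether ESMF allows this threshold to be changed is not established in this thread, so the tolerance values are purely illustrative.

```python
# Maximum overlap areas reported by ESMF earlier in this thread.
REPORTED = {
    "bad user grid (destination)": 5.6122e-06,
    "HYDRO1K (source)": 2.46564e-08,
}


def flagged(area: float, tol: float) -> bool:
    """Mirror the log's rule: an overlap is 'significant' when area > tol."""
    return area > tol


for tol in (1e-15, 1e-8, 3e-8):
    hits = [name for name, area in REPORTED.items() if flagged(area, tol)]
    print(f"tol={tol:g}: flagged -> {hits}")
```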

adamrher, Nov 01 '22

From today's emails so that I don't lose track...

@adamrher wrote: The status is that it does what we want it to do -- it flags the bad grid, and the good grid passes. However, when I ran it on the good grid, the checkFlag found a problem with the HYDRO1K mesh file. So I've not been able to run the tool with the checkFlag to completion. Did you guys test this with a supported grid and get the same error? Or is it just me?

@ekluzek wrote: [...] with Adam finding problems we should test all the standard grids with it.

@ekluzek I'm happy to do this and let's determine details in tomorrow's stand-up.

slevis-lmwg, Nov 04 '22

> @ekluzek wrote: [...] with Adam finding problems we should test all the standard grids with it.

Relates to #1903

I will merge Bill's work once it gets into the ctsm5.2.mksurfdata branch and then run the mksurfdata_esmf "multi" script.

slevis-lmwg, Nov 15 '22

Generating this fsurdat file surfdata_1.9x2.5_hist_78pfts_CMIP6_2000_c230125.nc fails with the new checkFlag=.true. added to mktopostatsMod.F90.

Earlier in this post I erroneously stated that generating the same fsurdat file also failed with checkFlag=.true. added to mkesmfMod.F90 and mksoiltexMod.F90. My misunderstanding came from not having run ./manage_externals/checkout_externals in the branch where I was working.

slevis-lmwg, Jan 26 '23

This didn't provide a clear win for mksurfdata_esmf, so I'm closing it; we didn't put it on the branch, so it can't come in now anyway. We may get back to this, so I'll leave #1829 open for now.

ekluzek, Mar 09 '24