Add climate reproducibility test for MPAS Ocean

Open mkstratos opened this issue 1 year ago • 9 comments

Add a multi-variate Kolmogorov–Smirnov test for the MPAS Ocean component.

The test is based on experiments conducted for Mahajan (2021), doi:10.1145/3468267.3470572. Because the test is specific to the MPAS-Ocean component, it's placed in the components/mpas-ocean/cime_config/SystemTests directory. The test is initialized with a restart file from a 400+ year run with the GMPAS-NYF compset.
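For readers unfamiliar with the approach, the general idea of a multi-variate two-sample Kolmogorov–Smirnov comparison can be sketched as below. This is a minimal illustration under stated assumptions, not the code in this PR: the variable names, the ensemble size of 30, and the synthetic data are hypothetical, and the critical value uses the standard large-sample approximation for alpha = 0.05. The real test would then compare the number of rejecting variables against a threshold, which is not shown here.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        cdf_a = sum(v <= x for v in a) / len(a)
        cdf_b = sum(v <= x for v in b) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_critical(n, m, c_alpha=1.358):
    """Large-sample critical value; c_alpha = 1.358 corresponds to alpha = 0.05."""
    return c_alpha * math.sqrt((n + m) / (n * m))


def count_rejections(baseline, candidate):
    """Return the variables whose two ensembles differ according to the KS test."""
    rejected = []
    for name in baseline:
        d = ks_statistic(baseline[name], candidate[name])
        if d > ks_critical(len(baseline[name]), len(candidate[name])):
            rejected.append(name)
    return rejected


# Hypothetical ensembles of annual means (30 members per variable).
rng = random.Random(0)
baseline = {
    "temperature": [rng.gauss(0.0, 1.0) for _ in range(30)],
    "salinity": [rng.gauss(0.0, 1.0) for _ in range(30)],
}
candidate = {
    "temperature": list(baseline["temperature"]),        # identical distribution
    "salinity": [v + 100.0 for v in baseline["salinity"]],  # clearly shifted
}
print(count_rejections(baseline, candidate))
```

Only the shifted variable is flagged here; the identical ensemble gives a KS statistic of zero.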

mkstratos avatar Feb 05 '24 18:02 mkstratos

Example passing test: https://web.lcrc.anl.gov/public/e3sm/ac.mkelleher/evv/MVKO_PS.T62_oQU240.GMPAS-NYF.chrysalis_intel.C.20240131_1022/

Example failing test: https://web.lcrc.anl.gov/public/e3sm/ac.mkelleher/evv/MVKO_PS.T62_oQU240.GMPAS-NYF.chrysalis_intel.C.20240131_0945/

mkstratos avatar Feb 05 '24 18:02 mkstratos

@mkstratos how do you run this test?

rljacob avatar Feb 27 '24 19:02 rljacob

The initial condition file is currently only on LCRC, so from either Chrysalis or Anvil it can be run on this fork/branch with:

${E3SM_ROOT}/cime/scripts/create_test MVKO_PS.T62_oQU240.GMPAS-NYF [-g / -c] --pesfile ${E3SM_ROOT}/cime_config/testmods_dirs/config_pes_tests.xml

mkstratos avatar Feb 27 '24 19:02 mkstratos

Next question, if you create a suite in tests.py called e3sm_ocn_nbfb and add this test, can "create_test e3sm_ocn_nbfb" execute the test?

rljacob avatar Feb 27 '24 19:02 rljacob

Yes, that would work

mkstratos avatar Feb 27 '24 19:02 mkstratos

Are you sure? Please try it and let me know and I'll approve this PR.

rljacob avatar Feb 27 '24 20:02 rljacob

I've confirmed this works. Should we add that suite to this PR?

mkstratos avatar Feb 28 '24 14:02 mkstratos

Yes please do. Also please add a new "e3sm_nbfb" suite that combines that one and the atm one.
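A rough sketch of what the requested suites could look like follows. It assumes the suite dictionary in tests.py uses "tests" and "inherit" keys; the atm suite name e3sm_atm_nbfb and its placeholder entry are assumptions, and expand_suite is a hypothetical helper included only to show how the combined suite resolves. Check the actual file on this branch before copying anything.

```python
# Sketch of suite entries in the style of cime_config/tests.py (assumed layout).
_TESTS = {
    "e3sm_atm_nbfb": {  # assumed name for "the atm one"
        "tests": (
            "ATM_NBFB_TEST_PLACEHOLDER",  # placeholder, not a real test name
        ),
    },
    "e3sm_ocn_nbfb": {
        "tests": (
            "MVKO_PS.T62_oQU240.GMPAS-NYF",
        ),
    },
    "e3sm_nbfb": {
        # Combined suite: pulls in both non-bit-for-bit suites.
        "inherit": ("e3sm_atm_nbfb", "e3sm_ocn_nbfb"),
    },
}


def expand_suite(name, suites=_TESTS):
    """Hypothetical helper: flatten a suite and everything it inherits."""
    entry = suites[name]
    tests = list(entry.get("tests", ()))
    for parent in entry.get("inherit", ()):
        tests.extend(expand_suite(parent, suites))
    return tests


print(expand_suite("e3sm_nbfb"))
```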

rljacob avatar Feb 28 '24 17:02 rljacob

What does the full test name that you tried look like?

rljacob avatar Feb 28 '24 17:02 rljacob

I am not familiar with this testing method, so I removed myself as a reviewer.

mark-petersen avatar Mar 15 '24 23:03 mark-petersen

@mkstratos, could you let me know what would be some helpful testing for me to do on this PR?

Also, it seems the test uses the oQU240 mesh, which is too coarse to be scientifically valid. Could you let us know if that is an issue or if you are confident that the climate reproducibility (or lack thereof) at such a coarse resolution would translate to a scientifically valid resolution as well?

xylar avatar Mar 17 '24 19:03 xylar

Hi @xylar, this test is based on @salilmahajan's analysis: although oQU240 isn't scientifically validated, it "…includes all of the model structure and source code including the dynamical core and physical parameterizations…", so comparing the statistics of two different code versions at this resolution should apply to higher resolution versions as well.

Any inspection / testing you’d like to do would be helpful but particularly in the analysis member setup portion (mvko.py:180). This test is set up to use the MPAS timeseriesstatsclimatology analysis member to compute the annual averages, on which the statistical testing is performed. Thanks in advance!

mkstratos avatar Mar 25 '24 18:03 mkstratos

Thanks @mkstratos. I keep meaning to get to this but not quite managing. It's high on my to-do list.

xylar avatar Mar 29 '24 14:03 xylar

@mkstratos, I finally found some time to test this. Unfortunately, it immediately failed for me (probably user error) on Chrysalis with:

$ cd cime/scripts
$ ./create_test MVKO_PS.T62_oQU240.GMPAS-NYF -g /lcrc/group/e3sm/ac.xylar/e3sm_baselines/test_20240404 --pesfile ../../cime_config/testmods_dirs/config_pes_tests.xml --wait

Testnames: ['MVKO_PS.T62_oQU240.GMPAS-NYF.chrysalis_intel']
Using project from config_machines.xml: e3sm
create_test will do up to 1 tasks simultaneously
create_test will use up to 160 cores simultaneously
Creating test directory /lcrc/group/e3sm/ac.xasay-davis/scratch/chrys/MVKO_PS.T62_oQU240.GMPAS-NYF.chrysalis_intel.G.20240404_102105_tts9dt
RUNNING TESTS:
  MVKO_PS.T62_oQU240.GMPAS-NYF.chrysalis_intel
Starting CREATE_NEWCASE for test MVKO_PS.T62_oQU240.GMPAS-NYF.chrysalis_intel with 1 procs
Finished CREATE_NEWCASE for test MVKO_PS.T62_oQU240.GMPAS-NYF.chrysalis_intel in 0.175126 seconds (FAIL). [COMPLETED 1 of 1]
    Case dir: /lcrc/group/e3sm/ac.xasay-davis/scratch/chrys/MVKO_PS.T62_oQU240.GMPAS-NYF.chrysalis_intel.G.20240404_102105_tts9dt
    Errors were:
        ERROR: Makes no sense to have empty read-only file: /gpfs/fs1/home/ac.xylar/e3sm_work/E3SM/mkstratos/tests/add-mpaso-mvk-test/driver-nuopc/cime_config/config_component.xml

Waiting for tests to finish
FAIL MVKO_PS.T62_oQU240.GMPAS-NYF.chrysalis_intel (phase CREATE_NEWCASE)
    Case dir: /lcrc/group/e3sm/ac.xasay-davis/scratch/chrys/MVKO_PS.T62_oQU240.GMPAS-NYF.chrysalis_intel.G.20240404_102105_tts9dt
test-scheduler took 1.212627649307251 seconds

I tried merging master in case that made a difference but it didn't. Happy to continue testing if you can give me some hints about what I'm doing wrong.

xylar avatar Apr 04 '24 15:04 xylar

Maybe some updates are needed since the MOAB driver was added?

xylar avatar Apr 04 '24 15:04 xylar

It should use the MCT driver and nothing in the MOAB one will matter. This string in the error message, "driver-nuopc", is suspicious. The test thinks it's using the nuopc driver, which we don't have. I wonder if the "V" in "MVKO" is confusing create_test, because "V" is a testtype mod CIME uses to change the driver. But you would have to name the test "MVKO_Vnuopc_PS.T62_oQU240.GMPAS-NYF" to get that.
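To illustrate the naming point, here is a simplified sketch of how a test name breaks into fields. It is not CIME's actual parser; split_test_name and requested_driver are hypothetical helpers, and the only assumptions are that fields are dot-separated and that modifiers follow the testcase as underscore-separated tokens.

```python
def split_test_name(full_name):
    """Split a CIME-style test name into its dot-separated fields."""
    fields = full_name.split(".")
    # The first field holds the testcase plus any underscore-separated modifiers.
    testcase, *modifiers = fields[0].split("_")
    return {
        "testcase": testcase,
        "modifiers": modifiers,
        "grid": fields[1],
        "compset": fields[2],
        "machine_compiler": fields[3] if len(fields) > 3 else None,
    }


def requested_driver(full_name):
    """Return the driver named by a V<driver> modifier, or None if absent."""
    for mod in split_test_name(full_name)["modifiers"]:
        if mod.startswith("V"):
            return mod[1:]
    return None


print(requested_driver("MVKO_PS.T62_oQU240.GMPAS-NYF.chrysalis_intel"))
print(requested_driver("MVKO_Vnuopc_PS.T62_oQU240.GMPAS-NYF"))
```

Under this reading, the "V" inside "MVKO" belongs to the testcase token and is never treated as a modifier; only a separate token such as "Vnuopc" would select a driver.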

rljacob avatar Apr 04 '24 15:04 rljacob

@rljacob, this isn't the first time I've seen this kind of error but I don't remember what I did before to get it. Maybe leaving off the machine and compiler?

Update: no, that makes no difference.

xylar avatar Apr 04 '24 16:04 xylar

Do you have CIME_MODEL env variable set to something besides "e3sm" ?

rljacob avatar Apr 04 '24 17:04 rljacob

No and I explicitly set it to e3sm, which also made no difference.

Also, I can run other tests just fine (in other branches).

xylar avatar Apr 04 '24 18:04 xylar

I tried logging in fresh and there was no difference.

xylar avatar Apr 04 '24 18:04 xylar

@xylar did you update submodules after switching to the branch? I got past create_newcase. Now have a python error which I think is because I'm not in the cime_env environment.

Edit: yes just had to source /lcrc/soft/climate/e3sm-unified/load_latest_cime_env.csh. Test now building.

rljacob avatar Apr 04 '24 19:04 rljacob

Test built and submitted but asked for 120 nodes even with PS? Is that right?

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            497642   compute test.MVK    jacob PD       0:00    120 (Resources)

rljacob avatar Apr 04 '24 19:04 rljacob

@rljacob, oh, I didn't realize I needed to source /lcrc/soft/climate/e3sm-unified/load_latest_cime_env.sh to run this. I guess that makes sense.

xylar avatar Apr 04 '24 19:04 xylar

@rljacob

> did you update submodules after switching to the branch?

Yes, I did that. I just did a hard reset back to this branch as it is, and updated the submodules again. I ran:

source /lcrc/soft/climate/e3sm-unified/load_latest_cime_env.sh
./create_test MVKO_PS.T62_oQU240.GMPAS-NYF.chrysalis_intel -g /lcrc/group/e3sm/ac.xylar/e3sm_baselines/test_20240404 --wait

Same results as above...

xylar avatar Apr 04 '24 19:04 xylar

I'm going to try to start fresh. Maybe something got messed up along the way with my worktree.

xylar avatar Apr 04 '24 19:04 xylar

I know what it is! This happened to me before! You can't work from a directory that has tests in the name, it confuses CIME.

@mkstratos, please use something other than tests in your branch names in the future, as tempting as it is...

Also, all of you must not be using worktrees or you all would have had the same issue...

xylar avatar Apr 04 '24 19:04 xylar

^^ @jgfouca, is this something you've ever run into? Worth reporting as an issue on CIME?

xylar avatar Apr 04 '24 19:04 xylar

Yep, it's working fine now that I took tests out of my directory structure.
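A small pre-flight check along these lines might catch the problem before running create_test. This is only a sketch: the exact pattern that trips CIME is not spelled out in this thread, so flagging any path component containing "tests" is a conservative assumption, and risky_components is a hypothetical helper.

```python
from pathlib import PurePosixPath


def risky_components(path, word="tests"):
    """Return the directory names in `path` that contain `word`."""
    return [part for part in PurePosixPath(path).parts if word in part]


# Worktree location taken from the error message earlier in this thread.
worktree = "/gpfs/fs1/home/ac.xylar/e3sm_work/E3SM/mkstratos/tests/add-mpaso-mvk-test"
print(risky_components(worktree))
```

For the worktree path above, only the "tests" directory is flagged; "add-mpaso-mvk-test" does not contain the plural form.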

xylar avatar Apr 04 '24 19:04 xylar

@xylar , I made a CIME issue: https://github.com/ESMCI/cime/issues/4611

jgfouca avatar Apr 04 '24 19:04 jgfouca

Thanks for doing that @jgfouca!

xylar avatar Apr 04 '24 20:04 xylar