E3SM
Add climate reproducibility test for MPAS Ocean
Add a multi-variate Kolmogorov–Smirnov test for the MPAS Ocean component.
Based on experiments conducted for Mahajan (2021) doi:10.1145/3468267.3470572.
Because the test is specific to the MPAS-Ocean component, it's placed in the components/mpas-ocean/cime_config/SystemTests directory. The test is initialized with a restart file from a 400+ year run with the GMPAS-NYF compset.
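For readers unfamiliar with the approach, here is a minimal, stdlib-only Python sketch of a multivariate Kolmogorov-Smirnov check in the spirit described above. It runs a two-sample KS test per output variable on the baseline and test ensembles (e.g. one annual mean per ensemble member) and fails the run if too many variables reject the null hypothesis. The function names, the alpha level, and the max_rejections threshold are illustrative assumptions, not the code or values in mvko.py. The p-value uses the asymptotic Kolmogorov distribution with a common small-sample correction; scipy.stats.ks_2samp would be the usual library choice.

```python
import math
from bisect import bisect_right


def ks_2samp_stat(a, b):
    """Two-sample KS statistic D = max |F_a(x) - F_b(x)| over the pooled data."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs are step functions, so the largest gap occurs
    # at one of the pooled sample points.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def ks_pvalue(d, n, m):
    """Approximate p-value from the asymptotic Kolmogorov distribution."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    total, sign = 0.0, 1.0
    for j in range(1, 101):
        term = 2.0 * sign * math.exp(-2.0 * (j * lam) ** 2)
        total += term
        if abs(term) < 1e-12:
            return min(max(total, 0.0), 1.0)
        sign = -sign
    # The series only fails to converge when lam is tiny, i.e. p is ~1.
    return 1.0


def mvk_test(baseline, test, alpha=0.05, max_rejections=1):
    """Count per-variable KS rejections; pass if there are few enough.

    baseline, test: dict of variable name -> list of ensemble values
    (hypothetically, one annual mean per ensemble member).
    """
    rejected = []
    for var in sorted(baseline):
        d = ks_2samp_stat(baseline[var], test[var])
        p = ks_pvalue(d, len(baseline[var]), len(test[var]))
        if p < alpha:
            rejected.append(var)
    return len(rejected) <= max_rejections, rejected
```

With identical ensembles every variable gives D = 0 and p = 1, so the check passes. In practice the rejection threshold has to be chosen so that the false-positive rate is acceptable for the ensemble size and number of variables.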
Example passing test: https://web.lcrc.anl.gov/public/e3sm/ac.mkelleher/evv/MVKO_PS.T62_oQU240.GMPAS-NYF.chrysalis_intel.C.20240131_1022/
Example failing test: https://web.lcrc.anl.gov/public/e3sm/ac.mkelleher/evv/MVKO_PS.T62_oQU240.GMPAS-NYF.chrysalis_intel.C.20240131_0945/
@mkstratos, how do you run this test?
The initial condition file is currently only on LCRC, so from either Chrysalis or Anvil it can be run on this fork/branch with:
${E3SM_ROOT}/cime/scripts/create_test MVKO_PS.T62_oQU240.GMPAS-NYF [-g / -c] --pesfile ${E3SM_ROOT}/cime_config/testmods_dirs/config_pes_tests.xml
Next question: if you create a suite in tests.py called e3sm_ocn_nbfb and add this test, can "create_test e3sm_ocn_nbfb" execute the test?
Yes, that would work
Are you sure? Please try it and let me know and I'll approve this PR.
I've confirmed this works. Should we add that suite to this PR?
Yes please do. Also please add a new "e3sm_nbfb" suite that combines that one and the atm one.
What does the full test name that you tried look like?
I am not familiar with this testing method, so removed myself as a reviewer.
@mkstratos, could you let me know what testing would be most helpful for me to do on this PR?
Also, it seems the test uses the oQU240 mesh, which is too coarse to be scientifically valid. Could you let us know if that is an issue, or if you are confident that climate reproducibility (or lack thereof) at such a coarse resolution would translate to a scientifically valid resolution as well?
Hi @xylar, this test is based on @salilmahajan's analysis: although oQU240 isn't scientifically validated, it "…includes all of the model structure and source code including the dynamical core and physical parameterizations…", so comparing the statistics of two different code versions at this resolution should apply to higher-resolution versions as well.
Any inspection / testing you'd like to do would be helpful, particularly of the analysis member setup portion (mvko.py:180). This test is set up to use the MPAS timeseriesstatsclimatology analysis member to compute the annual averages on which the statistical testing is performed. Thanks in advance!
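As an illustration of the kind of quantity being compared, the sketch below forms a time-weighted annual mean from twelve monthly means on a 365-day (no-leap) calendar. It is a hypothetical helper for offline checking, not how the timeseriesstatsclimatology analysis member accumulates its averages internally.

```python
# Days per month in a 365-day ("noleap") calendar.
NOLEAP_DAYS = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)


def annual_mean(monthly_means, days=NOLEAP_DAYS):
    """Time-weighted annual mean from 12 monthly means (hypothetical helper)."""
    if len(monthly_means) != len(days):
        raise ValueError("expected one value per month")
    # Weight each month by its length so February counts less than January.
    return sum(v * d for v, d in zip(monthly_means, days)) / sum(days)
```

One such annual mean per variable and ensemble member is the sort of sample a per-variable KS comparison would operate on.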
Thanks @mkstratos. I keep meaning to get to this but am not quite managing. It's high on my to-do list.
@mkstratos, I finally found some time to test this. Unfortunately, it immediately failed for me (probably user error) on Chrysalis with:
$ cd cime/scripts
$ ./create_test MVKO_PS.T62_oQU240.GMPAS-NYF -g /lcrc/group/e3sm/ac.xylar/e3sm_baselines/test_20240404 --pesfile ../../cime_config/testmods_dirs/config_pes_tests.xml --wait
Testnames: ['MVKO_PS.T62_oQU240.GMPAS-NYF.chrysalis_intel']
Using project from config_machines.xml: e3sm
create_test will do up to 1 tasks simultaneously
create_test will use up to 160 cores simultaneously
Creating test directory /lcrc/group/e3sm/ac.xasay-davis/scratch/chrys/MVKO_PS.T62_oQU240.GMPAS-NYF.chrysalis_intel.G.20240404_102105_tts9dt
RUNNING TESTS:
MVKO_PS.T62_oQU240.GMPAS-NYF.chrysalis_intel
Starting CREATE_NEWCASE for test MVKO_PS.T62_oQU240.GMPAS-NYF.chrysalis_intel with 1 procs
Finished CREATE_NEWCASE for test MVKO_PS.T62_oQU240.GMPAS-NYF.chrysalis_intel in 0.175126 seconds (FAIL). [COMPLETED 1 of 1]
Case dir: /lcrc/group/e3sm/ac.xasay-davis/scratch/chrys/MVKO_PS.T62_oQU240.GMPAS-NYF.chrysalis_intel.G.20240404_102105_tts9dt
Errors were:
ERROR: Makes no sense to have empty read-only file: /gpfs/fs1/home/ac.xylar/e3sm_work/E3SM/mkstratos/tests/add-mpaso-mvk-test/driver-nuopc/cime_config/config_component.xml
Waiting for tests to finish
FAIL MVKO_PS.T62_oQU240.GMPAS-NYF.chrysalis_intel (phase CREATE_NEWCASE)
Case dir: /lcrc/group/e3sm/ac.xasay-davis/scratch/chrys/MVKO_PS.T62_oQU240.GMPAS-NYF.chrysalis_intel.G.20240404_102105_tts9dt
test-scheduler took 1.212627649307251 seconds
I tried merging master in case that made a difference, but it didn't. Happy to continue testing if you can give me some hints about what I'm doing wrong.
Maybe some updates are needed since the MOAB driver was added?
It should use the MCT driver, and nothing in the MOAB one will matter. The string "driver-nuopc" in the error message is suspicious: the test thinks it's using the nuopc driver, which we don't have. I wonder if the "V" in "MVKO" is confusing create_test, because "V" is a testtype mod CIME uses to change the driver. But you would have to name the test "MVKO_Vnuopc_PS.T62_oQU240.GMPAS-NYF" to get that.
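To illustrate the point about the "V" modifier, here is a simplified, hypothetical parser for CIME-style test names of the form TESTCASE[_MODIFIERS].GRID.COMPSET[.MACHINE_COMPILER[.TESTMODS]]. It is not CIME's own parser, but it shows why the "V" inside the testcase name "MVKO" should not select a driver: only a separate underscore-delimited token such as "Vnuopc" would.

```python
def parse_test_name(name):
    """Split a CIME-style test name into its parts (simplified sketch)."""
    fields = name.split(".")
    if len(fields) < 3:
        raise ValueError(f"not a full test name: {name}")
    # Only the first field is split on "_"; the grid (e.g. T62_oQU240)
    # also contains underscores but is a separate "."-delimited field.
    case_and_mods = fields[0].split("_")
    return {
        "testcase": case_and_mods[0],
        "modifiers": case_and_mods[1:],
        "grid": fields[1],
        "compset": fields[2],
        "machine_compiler": fields[3] if len(fields) > 3 else None,
        "testmods": fields[4] if len(fields) > 4 else None,
    }


def driver_from_modifiers(modifiers, default="mct"):
    """A 'V<driver>' modifier selects the driver; otherwise use the default.

    The "mct" default is an assumption based on this thread, not CIME code.
    """
    for mod in modifiers:
        if mod.startswith("V") and len(mod) > 1:
            return mod[1:]
    return default
```

Parsing "MVKO_PS.T62_oQU240.GMPAS-NYF.chrysalis_intel" this way yields testcase "MVKO" with only the "PS" modifier, so no driver override; "MVKO_Vnuopc_PS.T62_oQU240.GMPAS-NYF" would select nuopc.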
@rljacob, this isn't the first time I've seen this kind of error, but I don't remember what I did last time to get past it. Maybe leaving off the machine and compiler?
Update: no, that makes no difference.
Do you have the CIME_MODEL env variable set to something besides "e3sm"?
No and I explicitly set it to e3sm, which also made no difference.
Also, I can run other tests just fine (in other branches).
I tried logging in fresh and there was no difference.
@xylar, did you update submodules after switching to the branch? I got past create_newcase. Now I have a Python error, which I think is because I'm not in the cime_env environment.
Edit: yes, I just had to source /lcrc/soft/climate/e3sm-unified/load_latest_cime_env.csh. Test now building.
Test built and submitted but asked for 120 nodes even with PS? Is that right?
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
497642 compute test.MVK jacob PD 0:00 120 (Resources)
@rljacob, oh, I didn't realize I needed to source /lcrc/soft/climate/e3sm-unified/load_latest_cime_env.sh to run this. I guess that makes sense.
@rljacob, re: "did you update submodules after switching to the branch?"
Yes, I did that. I just did a hard reset back to this branch as it is, and updated the submodules again. I ran:
source /lcrc/soft/climate/e3sm-unified/load_latest_cime_env.sh
./create_test MVKO_PS.T62_oQU240.GMPAS-NYF.chrysalis_intel -g /lcrc/group/e3sm/ac.xylar/e3sm_baselines/test_20240404 --wait
Same results as above...
I'm going to try to start fresh. Maybe something got messed up along the way with my worktree.
I know what it is! This happened to me before! You can't work from a directory that has tests in the name; it confuses CIME.
@mkstratos, please use something other than tests in your branch names in the future, as tempting as it is...
Also, all of you must not be using worktrees or you all would have had the same issue...
^^ @jgfouca, is this something you've ever run into? Worth reporting as an issue on CIME?
Yep, it's working fine now that I took tests out of my directory structure.
@xylar, I made a CIME issue: https://github.com/ESMCI/cime/issues/4611
Thanks for doing that @jgfouca!