CTSM
fates parameter file auto-build for all tests
Description of changes
This enables the automatic building of the fates parameter file binary for all tests. This calls ncgen from the shell_commands script in the Fates/ testdef folder, to operate on the fates default file that is version controlled.
Specific notes
This implementation is incomplete. To get this to work for all tests, I had to place the newly built binaries in a new folder in the fates source tree. The reason is that some of the tests are multi-phase (PEM, ERP, etc.). Each phase needs access to either the same parameter file or an exact copy of it. However, the shell_commands script is (as far as my tests show) only called the first time, so both parts of the test need access to the same file. Unfortunately, the xml files in both parts of the test do not provide any file paths, located somewhere on the scratch partition, that are common to both phases (I looked pretty thoroughly but maybe missed something). For instance, the two phases have different cases, which makes it tough to locate the parameter file in the second phase if its case differs from the first. I also tried using CIME_OUTPUT_ROOT, the sharedlib build location.
There is an xml entry in env_test.xml that is TEST_ARGV. This holds the root folder for the current test environment, and the id of the specific test currently run. With these two bits of information we could place a binary file that is accessible to all phases of a test. However, this information does not seem to be available via xml query at the time we run the shell_commands script.
Another location that might be better than the source, at least for the time being would be to put all these files in the CIME_OUTPUT_ROOT, which is usually just the scratch folder where all cases and tests go. Each parameter file could have the test name and hash in it, to prevent redundancy. The downside is that the root scratch folder starts to fill up.
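As a rough illustration of the naming idea above, a per-test path under CIME_OUTPUT_ROOT could be built from the test name and test id, so that every phase of a test resolves the same file. This is only a sketch: the function name and arguments are hypothetical, and the `-params.nc` suffix simply mirrors the filenames listed under testing below.

```python
from pathlib import Path


def param_file_path(output_root, test_name, test_id):
    """Return a per-test parameter file path under output_root.

    Hypothetical helper: output_root stands in for CIME_OUTPUT_ROOT,
    and test_name/test_id are assumed to be known to every phase.
    """
    # Including the test id keeps files from different test runs apart.
    filename = "{}.{}-params.nc".format(test_name, test_id)
    return Path(output_root) / filename
```

Because the path depends only on values shared by all phases, a second phase with a different case directory would still find the file written by the first.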
Contributors other than yourself, if any:
@ekluzek @glemieux @adrifoster
Are answers expected to change (and if so in what way)?
Any User Interface Changes (namelist or namelist defaults changes)?
Testing performed, if any:
Here is a list of the parameter test file binaries it generates, one fates test creates 5.9M of data:
~/ctsm/src/fates/parameter_files> ls binaries/
ERP_D_Ld3.f19_g17.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesCold.0124-200432de_int-params.nc
ERP_D_Ld3.f19_g17.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesCold.20240124_191528_ohyuz4-params.nc.text
ERP_D_Ld3.f19_g17.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesCold.20240124_193525_2d6yfg-params.nc
ERP_D_P128x2_Ld3.f19_g17.I2000Clm50FatesCru.derecho_intel.clm-FatesCold.0124-200432de_int-params.nc
ERP_Ld3.f09_g17.I2000Clm50FatesRs.derecho_intel.clm-FatesCold.0124-200432de_int-params.nc
ERP_P256x2_Ld30.f45_f45_mg37.I2000Clm51FatesRs.derecho_intel.clm-mimicsFatesCold.0124-200432de_int-params.nc
ERS_D_Ld15.5x5_amazon.I2000Clm50FatesRs.derecho_gnu.clm-FatesColdSeedDisp.0124-200432de_gnu-params.nc
ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.derecho_gnu.clm-FatesColdTwoStreamNoCompFixedBioGeo.0124-200432de_gnu-params.nc
ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.derecho_intel.clm-FatesColdTreeDamage.0124-200432de_int-params.nc
ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.derecho_intel.clm-FatesColdTwoStream.0124-200432de_int-params.nc
ERS_D_Ld30.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdLandUse.0124-200432de_int-params.nc
ERS_D_Ld30.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdLUH2.0124-200432de_int-params.nc
ERS_D_Ld30.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdPRT2.0124-200432de_int-params.nc
ERS_D_Ld3.f19_g17.I2000Clm50FatesCruRsGs.derecho_gnu.clm-FatesCold.0124-200432de_gnu-params.nc
ERS_D_Ld3.f19_g17.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesCold.0124-200432de_int-params.nc
ERS_D_Ld5.1x1_brazil.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdHydro.0124-200432de_int-params.nc
ERS_D_Ld5.f10_f10_mg37.I2000Clm50Fates.derecho_intel.clm-FatesCold.0124-200432de_int-params.nc
ERS_D_Mmpi-serial_Ld5.1x1_brazil.I2000Clm50FatesCruRsGs.derecho_gnu.clm-FatesCold.0124-200432de_gnu-params.nc
ERS_Ld30.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdFixedBiogeo.0124-200432de_int-params.nc
ERS_Ld30.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdNoComp.0124-200432de_int-params.nc
ERS_Ld30.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdNoCompFixedBioGeo.0124-200432de_int-params.nc
ERS_Ld30.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdSizeAgeMort.0124-200432de_int-params.nc
ERS_Ld5.f19_g17.I2000Clm45Fates.derecho_intel.clm-FatesCold.0124-200432de_int-params.nc
ERS_Ld60.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-Fates.0124-200432de_int-params.nc
ERS_Ld60.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdLogging.0124-200432de_int-params.nc
ERS_Ld60.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdNoFire.0124-200432de_int-params.nc
ERS_Ld60.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdPPhys.0124-200432de_int-params.nc
ERS_Ld60.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdST3.0124-200432de_int-params.nc
ERS_Ld9.f10_f10_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdCH4Off.0124-200432de_int-params.nc
ERS_Lm13.f10_f10_mg37.I2000Clm50Fates.derecho_gnu.clm-FatesCold.0124-200432de_gnu-params.nc
ERS_Lm13.f45_f45_mg37.I2000Clm50Fates.derecho_intel.clm-FatesColdNoComp.0124-200432de_int-params.nc
ERS_P128x1_Lm25.f10_f10_mg37.I2000Clm51Fates.derecho_intel.clm-FatesColdNoComp.0124-200432de_int-params.nc
PEM_D_Ld15.5x5_amazon.I2000Clm50FatesRs.derecho_gnu.clm-FatesColdSeedDisp.0124-200432de_gnu-params.nc
SMS_Ld10_D_Mmpi-serial.CLM_USRDAT.I1PtClm51Fates.derecho_gnu.clm-FatesPRISM--clm-NEON-FATES-YELL.0124-200432de_gnu-params.nc
SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.derecho_gnu.clm-FatesCold.0124-200432de_gnu-params.nc
SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesCold.0124-200432de_int-params.nc
SMS_Lm3_D_Mmpi-serial.1x1_brazil.I2000Clm50FatesCruRsGs.derecho_gnu.clm-FatesColdHydro.0124-200432de_gnu-params.nc
SMS_Lm6.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-Fates.0124-200432de_int-params.nc
This should probably happen in a SystemTest, not in shell_commands. See #2335.
See also discussion from CTSM SE meeting here.
I'm unable to get this to work on Izumi. It might be something wrong with my environment. Could someone else try this and see?
./run_sys_tests --skip-compare --skip-generate -t ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.izumi_nag.clm-FatesColdTwoStream
I'm seeing a failure trying this as well with the following:
RUN: /scratch/cluster/glemieux/tests_0215-110541iz/ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.izumi_nag.clm-FatesColdTwoStream.0215-110541iz/shell_commands
FROM: /scratch/cluster/glemieux/tests_0215-110541iz/ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.izumi_nag.clm-FatesColdTwoStream.0215-110541iz
stat: 1
errput: ncgen: No such file or directory
(../../ncgen/genbin.c:58)
Traceback (most recent call last):
File "/home/glemieux/ctsm/src/fates/tools/modify_fates_paramfile.py", line 355, in <module>
main()
File "/home/glemieux/ctsm/src/fates/tools/modify_fates_paramfile.py", line 95, in main
shutil.copyfile(args.inputfname, tempfilename)
File "/cluster/anaconda-23.11.0/lib/python3.11/shutil.py", line 256, in copyfile
with open(src, 'rb') as fsrc:
^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/home/glemieux/ctsm/src/fates/parameter_files/binaries/ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.izumi_nag.clm-FatesColdTwoStream.0215-110541iz-params.nc'
Leaving broken case dir /scratch/cluster/glemieux/tests_0215-110541iz/ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.izumi_nag.clm-FatesColdTwoStream.0215-110541iz
Looking at the fates directory structure, the binaries directory isn't getting built ~for some reason~ because ncgen isn't found. @samsrabin is this the same error you were seeing?
That is indeed the same error I was seeing, but I don't think it's ncgen not being found. I think there might be a problem with the ncgen installation.
I'm actually getting a similar error on Derecho, too. From SMS_D_Ld3.f10_f10_mg37.I2000Clm50FatesRs.derecho_intel:
Adding user mods directory /glade/u/home/samrabin/ctsm_fates-auto-params/cime_config/testdefs/testmods_dirs/clm/Fates
RUN: /glade/derecho/scratch/samrabin/tests_0215-114155de/SMS_D_Ld3.f10_f10_mg37.I2000Clm50FatesRs.derecho_intel.clm-Fates.0215-114155de/shell_commands
FROM: /glade/derecho/scratch/samrabin/tests_0215-114155de/SMS_D_Ld3.f10_f10_mg37.I2000Clm50FatesRs.derecho_intel.clm-Fates.0215-114155de
stat: 1
errput: ncgen: No such file or directory
(/home/conda/feedstock_root/build_artifacts/libnetcdf_1650908392318/work/ncgen/genbin.c:genbin_netcdf:63)
Tests in /glade/derecho/scratch/samrabin/tests_0215-114155de/.
The issue is that ${SRCROOT}/src/fates/parameter_files/binaries/ doesn't exist. Will fix in my PR.
@samsrabin thanks for the work on this. Note that fates/parameter_files is under the FATES external, so adding a binary_files subdirectory would require a PR to FATES. And I think for git, you have to have at least a README file in the directory for it to show up when you check it out...
@ekluzek ~~Good point; adding this directory will make the FATES checkout unclean. @rgknox I think you need to make a new FATES tag that has an empty parameter_files/binaries/ directory, then update Externals.cfg here to point to that.~~ No; see below.
Wait, @ekluzek, even if the new directory is canonically in FATES, won't the checkout be unclean once the parameter file is generated?
Actually… the checkout looks clean. manage_externals/checkout_externals -S gives no warning, and git status in src/fates is clean, even with the new directory and parameter files generated. This might be because .nc files are ignored by src/fates/.gitignore.
@samsrabin yes exactly. But, it's good that you showed that's the case. It's good to confirm.
That directory needs to be added in fates, sorry for that, I'll get it added to the next FATES PR. UPDATE: Sam added a mkdir -p call to the scripting, so we don't need this directory added anymore.
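The failure mode discussed above (ncgen reporting "No such file or directory" when its output folder is missing) and the mkdir -p fix can be sketched as follows. This is an illustrative Python equivalent, not the actual shell_commands script: the function name and the injectable `run` argument are assumptions, and only the standard `ncgen -o <output> <input>` invocation is relied on.

```python
import subprocess
from pathlib import Path


def build_param_binary(cdl_path, out_path, run=subprocess.run):
    """Create the output directory, then convert a CDL file with ncgen.

    Hypothetical helper; `run` is injectable so the command can be
    inspected without ncgen being installed.
    """
    out_path = Path(out_path)
    # Equivalent of `mkdir -p`: no error if the directory already exists.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    cmd = ["ncgen", "-o", str(out_path), str(cdl_path)]
    # check=True raises CalledProcessError if ncgen exits non-zero.
    run(cmd, check=True)
    return cmd
```

Creating the directory inside the script avoids having to version-control an empty folder in the FATES external.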
That merge commit I just did was to resolve conflicts introduced in my PR. They were only related to run_sys_tests.py and its testing. They're now resolved, and make all in python/ is still clean.
clm_aux on derecho, ok with exception:
~~FAIL SMS.f10_f10_mg37.I2000Clm50BgcCrop.derecho_nvhpc.clm-crop MODEL_BUILD time=212~~ (THIS TEST PASSES AFTER RESUBMITTING)
FAIL DAE_C2_D_Lh12.f10_f10_mg37.I2000Clm50BgcCrop.derecho_intel.clm-DA_multidrv RUN time=304
ERROR: ERROR: Unrecognized line ('/bin/bash: module: line 1: syntax error: unexpected end of file
That dang DAE test! Try resubmitting it.
I tried the izumi test that was failing before and it works for me, so I checked that item off, which puts this in a ready to merge mode.
This test fails create case: ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.izumi_nag.clm-FatesColdTwoStream
RUN: /scratch/cluster/rgknox/tests_0226-121631iz/ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.izumi_nag.clm-FatesColdTwoStream.GC.0226-121631iz_nag/shell_commands
FROM: /scratch/cluster/rgknox/tests_0226-121631iz/ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.izumi_nag.clm-FatesColdTwoStream.GC.0226-121631iz_nag
stat: 1
errput: Traceback (most recent call last):
File "/home/rgknox/ctsm/src/fates/tools/modify_fates_paramfile.py", line 35, in <module>
from scipy.io import netcdf as nc
ImportError: No module named scipy.io
Leaving broken case dir /scratch/cluster/rgknox/tests_0226-121631iz/ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.izumi_nag.clm-FatesColdTwoStream.GC.0226-121631iz_nag
ERROR: Command: '/scratch/cluster/rgknox/tests_0226-121631iz/ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.izumi_nag.clm-FatesColdTwoStream.GC.0226-121631iz_nag/shell_commands' failed with error 'Traceback (most recent call last):
File "/home/rgknox/ctsm/src/fates/tools/modify_fates_paramfile.py", line 35, in <module>
from scipy.io import netcdf as nc
ImportError: No module named scipy.io' from dir '/scratch/cluster/rgknox/tests_0226-121631iz/ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.izumi_nag.clm-FatesColdTwoStream.GC.0226-121631iz_nag'
---------------------------------------------------
2024-02-26 13:08:10: CREATE_NEWCASE FAILED for test 'ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.izumi_nag.clm-FatesColdTwoStream'.
However, I'm able to load scipy manually when I run python. Also, this test passed when I ran it stand-alone, i.e., this test passed:
./create_test ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.izumi_nag.clm-FatesColdTwoStream
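One way to narrow this down might be to have the test environment report which Python interpreter it actually runs and whether scipy.io is importable from it, since an interactive shell and a subprocess launched by the test driver can resolve different interpreters. The helper below is a hypothetical diagnostic, not part of modify_fates_paramfile.py; its name and return format are assumptions.

```python
import sys


def interpreter_report(module_name):
    """Report the running interpreter and whether module_name imports.

    Hypothetical diagnostic helper for comparing environments.
    """
    try:
        __import__(module_name)
        found = True
    except ImportError:
        found = False
    return {
        "executable": sys.executable,       # path of the running Python
        "version": sys.version.split()[0],  # e.g. "3.11.5"
        "found": found,                     # True if the import worked
    }


if __name__ == "__main__":
    print(interpreter_report("scipy.io"))
```

Running this both interactively and from within shell_commands, then comparing the reported executable paths, would show whether the two contexts use the same Python.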
DAE_C2_D_Lh12.f10_f10_mg37.I2000Clm50BgcCrop.derecho_intel.clm-DA_multidrv also still fails after re-submitting
Since this isn't critical to come in now, we will plan on delaying this to fix the conda env issue on izumi (I think #2385 will fix this). @samsrabin also has some analysis that shows that there is a race condition for the DAE test that sometimes results in a file being gzipped before something else needs it. I don't think the DAE issue should hold this one up, but that is another good thing to have come in.