Single Site Test Attempt on Casper (fails)
I tried to build a single-site test on Casper and ran into some errors. I had never used this machine before today, but I heard that it is the ideal machine for single-site runs, so I gave it a quick test.
I have no special environment variables set and nothing other than the default modules loaded. I ran create_test in an interactive session started with execcasper, and I also tried it once on the login node (with the same error).
This uses the following tags:
ctsm: ctsm5.1.dev159
fates: sci.1.69.0_api.31.0.0
./create_test SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.casper_nvhpc.clm-FatesCold --generate /glade/derecho/scratch/rgknox/ctsm5.1.dev159-sci.1.69.0_api.31.0.0 --project P93300041 -o
Testnames: ['SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.casper_nvhpc.clm-FatesCold']
create_test will do up to 1 tasks simultaneously
create_test will use up to 45 cores simultaneously
Creating test directory /glade/scratch/rgknox/SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.casper_nvhpc.clm-FatesCold.G.20231215_101759_3lofzp
RUNNING TESTS:
SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.casper_nvhpc.clm-FatesCold
Starting CREATE_NEWCASE for test SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.casper_nvhpc.clm-FatesCold with 1 procs
Finished CREATE_NEWCASE for test SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.casper_nvhpc.clm-FatesCold in 2.681000 seconds (PASS)
Starting XML for test SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.casper_nvhpc.clm-FatesCold with 1 procs
Finished XML for test SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.casper_nvhpc.clm-FatesCold in 0.331622 seconds (PASS)
Starting SETUP for test SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.casper_nvhpc.clm-FatesCold with 1 procs
Finished SETUP for test SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.casper_nvhpc.clm-FatesCold in 0.518615 seconds (FAIL). [COMPLETED 1 of 1]
Case dir: /glade/scratch/rgknox/SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.casper_nvhpc.clm-FatesCold.G.20231215_101759_3lofzp
Errors were:
ERROR: module command /glade/u/apps/dav/opt/lmod/7.7.29/libexec/lmod python purge failed with message:
/glade/u/apps/dav/opt/lua/5.3.4/bin/lua: error while loading shared libraries: libreadline.so.6: cannot open shared object file: No such file or directory
Waiting for tests to finish
FAIL SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.casper_nvhpc.clm-FatesCold (phase SETUP)
Case dir: /glade/scratch/rgknox/SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.casper_nvhpc.clm-FatesCold.G.20231215_101759_3lofzp
Due to presence of batch system, create_test will exit before tests are complete.
To force create_test to wait for full completion, use --wait
test-scheduler took 4.831060886383057 seconds
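The error comes from the dynamic loader rather than from create_test itself: the lua binary that lmod calls cannot resolve libreadline.so.6. Below is a minimal sketch for confirming which shared libraries are unresolved on a given node. It assumes `ldd` is available there; the binary path is copied from the error message, and the helper names `missing_libs` and `check_binary` are hypothetical, for illustration only.

```python
import subprocess

# Path copied from the error message above.
LUA_BIN = "/glade/u/apps/dav/opt/lua/5.3.4/bin/lua"


def missing_libs(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like: "libreadline.so.6 => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path=LUA_BIN):
    """Run ldd on `path` and return its unresolved libraries (hypothetical helper)."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=False
    )
    return missing_libs(result.stdout)


if __name__ == "__main__":
    for lib in check_binary():
        print("missing:", lib)
```

Running this on both the login node and inside the execcasper session would show whether the library is missing everywhere or only in one environment.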
@rgknox try the Intel compiler. Does that fail as well?
Yes, it appears to be the same error:
ERROR: module command /glade/u/apps/dav/opt/lmod/7.7.29/libexec/lmod python purge failed with message: /glade/u/apps/dav/opt/lua/5.3.4/bin/lua: error while loading shared libraries: libreadline.so.6: cannot open shared object file: No such file or directory
This is with ccs_config_cesm0.0.84. It looks like work on Casper went into ccs_config_cesm0.0.87, so I'll try with that.
OK, that doesn't work out of the box. It might need a change in both ccs_config and cime.
https://github.com/ESMCI/ccs_config_cesm/issues/138