E3SM
On pm-cpu, error at setup/build when e3sm-unified activated
I've now had a couple of users report issues with E3SM on pm-cpu that were resolved after unloading the nco module. I should explicitly remove that module from the list for this machine so others avoid hitting the same issue.
"Can't locate XML/LibXML.pm in @INC (you may need to install the XML::LibXML module)
BEGIN failed--compilation aborted at /global/u1/x/xsongsio/maint-2.0/E3SM/cime/utils/perl5lib/Config/SetupTools.pm line 5.
ERROR: Command: '/global/u1/x/xsongsio/maint-2.0/E3SM/components/elm/bld/build-namelist"
Will need to update all maint branches as well.
Actually, I see that nco is NOT a module on pm-cpu; it is a spack package. So avoiding this by simply removing a module is not possible. We may have to assume that the user is not altering the environment -- i.e., I'm not sure how best to guard against things such as spack-installed packages.
This is most likely coming from E3SM-Unified. Users should not be loading E3SM-Unified in the same terminal where they build E3SM but we could do:
mamba deactivate
conda deactivate
spack env deactivate
adjacent to the module rm commands to try to be sure.
I just merged https://github.com/E3SM-Project/E3SM/pull/6003 to master, which will remove the climate-utils module (if loaded) on NERSC machines before building. This module was also known to cause the errors noted in the first comment.
Noting that another user created a NERSC ticket after hitting this error, i.e. trying to build E3SM with e3sm_unified activated.
Note this issue came up during the E3SM Tutorial at NERSC, as users were going back and forth between e3sm_unified examples and running E3SM. It does not seem like a huge deal, but a) it would be nice to find a way to issue an error that is instructive, and b) there might be an easy fix to allow it to work anyway, though there may be other issues and it may not be worth the trouble.
I think the more informative error would have to come on the CIME side. E3SM-Unified sets various environment variables when it gets activated and these could be detected. Or the conda environment itself could be detected.
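To make the "detect the conda environment" option concrete, here is a minimal sketch of what such a check might look like. It is not CIME code: the function names are hypothetical, and it assumes the environment name contains "e3sm_unified" (as in the e3sm_unified_1.10.0_login prompt seen on Perlmutter). It only looks at the standard conda variables CONDA_DEFAULT_ENV and CONDA_PREFIX, since I'm not sure which E3SM-Unified-specific variables would be reliable to test.

```python
import os


def detect_e3sm_unified(environ=None):
    """Return the value that identifies an active E3SM-Unified env, or None.

    Hypothetical helper: assumes the conda environment name or path
    contains the substring "e3sm_unified".
    """
    environ = os.environ if environ is None else environ
    # CONDA_DEFAULT_ENV holds the active env name (or its path);
    # CONDA_PREFIX holds the path of the active env.
    for var in ("CONDA_DEFAULT_ENV", "CONDA_PREFIX"):
        value = environ.get(var, "")
        if "e3sm_unified" in value.lower():
            return value
    return None


def check_clean_environment(environ=None):
    """Raise an instructive error if E3SM-Unified appears to be active."""
    active = detect_e3sm_unified(environ)
    if active is not None:
        raise RuntimeError(
            f"E3SM-Unified appears to be active ({active}). "
            "Build and run E3SM from a terminal where it has not been activated."
        )
```

Calling something like check_clean_environment() early in case creation or setup would fail fast with a readable message instead of the perl XML::LibXML error. It would not catch spack packages loaded outside of a conda environment.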
I don't think the current perl conflict (related to xml library version) could easily be fixed, nor do I think that would be the only issue. E3SM and CIME expect a pretty clean environment, whereas E3SM-Unified is very complex by design.
We could experiment with having CIME automatically deactivate E3SM-Unified, but I hesitate to have anything so complicated get hidden under the hood. If something goes wrong in that process, it might be just as bad as the obscure error message we currently have.
Yep I agree with all that
I had a devious idea last night that just might work. Passed it by Xylar and Rob as well. Hijack an environment variable that only E3SM (really CIME) cares about, and have e3sm_unified set it to something verbose that tells the user what the issue is. Then, if e3sm_unified is activated, any attempt to create a case will fail right away.
setenv CIME_MODEL ENVIRONMENT_RUNNING_E3SM_UNIFIED_USE_ANOTHER_TERMINAL
(e3sm_unified_1.10.0_login) perlmutter-login21% create_test SMS_D.ne4pg2_oQU480.F2010 --compiler=gnu
ERROR: model ENVIRONMENT_RUNNING_E3SM_UNIFIED_USE_ANOTHER_TERMINAL not recognized. The acceptable values of CIME_MODEL currently are ['ufs', 'cesm', 'e3sm']
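For anyone wondering why this works: any check that validates CIME_MODEL against a fixed list and echoes the rejected value will surface the sentinel text to the user. Below is a rough sketch of that mechanism, not CIME's actual implementation. The function name is hypothetical, the fallback to "e3sm" when the variable is unset is an assumption, and the list of accepted values is copied from the error message above.

```python
import os

# Accepted values as reported in the error message above.
ACCEPTED_MODELS = ["ufs", "cesm", "e3sm"]


def resolve_model(environ=None):
    """Return the model named by CIME_MODEL, rejecting unknown values.

    Hypothetical stand-in for CIME's own validation; the "e3sm" default
    used when the variable is unset is an assumption for this sketch.
    """
    environ = os.environ if environ is None else environ
    model = environ.get("CIME_MODEL", "e3sm")
    if model not in ACCEPTED_MODELS:
        # The rejected value is echoed back, so a verbose sentinel set by
        # E3SM-Unified ends up in front of the user.
        raise ValueError(
            f"model {model} not recognized. The acceptable values of "
            f"CIME_MODEL currently are {ACCEPTED_MODELS}"
        )
    return model
```

The appeal of the design is that nothing needs to change on the CIME side: the activation script only has to export one variable, and the existing validation does the rest.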
Great idea! I implemented this in https://github.com/E3SM-Project/e3sm-unified/pull/119
That won't be useful until there's a new Unified, though.