E3SM
Build errors in Kokkos on pm-cpu
I tried to build an E3SM-ChemUCI model on Perlmutter. I updated the machine files and compiler files under $E3SM/cime_config/machines to match the v3atm branch below.
https://github.com/E3SM-Project/E3SM/tree/v3atm/eam/master_MAM5_wetaero/cime_config/machines
I got the following error when building Kokkos. Has anybody seen similar errors when building E3SM on pm-cpu? Any suggestions are welcome.
/global/u1/l/lix011/E3SM/externals/kokkos/core/src/Kokkos_Tuners.hpp:261:31: error: no member named 'sub_values' in 'Kokkos::Tools::Experimental::Impl::ValueHierarchyNode<long, Kokkos::Tools::Experimental::Impl::ValueHierarchyNode<long, void>>'
size_t index = in.sub_values.size() * fraction_to_traverse;
~~ ^
/global/u1/l/lix011/E3SM/externals/kokkos/core/src/Kokkos_Tuners.hpp:276:18: note: in instantiation of member function 'Kokkos::Tools::Experimental::Impl::GetMultidimensionalPoint<Kokkos::Tools::Experimental::Impl::ValueHierarchyNode<long, Kokkos::Tools::Experimental::Impl::ValueHierarchyNode<long, void>>, double, double>::build' requested here
return helper::build(in, std::get<Is>(indices).value.double_value...);
^
/global/u1/l/lix011/E3SM/externals/kokkos/core/src/Kokkos_Tuners.hpp:225:30: error: no member named 'root_values' in 'Kokkos::Tools::Experimental::Impl::ValueHierarchyNode<long, Kokkos::Tools::Experimental::Impl::ValueHierarchyNode<long, void>>'
size_t index = dimension.root_values.size() * fraction_to_traverse;
~~~~~~~~~ ^
/global/u1/l/lix011/E3SM/externals/kokkos/core/src/Kokkos_Tuners.hpp:263:45: note: in instantiation of member function 'Kokkos::Tools::Experimental::Impl::DimensionValueExtractor<Kokkos::Tools::Experimental::Impl::ValueHierarchyNode<long, Kokkos::Tools::Experimental::Impl::ValueHierarchyNode<long, void>>>::get' requested here
DimensionValueExtractor<node_type>::get(in, fraction_to_traverse));
^
/global/u1/l/lix011/E3SM/externals/kokkos/core/src/Kokkos_Tuners.hpp:276:18: note: in instantiation of member function 'Kokkos::Tools::Experimental::Impl::GetMultidimensionalPoint<Kokkos::Tools::Experimental::Impl::ValueHierarchyNode<long, Kokkos::Tools::Experimental::Impl::ValueHierarchyNode<long, void>>, double, double>::build' requested here
return helper::build(in, std::get<Is>(indices).value.double_value...);
^
6 warnings and 2 errors generated.
gmake[2]: *** [core/src/CMakeFiles/kokkoscore.dir/build.make:93: core/src/CMakeFiles/kokkoscore.dir/impl/Kokkos_Core.cpp.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:956: core/src/CMakeFiles/kokkoscore.dir/all] Error 2
gmake: *** [Makefile:149: all] Error 2
I formatted the code block.
Why are you using this specific branch? Could you please retry with the master branch? You're using an outdated branch and the code of interest has all been merged into master now.
I am using this specific branch for its chemistry capability. It was recommended by @tangq, who developed the branch.
@lxu16 , I spoke with @ndkeen , who is visiting our lab today, about the background of your study. Noel understands why you need to run the simulations with the branch instead of master. My understanding is that Noel has some ideas that would work, such as using the configuration files on master or maint-2.0 with the branch, or GNU instead of Intel.
@tangq , thanks for helping clarify the modeling background behind this work.
For the error, is that with GNU or Intel compilers? I would think you should be able to use GNU compilers with your branch, as this was working previously. The Intel compilers were only added to maint-2.0/master recently, because NERSC only installed them recently.
I used the Intel compiler (cmake -DCMAKE_CXX_COMPILER=/opt/intel/oneapi/compiler/2023.1.0/linux/bin/icpx). I will try the GNU compiler to see how it goes.
@ndkeen Could you point me to sample machine files and compiler option files I could adopt in order to use the GNU compiler? Thanks!
What I'm saying is that GNU should work as-is for that branch. It would be the default compiler.
My current branch was created from @tangq's E3SM-ChemUCI_amip branch as of Dec. 1. The commit is 5c1a8629027306b6da6a631b821654ccd29c444b.
https://github.com/E3SM-Project/E3SM/tree/tangq/atm/chemUCI_amip/cime_config/machines
This branch does not include the Perlmutter machine files and compiler options yet.
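To reproduce that exact starting point, one option is to check out the commit directly and then sync submodules. A minimal sketch, assuming an existing E3SM clone and that externals such as externals/kokkos are tracked as git submodules at that commit; checkout_commit_with_submodules is a hypothetical helper name, not an E3SM tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper: check out an exact commit (detached HEAD) in REPO and
# bring all submodules to the revisions recorded in that commit.
checkout_commit_with_submodules() {
  local repo="$1" commit="$2"
  git -C "$repo" checkout -q "$commit" || return 1
  # --init registers any submodules not yet initialized;
  # --recursive also updates nested submodules.
  git -C "$repo" submodule update --init --recursive
}

# Example usage (assumes an existing E3SM clone in ./E3SM):
#   checkout_commit_with_submodules E3SM 5c1a8629027306b6da6a631b821654ccd29c444b
```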
To clarify, which branch are you talking about?
- in the main post, you mentioned this https://github.com/E3SM-Project/E3SM/tree/v3atm/eam/master_MAM5_wetaero which doesn't seem to be far from master (only 314 commits behind master), so you could easily just test on master instead
- in the last comment, you mentioned this https://github.com/E3SM-Project/E3SM/tree/tangq/atm/chemUCI_amip which is much older (12787 commits behind master) and thus it would make more sense to avoid moving to master
Can you clarify?
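The "commits behind master" counts above can be checked locally. A minimal sketch, assuming a local E3SM clone whose origin remote points at E3SM-Project/E3SM with both branches fetched; commits_behind_ahead is a hypothetical helper name, while git rev-list --left-right --count is standard git:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<behind> <ahead>" for BRANCH relative to BASE.
#   behind = commits reachable from BASE but not from BRANCH
#   ahead  = commits reachable from BRANCH but not from BASE
commits_behind_ahead() {
  local base="$1" branch="$2"
  # --left-right --count over the symmetric difference BASE...BRANCH
  # prints the two counts separated by a tab.
  git rev-list --left-right --count "${base}...${branch}"
}

# Example usage inside an E3SM clone (assumes remote "origin" = E3SM-Project/E3SM):
#   git fetch origin
#   commits_behind_ahead origin/master origin/v3atm/eam/master_MAM5_wetaero
#   commits_behind_ahead origin/master origin/tangq/atm/chemUCI_amip
```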
When I check out tangq/atm/chemUCI_amip, the machine files are very old, indeed from before pm-cpu or even perlmutter was added as a machine. The branch still uses a config_machines.xml that is closer to maint-1.0.
Overall, it may be much easier if you and/or Qi could put together a more recent branch. Even maint-2.0 doesn't currently work well with the Intel compiler on pm-cpu without updating the scorpio modules, etc., so it's a hassle and much less performant :/
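One quick way to see whether a branch's machine configuration knows about pm-cpu, without checking it out, is to search that branch's tree with git grep. A sketch, assuming the machine entries live under cime_config/machines and use the <machine MACH="..."> form of config_machines.xml; branch_has_machine is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if REF's machine config defines MACHINE.
# Assumes entries of the form <machine MACH="name"> under cime_config/machines.
branch_has_machine() {
  local ref="$1" machine="$2"
  # -q: report only via exit status; -F: treat the pattern as a fixed string.
  # Searching the tree of REF means the branch does not need to be checked out.
  git grep -q -F "MACH=\"${machine}\"" "$ref" -- cime_config/machines
}

# Example usage inside an E3SM clone:
#   branch_has_machine origin/tangq/atm/chemUCI_amip pm-cpu || echo "no pm-cpu entry"
```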
Sorry about the confusion. tangq/atm/chemUCI_amip is the branch I used, and I added some code on top of this version of the Chem-UCI branch, which dates back to Dec. 1, 2022.
Which branch should I check out and add my changes to?
BTW, I downloaded the newest master. It compiles successfully on pm-cpu with the Intel compiler.
The branch is tangq/atm/chemUCI_amip, and the commit I used is 5c1a8629027306b6da6a631b821654ccd29c444b. I believe that version is very close to the one merged into E3SMv2.
I saw similar Kokkos errors after updating the compiler- and machine-related files. I wonder if anyone has succeeded in running the maint-2.1 branch (i.e., E3SMv2.1) with the Intel compiler on Perlmutter.
The maint-2.1 branch seems fine to me on pm-cpu. I tested with a few tests, including e3sm_production.
@ndkeen Could you share the runscript for the standard EAM compset test for maint-2.1? I want to see if it works with a freshly cloned maint-2.1. Thanks!
I think maint-2.1 works on pm-cpu both before and after my recent PR, which makes some adjustments.
To run a test, for example:
cd cime/scripts
./create_test SMS_Ln5.ne4pg2_oQU480.F2010
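To pin the compiler explicitly rather than relying on the machine default, a CIME test name can carry a machine_compiler suffix, and create_test also accepts --machine and --compiler options. A small sketch of how such a name is assembled; cime_test_name is a hypothetical helper, and the test type, grid, and compset are the ones from the example above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: assemble a CIME test name of the form
#   TESTTYPE.GRID.COMPSET.MACHINE_COMPILER
# e.g. SMS_Ln5 = smoke test running 5 time steps.
cime_test_name() {
  local testtype="$1" grid="$2" compset="$3" machine="$4" compiler="$5"
  printf '%s.%s.%s.%s_%s\n' "$testtype" "$grid" "$compset" "$machine" "$compiler"
}

# Example usage from cime/scripts on Perlmutter:
#   ./create_test "$(cime_test_name SMS_Ln5 ne4pg2_oQU480 F2010 pm-cpu intel)"
#   ./create_test "$(cime_test_name SMS_Ln5 ne4pg2_oQU480 F2010 pm-cpu gnu)"
# The same choice expressed as create_test options:
#   ./create_test SMS_Ln5.ne4pg2_oQU480.F2010 --machine pm-cpu --compiler intel
```

Running the Intel and GNU variants side by side is one way to tell whether a build failure is compiler-specific.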
Here is the directory where you can find all of the tests I ran with maint-2.1:
/pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/m21up