
Kokkos::Impl::ParallelReduce< HIP > requested too large team size

Open trey-ornl opened this issue 1 year ago • 0 comments

I'm experimenting with stand-alone Homme on Frontier with ROCm 5.7.1 and 128 vertical levels, and my runs fail with the following output.

Kokkos::Impl::ParallelReduce< HIP > requested too large team size

The core file points to this line: https://github.com/E3SM-Project/E3SM/blob/fff7243869f58856906c50d276023237ccd8a140/components/homme/src/theta-l_kokkos/cxx/CaarFunctorImpl.hpp#L350

I added some debug output and found that m_policy_pre has a team_size() of 16 and an impl_vector_length() of 64, for a total of 16 x 64 = 1024 threads. That value is indeed too big for the definition of m_policy_pre:

#ifndef NDEBUG
  template<typename Tag>
  using TeamPolicyType = Kokkos::TeamPolicy<ExecSpace,Kokkos::LaunchBounds<512,1>,Tag>;
#else
  template<typename Tag>
  using TeamPolicyType = Kokkos::TeamPolicy<ExecSpace,Tag>;
#endif

  TeamPolicyType<TagPreExchange>   m_policy_pre;

Notice the Kokkos::LaunchBounds<512,1>.
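The mismatch can be checked with plain arithmetic. The sketch below is a stand-alone illustration, not Kokkos code: `threads_per_team` and `fits_launch_bounds` are hypothetical helpers, and the numbers are the ones from the debug output and the policy definition above.

```cpp
// Hypothetical helper (not a Kokkos API): threads occupied by one team,
// computed as team size times vector length.
constexpr int threads_per_team(int team_size, int vector_length) {
  return team_size * vector_length;
}

// Hypothetical helper: true if the request fits within the maximum thread
// count given as the first argument of Kokkos::LaunchBounds.
constexpr bool fits_launch_bounds(int team_size, int vector_length,
                                  int max_threads) {
  return threads_per_team(team_size, vector_length) <= max_threads;
}

// Values reported for m_policy_pre: team_size() == 16,
// impl_vector_length() == 64, LaunchBounds<512,1>.
static_assert(threads_per_team(16, 64) == 1024, "16 x 64 threads per team");
static_assert(!fits_launch_bounds(16, 64, 512), "1024 exceeds the 512 bound");
```

Under these assumptions the request is twice the declared bound, which is consistent with the "requested too large team size" message.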

I don't know why this is only showing up now. Maybe a newer version of Kokkos or ROCm checks these settings more carefully? Regardless, I think we want to allow m_policy_pre to have 1024 threads (4x4x64), so I think Kokkos::LaunchBounds<512,1> should not be used on AMD GPUs, where wavefronts (the AMD counterpart of warps) are 64 wide instead of 32.
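To make the 4x4x64 reasoning concrete, here is a minimal sketch of how the required bound scales with the hardware vector width. It assumes the 4x4 team layout stays fixed and only the lane count changes between a 32-wide warp and a 64-wide wavefront; `team_dim` and `max_threads_for` are hypothetical names, and nothing here calls Kokkos.

```cpp
// Assumption: a fixed 4 x 4 team layout, with each team thread driving one
// hardware-width vector (32 lanes per warp, 64 lanes per AMD wavefront).
constexpr int team_dim = 4;

// Hypothetical helper: threads needed for a 4 x 4 x lanes launch.
constexpr int max_threads_for(int lanes) {
  return team_dim * team_dim * lanes;
}

// With 32 lanes the launch fits a 512-thread bound exactly.
static_assert(max_threads_for(32) == 512, "4 x 4 x 32");
// With 64 lanes the same layout needs 1024 threads.
static_assert(max_threads_for(64) == 1024, "4 x 4 x 64");
```

If that assumption holds, a 512-thread bound is exactly enough at width 32 and half of what is needed at width 64.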

trey-ornl, Jul 12 '24 23:07