Trilinos
Trilinos copied to clipboard
Intrepid2: tests failing in cuda build with "invalid device function" runtime errors
Bug Report
@trilinos/intrepid2
Description
What went wrong?
We see numerous Intrepid2 tests failing in a Trilinos build with cuda (version 11.6; without uvm):
239 - Intrepid2_unit-test_Discretization_FunctionSpaceTools_test_01_CUDA_DOUBLE_MPI_1 (Failed)
240 - Intrepid2_unit-test_Discretization_FunctionSpaceTools_test_02_CUDA_DOUBLE_MPI_1 (Failed)
241 - Intrepid2_unit-test_Discretization_FunctionSpaceTools_test_03_CUDA_DOUBLE_MPI_1 (Failed)
242 - Intrepid2_unit-test_Discretization_FunctionSpaceTools_test_04_CUDA_DOUBLE_MPI_1 (Failed)
...
574 - Intrepid2_unit-test_MonolithicExecutable_Intrepid2_Tests_MPI_1 (Failed)
576 - Intrepid2_unit-test_performance_StructuredIntegration_StructuredIntegrationPerformance_MPI_1 (Failed)
The error message typically looks like
terminate called after throwing an instance of 'std::runtime_error'
what(): cudaFuncGetAttributes(&attr_tmp, base_t::get_kernel_func()) error( cudaErrorInvalidDeviceFunction): invalid device function /home/costmo-user/Trilinos-2/packages/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp:659
Aborted (core dumped)
Do you have an idea what might fix things?
The issue appears to have something to do with how Intrepid2 defines Kokkos parallel regions with lambda's in nested templates, in combination with what seems surprising compiler behavior relevant to input parameters declared const
.
The issue appears to arise whenever there is a pattern of
- a class templated on the device type (e.g.
CellTools
), - with a member function that is itself templated (e.g.
getReferenceNode
), - with some of the input parameters of this member function declared
const
, - and with a Kokkos parallel region with a lambda defined in this nested templated member function.
The following code is a minimal reproducer that produces the issue in a call to the function getReferenceNode
from CellTools
in Intrepid2. Thus, this test is "non-working": running it throws in our build the exception listed above.
#include "Teuchos_UnitTestHarness.hpp"
#include "Intrepid2_ConfigDefs.hpp"
#include "Shards_CellTopology.hpp"
#include "Intrepid2_CellTools.hpp"
TEUCHOS_UNIT_TEST(Intrepid2Issue,CellTools)
{
using execution_space = Kokkos::DefaultExecutionSpace;
const shards::CellTopology cellType(shards::getCellTopologyData<shards::Line<2>>());
const Kokkos::DynRankView<double, execution_space> nodeCoords("Coords of nodes", 3);
Intrepid2::CellTools<execution_space>::getReferenceNode(
nodeCoords,
cellType,
0
);
}
The following code doesn't involve code from Intrepid2, but it creates a similar pattern as described above. In this second test, the same runtime error is produced in our build.
#include "Teuchos_UnitTestHarness.hpp"
#include "Kokkos_Core.hpp"
template <typename execution_space>
class A
{
public:
template<typename value_type>
static void foo(
Kokkos::View<value_type*, execution_space> view,
const value_type val
)
{
Kokkos::parallel_for(
Kokkos::RangePolicy<execution_space>(0,view.extent(0)),
KOKKOS_LAMBDA (const int &i) {view(i) = val;}
);
}
};
TEUCHOS_UNIT_TEST(Intrepid2Issue,Reproducer)
{
using execution_space = Kokkos::DefaultExecutionSpace;
const Kokkos::View<double*, execution_space> view("view", 3);
A<execution_space>::foo(view, 0.);
}
We found that the runtime errors disappear when we remove the const
qualifiers from the input parameters of the member function. Specifically:
- In the first test: by replacing
const ordinal_type nodeOrd
withordinal_type nodeOrd
inintrepid2/src/Cell/Intrepid2_CellToolsDefNodeInfo.hpp
and modifying accordingly the function declaration; - In the second test: by replacing
const value_type val
withvalue_type val
.
This is the cuda-gdb backtrace for the second test:
#0 0x00007f188b103672 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1 0x00007f188b33dd1c in Kokkos::Impl::throw_runtime_exception (msg=...) at /home/costmo-user/Trilinos/packages/kokkos/core/src/impl/Kokkos_Error.cpp:72
#2 0x00007f188b357931 in Kokkos::Impl::cuda_internal_error_throw (e=cudaErrorInvalidDeviceFunction,
name=0x56228024a690 "cudaFuncGetAttributes(&attr_tmp, base_t::get_kernel_func())",
file=0x56228024a648 "/opt/Trilinos/GNU-Cuda/include/Cuda/Kokkos_Cuda_KernelLaunch.hpp", line=659)
at /home/costmo-user/Trilinos/packages/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp:184
#3 0x0000562280242c3e in Kokkos::Impl::cuda_internal_safe_call (e=cudaErrorInvalidDeviceFunction,
name=0x56228024a690 "cudaFuncGetAttributes(&attr_tmp, base_t::get_kernel_func())",
file=0x56228024a648 "/opt/Trilinos/GNU-Cuda/include/Cuda/Kokkos_Cuda_KernelLaunch.hpp", line=659) at /opt/Trilinos/GNU-Cuda/include/Cuda/Kokkos_Cuda_Error.hpp:73
#4 0x0000562280246c9f in Kokkos::Impl::CudaParallelLaunchImpl<Kokkos::Impl::ParallelFor<__nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (*)(Kokkos::View<double*, Kokkos::Cuda>, double), &(void A<Kokkos::Cuda>::foo<double>(Kokkos::View<double*, Kokkos::Cuda>, double)), 1u>, void (int const&), Kokkos::View<double*, Kokkos::Cuda>, double const>, Kokkos::RangePolicy<Kokkos::Cuda>, Kokkos::Cuda>, Kokkos::LaunchBounds<0u, 0u>, (Kokkos::Impl::Experimental::CudaLaunchMechanism)4>::get_cuda_func_attributes()::{lambda()#1}::operator()() const (this=0x7ffe78016317) at /opt/Trilinos/GNU-Cuda/include/Cuda/Kokkos_Cuda_KernelLaunch.hpp:659
#5 0x0000562280246d5c in Kokkos::Impl::CudaParallelLaunchImpl<Kokkos::Impl::ParallelFor<__nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (*)(Kokkos::View<double*, Kokkos::Cuda>, double), &(void A<Kokkos::Cuda>::foo<double>(Kokkos::View<double*, Kokkos::Cuda>, double)), 1u>, void (int const&), Kokkos::View<double*, Kokkos::Cuda>, double const>, Kokkos::RangePolicy<Kokkos::Cuda>, Kokkos::Cuda>, Kokkos::LaunchBounds<0u, 0u>, (Kokkos::Impl::Experimental::CudaLaunchMechanism)4>::get_cuda_func_attributes() () at /opt/Trilinos/GNU-Cuda/include/Cuda/Kokkos_Cuda_KernelLaunch.hpp:663
#6 0x0000562280245dfa in Kokkos::Impl::ParallelFor<__nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (*)(Kokkos::View<double*, Kokkos::Cuda>, double), &(void A<Kokkos::Cuda>::foo<double>(Kokkos::View<double*, Kokkos::Cuda>, double)), 1u>, void (int const&), Kokkos::View<double*, Kokkos::Cuda>, double const>, Kokkos::RangePolicy<Kokkos::Cuda>, Kokkos::Cuda>::execute() const (this=0x7ffe78016440) at /opt/Trilinos/GNU-Cuda/include/Cuda/Kokkos_Cuda_Parallel.hpp:497
#7 0x000056228024506c in Kokkos::parallel_for<Kokkos::RangePolicy<Kokkos::Cuda>, __nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (*)(Kokkos::View<double*, Kokkos::Cuda>, double), &(void A<Kokkos::Cuda>::foo<double>(Kokkos::View<double*, Kokkos::Cuda>, double)), 1u>, void (int const&), Kokkos::View<double*, Kokkos::Cuda>, double const> >(Kokkos::RangePolicy<Kokkos::Cuda> const&, __nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (*)(Kokkos::View<double*, Kokkos::Cuda>, double), &(void A<Kokkos::Cuda>::foo<double>(Kokkos::View<double*, Kokkos::Cuda>, double)), 1u>, void (int const&), Kokkos::View<double*, Kokkos::Cuda>, double const> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::enable_if<Kokkos::is_execution_policy<Kokkos::RangePolicy<Kokkos::Cuda> >::value, void>::type*) (
policy=..., functor=..., str=...) at /opt/Trilinos/GNU-Cuda/include/Kokkos_Parallel.hpp:169
#8 0x000056228024437f in A<Kokkos::Cuda>::foo<double> (view=..., val=0) at /home/costmo-user/HELM/tests/core/mesh/test_Shards.cpp:15
#9 0x0000562280241e02 in Intrepid2Issue_Reproducer_UnitTest::runUnitTestImpl (this=0x56228026a220 <instance_Intrepid2Issue_Reproducer_UnitTest>, out=...,
success=@0x7ffe7801664e: true) at /home/costmo-user/HELM/tests/core/mesh/test_Shards.cpp:28
#10 0x00007f188b5c9a77 in Teuchos::UnitTestBase::runUnitTest (this=0x56228026a220 <instance_Intrepid2Issue_Reproducer_UnitTest>, out=...)
at /home/costmo-user/Trilinos/packages/teuchos/core/src/Teuchos_UnitTestBase.cpp:62
#11 0x00007f188b5cc778 in Teuchos::UnitTestRepository::runUnitTestImpl (unitTest=..., out=...)
at /home/costmo-user/Trilinos/packages/teuchos/core/src/Teuchos_UnitTestRepository.cpp:539
#12 0x00007f188b5cadef in Teuchos::UnitTestRepository::runUnitTests (out=...) at /home/costmo-user/Trilinos/packages/teuchos/core/src/Teuchos_UnitTestRepository.cpp:317
#13 0x00007f188b5cc049 in Teuchos::UnitTestRepository::runUnitTestsFromMain (argc=1, argv=0x7ffe78016fb8)
at /home/costmo-user/Trilinos/packages/teuchos/core/src/Teuchos_UnitTestRepository.cpp:423
#14 0x00005622802497de in main (argc=1, argv=0x7ffe78016fb8) at /home/costmo-user/HELM/tests/TestBase.cpp:47
Line 5 seems particularly interesting. With some typesetting for readability, it looks like
Kokkos::Impl::CudaParallelLaunchImpl<
Kokkos::Impl::ParallelFor<
__nv_hdl_wrapper_t<
false,
false,
__nv_dl_tag<
void (*)(Kokkos::View<double*, Kokkos::Cuda>, double),
&(void A<Kokkos::Cuda>::foo<double>(Kokkos::View<double*, Kokkos::Cuda>, double)),
1u
>,
void (int const&),
Kokkos::View<double*, Kokkos::Cuda>, double const
>,
Kokkos::RangePolicy<Kokkos::Cuda>,
Kokkos::Cuda
>,
Kokkos::LaunchBounds<0u, 0u>,
(Kokkos::Impl::Experimental::CudaLaunchMechanism)4
>::get_cuda_func_attributes()
It seems like the compiler is stripping the const
qualifier from the member function. Which may be at the origin of why the function can't be found at run-time.
I found that the runtime errors also disappear when I remove the nested template structure.
Steps to Reproduce
There may be something particular to the way we set up our build, which may cause these issues. We use the following configure:
@trilinos/intrepid2 @CamelliaDPG
Likely related to this issue https://github.com/kokkos/kokkos/issues/5474 , seen when testing Trilinos with kokkos' develop branch
@maartenarnst Thank you for the very detailed report. It sounds like a compiler issue, and from the issue linked by @ndellingwood, it sounds like it affects more than just Intrepid2 (indeed, your second reproducer indicates as much). If it only affect cases where the signature involves something of the form const int_type value
-- where int_type
is a POD integer type, and it can be resolved by replacing this with int_type value
in the method signature, we certainly could consider making that interface change.
But for now, I suggest that we track kokkos/kokkos#5474, and see how that progresses. It may be that some small changes to your build settings could allow you to work around the issue. Does that sound OK to you?
Hi @CamelliaDPG. Thanks for your reaction! That certainly sounds good to track kokkos/kokkos#5474.
Closing this. This seemed to be an issue particular to cuda 11.6. It doesn’t appear anymore in 11.8 or 12.