Trilinos icon indicating copy to clipboard operation
Trilinos copied to clipboard

Intrepid2: tests failing in cuda build with "invalid device function" runtime errors

Open maartenarnst opened this issue 2 years ago • 4 comments

Bug Report

@trilinos/intrepid2

Description

What went wrong?

We see numerous Intrepid2 tests failing in a Trilinos build with cuda (version 11.6; without uvm):

  239 - Intrepid2_unit-test_Discretization_FunctionSpaceTools_test_01_CUDA_DOUBLE_MPI_1 (Failed)
  240 - Intrepid2_unit-test_Discretization_FunctionSpaceTools_test_02_CUDA_DOUBLE_MPI_1 (Failed)
  241 - Intrepid2_unit-test_Discretization_FunctionSpaceTools_test_03_CUDA_DOUBLE_MPI_1 (Failed)
  242 - Intrepid2_unit-test_Discretization_FunctionSpaceTools_test_04_CUDA_DOUBLE_MPI_1 (Failed)
...
  574 - Intrepid2_unit-test_MonolithicExecutable_Intrepid2_Tests_MPI_1 (Failed)
  576 - Intrepid2_unit-test_performance_StructuredIntegration_StructuredIntegrationPerformance_MPI_1 (Failed)

The error message typically looks like

terminate called after throwing an instance of 'std::runtime_error'
  what():  cudaFuncGetAttributes(&attr_tmp, base_t::get_kernel_func()) error( cudaErrorInvalidDeviceFunction): invalid device function /home/costmo-user/Trilinos-2/packages/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp:659
Aborted (core dumped)

Do you have an idea what might fix things?

The issue appears to have something to do with how Intrepid2 defines Kokkos parallel regions with lambda's in nested templates, in combination with what seems surprising compiler behavior relevant to input parameters declared const.

The issue appears to arise whenever there is a pattern of

  • a class templated on the device type (e.g. CellTools),
  • with a member function that is itself templated (e.g. getReferenceNode),
  • with some of the input parameters of this member function declared const,
  • and with a Kokkos parallel region with a lambda defined in this nested templated member function.

The following code is a minimal reproducer that produces the issue in a call to the function getReferenceNode from CellTools in Intrepid2. Thus, this test is "non-working": running it throws in our build the exception listed above.

#include "Teuchos_UnitTestHarness.hpp"

#include "Intrepid2_ConfigDefs.hpp"

#include "Shards_CellTopology.hpp"
#include "Intrepid2_CellTools.hpp"

TEUCHOS_UNIT_TEST(Intrepid2Issue,CellTools)
{
    using execution_space = Kokkos::DefaultExecutionSpace;
    
    const shards::CellTopology cellType(shards::getCellTopologyData<shards::Line<2>>());
    
    const Kokkos::DynRankView<double, execution_space> nodeCoords("Coords of nodes", 3);

    Intrepid2::CellTools<execution_space>::getReferenceNode(
        nodeCoords,
        cellType,
        0
    );
}

The following code doesn't involve code from Intrepid2, but it creates a similar pattern as described above. In this second test, the same runtime error is produced in our build.

#include "Teuchos_UnitTestHarness.hpp"
  
#include "Kokkos_Core.hpp"

template <typename execution_space>
class A
{
public:
    template<typename value_type>
    static void foo(
              Kokkos::View<value_type*, execution_space> view,
        const value_type                                 val
    )
    {
        Kokkos::parallel_for(
            Kokkos::RangePolicy<execution_space>(0,view.extent(0)),
            KOKKOS_LAMBDA (const int &i) {view(i) = val;}
        );
    }
};

TEUCHOS_UNIT_TEST(Intrepid2Issue,Reproducer)
{
    using execution_space = Kokkos::DefaultExecutionSpace;
    
    const Kokkos::View<double*, execution_space> view("view", 3);

    A<execution_space>::foo(view, 0.);
}

We found that the runtime errors disappear when we remove the const qualifiers from the input parameters of the member function. Specifically:

  • In the first test: by replacing const ordinal_type nodeOrd with ordinal_type nodeOrd in intrepid2/src/Cell/Intrepid2_CellToolsDefNodeInfo.hpp and modifying accordingly the function declaration;
  • In the second test: by replacing const value_type val with value_type val.

This is the cuda-gdb backtrace for the second test:

#0  0x00007f188b103672 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x00007f188b33dd1c in Kokkos::Impl::throw_runtime_exception (msg=...) at /home/costmo-user/Trilinos/packages/kokkos/core/src/impl/Kokkos_Error.cpp:72
#2  0x00007f188b357931 in Kokkos::Impl::cuda_internal_error_throw (e=cudaErrorInvalidDeviceFunction, 
    name=0x56228024a690 "cudaFuncGetAttributes(&attr_tmp, base_t::get_kernel_func())", 
    file=0x56228024a648 "/opt/Trilinos/GNU-Cuda/include/Cuda/Kokkos_Cuda_KernelLaunch.hpp", line=659)
    at /home/costmo-user/Trilinos/packages/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp:184
#3  0x0000562280242c3e in Kokkos::Impl::cuda_internal_safe_call (e=cudaErrorInvalidDeviceFunction, 
    name=0x56228024a690 "cudaFuncGetAttributes(&attr_tmp, base_t::get_kernel_func())", 
    file=0x56228024a648 "/opt/Trilinos/GNU-Cuda/include/Cuda/Kokkos_Cuda_KernelLaunch.hpp", line=659) at /opt/Trilinos/GNU-Cuda/include/Cuda/Kokkos_Cuda_Error.hpp:73
#4  0x0000562280246c9f in Kokkos::Impl::CudaParallelLaunchImpl<Kokkos::Impl::ParallelFor<__nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (*)(Kokkos::View<double*, Kokkos::Cuda>, double), &(void A<Kokkos::Cuda>::foo<double>(Kokkos::View<double*, Kokkos::Cuda>, double)), 1u>, void (int const&), Kokkos::View<double*, Kokkos::Cuda>, double const>, Kokkos::RangePolicy<Kokkos::Cuda>, Kokkos::Cuda>, Kokkos::LaunchBounds<0u, 0u>, (Kokkos::Impl::Experimental::CudaLaunchMechanism)4>::get_cuda_func_attributes()::{lambda()#1}::operator()() const (this=0x7ffe78016317) at /opt/Trilinos/GNU-Cuda/include/Cuda/Kokkos_Cuda_KernelLaunch.hpp:659
#5  0x0000562280246d5c in Kokkos::Impl::CudaParallelLaunchImpl<Kokkos::Impl::ParallelFor<__nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (*)(Kokkos::View<double*, Kokkos::Cuda>, double), &(void A<Kokkos::Cuda>::foo<double>(Kokkos::View<double*, Kokkos::Cuda>, double)), 1u>, void (int const&), Kokkos::View<double*, Kokkos::Cuda>, double const>, Kokkos::RangePolicy<Kokkos::Cuda>, Kokkos::Cuda>, Kokkos::LaunchBounds<0u, 0u>, (Kokkos::Impl::Experimental::CudaLaunchMechanism)4>::get_cuda_func_attributes() () at /opt/Trilinos/GNU-Cuda/include/Cuda/Kokkos_Cuda_KernelLaunch.hpp:663
#6  0x0000562280245dfa in Kokkos::Impl::ParallelFor<__nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (*)(Kokkos::View<double*, Kokkos::Cuda>, double), &(void A<Kokkos::Cuda>::foo<double>(Kokkos::View<double*, Kokkos::Cuda>, double)), 1u>, void (int const&), Kokkos::View<double*, Kokkos::Cuda>, double const>, Kokkos::RangePolicy<Kokkos::Cuda>, Kokkos::Cuda>::execute() const (this=0x7ffe78016440) at /opt/Trilinos/GNU-Cuda/include/Cuda/Kokkos_Cuda_Parallel.hpp:497
#7  0x000056228024506c in Kokkos::parallel_for<Kokkos::RangePolicy<Kokkos::Cuda>, __nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (*)(Kokkos::View<double*, Kokkos::Cuda>, double), &(void A<Kokkos::Cuda>::foo<double>(Kokkos::View<double*, Kokkos::Cuda>, double)), 1u>, void (int const&), Kokkos::View<double*, Kokkos::Cuda>, double const> >(Kokkos::RangePolicy<Kokkos::Cuda> const&, __nv_hdl_wrapper_t<false, false, __nv_dl_tag<void (*)(Kokkos::View<double*, Kokkos::Cuda>, double), &(void A<Kokkos::Cuda>::foo<double>(Kokkos::View<double*, Kokkos::Cuda>, double)), 1u>, void (int const&), Kokkos::View<double*, Kokkos::Cuda>, double const> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::enable_if<Kokkos::is_execution_policy<Kokkos::RangePolicy<Kokkos::Cuda> >::value, void>::type*) (
    policy=..., functor=..., str=...) at /opt/Trilinos/GNU-Cuda/include/Kokkos_Parallel.hpp:169
#8  0x000056228024437f in A<Kokkos::Cuda>::foo<double> (view=..., val=0) at /home/costmo-user/HELM/tests/core/mesh/test_Shards.cpp:15
#9  0x0000562280241e02 in Intrepid2Issue_Reproducer_UnitTest::runUnitTestImpl (this=0x56228026a220 <instance_Intrepid2Issue_Reproducer_UnitTest>, out=..., 
    success=@0x7ffe7801664e: true) at /home/costmo-user/HELM/tests/core/mesh/test_Shards.cpp:28
#10 0x00007f188b5c9a77 in Teuchos::UnitTestBase::runUnitTest (this=0x56228026a220 <instance_Intrepid2Issue_Reproducer_UnitTest>, out=...)
    at /home/costmo-user/Trilinos/packages/teuchos/core/src/Teuchos_UnitTestBase.cpp:62
#11 0x00007f188b5cc778 in Teuchos::UnitTestRepository::runUnitTestImpl (unitTest=..., out=...)
    at /home/costmo-user/Trilinos/packages/teuchos/core/src/Teuchos_UnitTestRepository.cpp:539
#12 0x00007f188b5cadef in Teuchos::UnitTestRepository::runUnitTests (out=...) at /home/costmo-user/Trilinos/packages/teuchos/core/src/Teuchos_UnitTestRepository.cpp:317
#13 0x00007f188b5cc049 in Teuchos::UnitTestRepository::runUnitTestsFromMain (argc=1, argv=0x7ffe78016fb8)
    at /home/costmo-user/Trilinos/packages/teuchos/core/src/Teuchos_UnitTestRepository.cpp:423
#14 0x00005622802497de in main (argc=1, argv=0x7ffe78016fb8) at /home/costmo-user/HELM/tests/TestBase.cpp:47

Line 5 seems particularly interesting. With some typesetting for readability, it looks like

Kokkos::Impl::CudaParallelLaunchImpl<
    Kokkos::Impl::ParallelFor<
        __nv_hdl_wrapper_t<
            false, 
            false, 
            __nv_dl_tag<
                void (*)(Kokkos::View<double*, Kokkos::Cuda>, double), 
                &(void A<Kokkos::Cuda>::foo<double>(Kokkos::View<double*, Kokkos::Cuda>, double)), 
                1u
            >, 
            void (int const&), 
            Kokkos::View<double*, Kokkos::Cuda>, double const
        >, 
        Kokkos::RangePolicy<Kokkos::Cuda>, 
        Kokkos::Cuda
    >, 
    Kokkos::LaunchBounds<0u, 0u>, 
    (Kokkos::Impl::Experimental::CudaLaunchMechanism)4
>::get_cuda_func_attributes() 

It seems like the compiler is stripping the const qualifier from the member function. Which may be at the origin of why the function can't be found at run-time.

I found that the runtime errors also disappear when I remove the nested template structure.

Steps to Reproduce

There may be something particular to the way we set up our build, which may cause these issues. We use the following configure:

CMakePresets.json.txt

maartenarnst avatar Sep 24 '22 13:09 maartenarnst

@trilinos/intrepid2 @CamelliaDPG

jhux2 avatar Sep 26 '22 20:09 jhux2

Likely related to this issue https://github.com/kokkos/kokkos/issues/5474 , seen when testing Trilinos with kokkos' develop branch

ndellingwood avatar Sep 27 '22 00:09 ndellingwood

@maartenarnst Thank you for the very detailed report. It sounds like a compiler issue, and from the issue linked by @ndellingwood, it sounds like it affects more than just Intrepid2 (indeed, your second reproducer indicates as much). If it only affect cases where the signature involves something of the form const int_type value -- where int_type is a POD integer type, and it can be resolved by replacing this with int_type value in the method signature, we certainly could consider making that interface change.

But for now, I suggest that we track kokkos/kokkos#5474, and see how that progresses. It may be that some small changes to your build settings could allow you to work around the issue. Does that sound OK to you?

CamelliaDPG avatar Sep 27 '22 15:09 CamelliaDPG

Hi @CamelliaDPG. Thanks for your reaction! That certainly sounds good to track kokkos/kokkos#5474.

maartenarnst avatar Sep 27 '22 18:09 maartenarnst

Closing this. This seemed to be an issue particular to cuda 11.6. It doesn’t appear anymore in 11.8 or 12.

maartenarnst avatar Dec 21 '22 06:12 maartenarnst