
tests/gmg_mesh_deform fails with deal.II master

Open tjhei opened this issue 2 years ago • 27 comments

with a floating point exception, see https://github.com/geodynamics/aspect/pull/4902#event-6973011711 for example

tjhei avatar Jul 11 '22 22:07 tjhei

Confirmed. This is the backtrace:

Thread 1 "aspect" received signal SIGFPE, Arithmetic exception.
dealii::VectorizedArray<double, 2ul>::operator+= (vec=..., this=0x55555af7a340) at /home/bangerth/p/deal.II/1/install/include/deal.II/base/vectorization.h:3341
3341        data += vec.data;
(gdb) bt
#0  dealii::VectorizedArray<double, 2ul>::operator+= (vec=..., this=0x55555af7a340) at /home/bangerth/p/deal.II/1/install/include/deal.II/base/vectorization.h:3341
#1  dealii::internal::EvaluatorTensorProduct<(dealii::internal::EvaluatorVariant)2, 3, 2, 2, dealii::VectorizedArray<double, 2ul>, dealii::VectorizedArray<double, 2ul> >::apply<1, false, true, 1, false> (shapes=0x55555ac62580, in=0x55555af7a640, out=0x55555af7a340)
    at /home/bangerth/p/deal.II/1/install/include/deal.II/matrix_free/tensor_product_kernels.h:1876
#2  0x0000555558f6cb6c in dealii::internal::EvaluatorTensorProduct<(dealii::internal::EvaluatorVariant)2, 3, 2, 2, dealii::VectorizedArray<double, 2ul>, dealii::VectorizedArray<double, 2ul> >::gradients<1, false, true> (this=0x7ffffffefef0, in=0x55555af7a640, out=0x55555af7a340)
    at /home/bangerth/p/deal.II/1/install/include/deal.II/matrix_free/tensor_product_kernels.h:1679
#3  0x0000555558f4d966 in dealii::internal::FEEvaluationImplCollocation<3, 1, dealii::VectorizedArray<double, 2ul> >::do_integrate (shape=..., 
    integration_flag=dealii::EvaluationFlags::gradients, values_dofs=0x55555af7a340, gradients_quad=0x55555af7a5c0, hessians_quad=0x55555af7b040, 
    add_into_values_array=false) at /home/bangerth/p/deal.II/1/install/include/deal.II/matrix_free/evaluation_kernels.h:2089
#4  0x0000555558f2bb24 in dealii::internal::FEEvaluationImplTransformToCollocation<3, 1, 2, dealii::VectorizedArray<double, 2ul> >::integrate (n_components=3, 
    integration_flag=dealii::EvaluationFlags::gradients, values_dofs=0x55555af7a140, fe_eval=..., add_into_values_array=false)
    at /home/bangerth/p/deal.II/1/install/include/deal.II/matrix_free/evaluation_kernels.h:2206
#5  0x0000555558f13af5 in dealii::internal::FEEvaluationImplIntegrateSelector<3, dealii::VectorizedArray<double, 2ul> >::run<1, 2> (n_components=3, 
    integration_flag=dealii::EvaluationFlags::gradients, values_dofs=0x55555af7a140, fe_eval=..., sum_into_values_array=false)
    at /home/bangerth/p/deal.II/1/install/include/deal.II/matrix_free/evaluation_kernels.h:2435
#6  0x0000555558edd350 in dealii::SelectEvaluator<3, 1, 2, dealii::VectorizedArray<double, 2ul> >::integrate (n_components=3, 
    integration_flag=dealii::EvaluationFlags::gradients, values_dofs=0x55555af7a140, eval=..., sum_into_values_array=false)
    at /home/bangerth/p/deal.II/1/install/include/deal.II/matrix_free/evaluation_selector.h:98
#7  0x0000555558ea6d83 in dealii::FEEvaluation<3, 1, 2, 3, double, dealii::VectorizedArray<double, 2ul> >::integrate (this=0x7fffffff3820, 
    integration_flag=dealii::EvaluationFlags::gradients, values_array=0x55555af7a140, sum_into_values_array=false)
    at /home/bangerth/p/deal.II/1/install/include/deal.II/matrix_free/fe_evaluation.h:7910
#8  0x0000555558e86bac in dealii::FEEvaluation<3, 1, 2, 3, double, dealii::VectorizedArray<double, 2ul> >::integrate (this=0x7fffffff3820, 
    integration_flag=dealii::EvaluationFlags::gradients) at /home/bangerth/p/deal.II/1/install/include/deal.II/matrix_free/fe_evaluation.h:7817
#9  0x0000555558efec8a in dealii::MatrixFreeOperators::LaplaceOperator<3, 1, 2, 3, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::VectorizedArray<double, 2ul> >::do_operation_on_cell (this=0x55555b1cbc30, phi=..., cell=0)
    at /home/bangerth/p/deal.II/1/install/include/deal.II/matrix_free/operators.h:2307

bangerth avatar Jul 11 '22 22:07 bangerth

This is outside my area of expertise, so I can not really help with what is going on. But this is what I find:

(gdb) list
3336      DEAL_II_ALWAYS_INLINE
3337      VectorizedArray &
3338      operator+=(const VectorizedArray &vec)
3339      {
3340    #    ifdef DEAL_II_COMPILER_USE_VECTOR_ARITHMETICS
3341        data += vec.data;                      *********************************************** error happens here
3342    #    else
3343        data = _mm_add_pd(data, vec.data);
3344    #    endif
3345        return *this;
(gdb) p vec.data
value has been optimized out
(gdb) p vec
$1 = <optimized out>
(gdb) up
#1  dealii::internal::EvaluatorTensorProduct<(dealii::internal::EvaluatorVariant)2, 3, 2, 2, dealii::VectorizedArray<double, 2ul>, dealii::VectorizedArray<double, 2ul> >::apply<1, false, true, 1, false> (shapes=0x55555ac62580, in=0x55555af7a640, out=0x55555af7a340)
    at /home/bangerth/p/deal.II/1/install/include/deal.II/matrix_free/tensor_product_kernels.h:1876
1876                        out[stride * col] += r0 + r1;
(gdb) p r0
$2 = {
  <dealii::VectorizedArrayBase<dealii::VectorizedArray<double, 2>, 2>> = {<No data fields>}, 
  members of dealii::VectorizedArray<double, 2>:
  data = {-3.9405883313765773e+264, -inf}
}
(gdb) p r1
$3 = {
  <dealii::VectorizedArrayBase<dealii::VectorizedArray<double, 2>, 2>> = {<No data fields>}, 
  members of dealii::VectorizedArray<double, 2>:
  data = {0, 0}
}

I don't know why r0 has the infinity in it.

bangerth avatar Jul 11 '22 22:07 bangerth

Thanks, I will take a look. I could not reproduce this on my system, so this is helpful.

tjhei avatar Jul 12 '22 01:07 tjhei

Can you share the rest of the backtrace?

tjhei avatar Jul 12 '22 01:07 tjhei

Here is the whole backtrace:

#0  dealii::VectorizedArray<double, 2ul>::operator+= (vec=..., this=0x55555af7a340) at /home/bangerth/p/deal.II/1/install/include/deal.II/base/vectorization.h:3341
#1  dealii::internal::EvaluatorTensorProduct<(dealii::internal::EvaluatorVariant)2, 3, 2, 2, dealii::VectorizedArray<double, 2ul>, dealii::VectorizedArray<double, 2ul> >::apply<1, false, true, 1, false> (shapes=0x55555ac62580, in=0x55555af7a640, out=0x55555af7a340)
    at /home/bangerth/p/deal.II/1/install/include/deal.II/matrix_free/tensor_product_kernels.h:1876
#2  0x0000555558f6cb6c in dealii::internal::EvaluatorTensorProduct<(dealii::internal::EvaluatorVariant)2, 3, 2, 2, dealii::VectorizedArray<double, 2ul>, dealii::VectorizedArray<double, 2ul> >::gradients<1, false, true> (this=0x7ffffffefef0, in=0x55555af7a640, out=0x55555af7a340)
    at /home/bangerth/p/deal.II/1/install/include/deal.II/matrix_free/tensor_product_kernels.h:1679
#3  0x0000555558f4d966 in dealii::internal::FEEvaluationImplCollocation<3, 1, dealii::VectorizedArray<double, 2ul> >::do_integrate (shape=..., 
    integration_flag=dealii::EvaluationFlags::gradients, values_dofs=0x55555af7a340, gradients_quad=0x55555af7a5c0, hessians_quad=0x55555af7b040, 
    add_into_values_array=false) at /home/bangerth/p/deal.II/1/install/include/deal.II/matrix_free/evaluation_kernels.h:2089
#4  0x0000555558f2bb24 in dealii::internal::FEEvaluationImplTransformToCollocation<3, 1, 2, dealii::VectorizedArray<double, 2ul> >::integrate (n_components=3, 
    integration_flag=dealii::EvaluationFlags::gradients, values_dofs=0x55555af7a140, fe_eval=..., add_into_values_array=false)
    at /home/bangerth/p/deal.II/1/install/include/deal.II/matrix_free/evaluation_kernels.h:2206
#5  0x0000555558f13af5 in dealii::internal::FEEvaluationImplIntegrateSelector<3, dealii::VectorizedArray<double, 2ul> >::run<1, 2> (n_components=3, 
    integration_flag=dealii::EvaluationFlags::gradients, values_dofs=0x55555af7a140, fe_eval=..., sum_into_values_array=false)
    at /home/bangerth/p/deal.II/1/install/include/deal.II/matrix_free/evaluation_kernels.h:2435
#6  0x0000555558edd350 in dealii::SelectEvaluator<3, 1, 2, dealii::VectorizedArray<double, 2ul> >::integrate (n_components=3, 
    integration_flag=dealii::EvaluationFlags::gradients, values_dofs=0x55555af7a140, eval=..., sum_into_values_array=false)
    at /home/bangerth/p/deal.II/1/install/include/deal.II/matrix_free/evaluation_selector.h:98
#7  0x0000555558ea6d83 in dealii::FEEvaluation<3, 1, 2, 3, double, dealii::VectorizedArray<double, 2ul> >::integrate (this=0x7fffffff3820, 
    integration_flag=dealii::EvaluationFlags::gradients, values_array=0x55555af7a140, sum_into_values_array=false)
    at /home/bangerth/p/deal.II/1/install/include/deal.II/matrix_free/fe_evaluation.h:7910
#8  0x0000555558e86bac in dealii::FEEvaluation<3, 1, 2, 3, double, dealii::VectorizedArray<double, 2ul> >::integrate (this=0x7fffffff3820, 
    integration_flag=dealii::EvaluationFlags::gradients) at /home/bangerth/p/deal.II/1/install/include/deal.II/matrix_free/fe_evaluation.h:7817
#9  0x0000555558efec8a in dealii::MatrixFreeOperators::LaplaceOperator<3, 1, 2, 3, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::VectorizedArray<double, 2ul> >::do_operation_on_cell (this=0x55555b1cbc30, phi=..., cell=0)
    at /home/bangerth/p/deal.II/1/install/include/deal.II/matrix_free/operators.h:2307
#10 0x0000555558eab085 in dealii::MatrixFreeOperators::LaplaceOperator<3, 1, 2, 3, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::VectorizedArray<double, 2ul> >::local_diagonal_cell (this=0x55555b1cbc30, data=warning: RTTI symbol not found for class 'dealii::MatrixFree<3, double, dealii::VectorizedArray<double, 2ul> >'
..., dst=..., cell_range={...})
    at /home/bangerth/p/deal.II/1/install/include/deal.II/matrix_free/operators.h:2384
#11 0x0000555558fbeb20 in dealii::internal::MFWorker<dealii::MatrixFree<3, double, dealii::VectorizedArray<double, 2ul> >, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::MatrixFreeOperators::LaplaceOperator<3, 1, 2, 3, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::VectorizedArray<double, 2ul> >, true>::process_range (this=0x7fffffff3fd0, fu=
    @0x7fffffff3fe8: (void (dealii::MatrixFreeOperators::LaplaceOperator<3, 1, 2, 3, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::VectorizedArray<double, 2> >::*)(const dealii::MatrixFreeOperators::LaplaceOperator<3, 1, 2, 3, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::VectorizedArray<double, 2> > * const, const dealii::MatrixFree<3, double, dealii::VectorizedArray<double, 2> > &, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host> &, const dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host> &, const std::pair<unsigned int, unsigned int> &)) 0x555558eaaee2 <dealii::MatrixFreeOperators::LaplaceOperator<3, 1, 2, 3, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::VectorizedArray<double, 2ul> >::local_diagonal_cell(dealii::MatrixFree<3, double, dealii::VectorizedArray<double, 2ul> > const&, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>&, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host> const&, std::pair<unsigned int, unsigned int> const&) const>, 
    ptr=std::vector of length 3, capacity 4 = {...}, data=std::vector of length 2, capacity 6 = {...}, range_index=0)
    at /home/bangerth/p/deal.II/1/install/include/deal.II/matrix_free/matrix_free.h:4470
#12 0x0000555558fb8e71 in dealii::internal::MFWorker<dealii::MatrixFree<3, double, dealii::VectorizedArray<double, 2ul> >, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::MatrixFreeOperators::LaplaceOperator<3, 1, 2, 3, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::VectorizedArray<double, 2ul> >, true>::cell (this=0x7fffffff3fd0, range_index=0)
    at /home/bangerth/p/deal.II/1/install/include/deal.II/matrix_free/matrix_free.h:4432
#13 0x00007ffff574415f in dealii::internal::MatrixFreeFunctions::TaskInfo::loop (this=0x55555b0f7b80, funct=warning: RTTI symbol not found for class 'dealii::internal::MFWorker<dealii::MatrixFree<3, double, dealii::VectorizedArray<double, 2ul> >, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::MatrixFreeOperators::LaplaceOperator<3, 1, 2, 3, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::VectorizedArray<double, 2ul> >, true>'
...)
    at /home/bangerth/p/deal.II/1/dealii/source/matrix_free/task_info.cc:623
#14 0x0000555558eab3d6 in dealii::MatrixFree<3, double, dealii::VectorizedArray<double, 2ul> >::cell_loop<dealii::MatrixFreeOperators::LaplaceOperator<3, 1, 2, 3, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::VectorizedArray<double, 2ul> >, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host> > (this=0x55555b0f7900, function_pointer=
    (void (dealii::MatrixFreeOperators::LaplaceOperator<3, 1, 2, 3, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::VectorizedArray<double, 2> >::*)(const dealii::MatrixFreeOperators::LaplaceOperator<3, 1, 2, 3, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::VectorizedArray<double, 2> > * const, const dealii::MatrixFree<3, double, dealii::VectorizedArray<double, 2> > &, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host> &, const dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host> &, const std::pair<unsigned int, unsigned int> &)) 0x555558eaaee2 <dealii::MatrixFreeOperators::LaplaceOperator<3, 1, 2, 3, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::VectorizedArray<double, 2ul> >::local_diagonal_cell(dealii::MatrixFree<3, double, dealii::VectorizedArray<double, 2ul> > const&, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>&, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host> const&, std::pair<unsigned int, unsigned int> const&) const>, 
    owning_class=0x55555b1cbc30, dst=..., src=..., zero_dst_vector=false) at /home/bangerth/p/deal.II/1/install/include/deal.II/matrix_free/matrix_free.h:4844
#15 0x0000555558e87d06 in dealii::MatrixFreeOperators::LaplaceOperator<3, 1, 2, 3, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::VectorizedArray<double, 2ul> >::compute_diagonal (this=0x55555b1cbc30) at /home/bangerth/p/deal.II/1/install/include/deal.II/matrix_free/operators.h:2191
#16 0x0000555558e6e109 in aspect::MeshDeformation::MeshDeformationHandler<3>::compute_mesh_displacements_gmg (this=0x55555abb48a0)
    at /home/bangerth/p/deal.II/1/projects/aspect/source/mesh_deformation/interface.cc:1134
#17 0x0000555558e5d8eb in aspect::MeshDeformation::MeshDeformationHandler<3>::setup_dofs (this=0x55555abb48a0)
    at /home/bangerth/p/deal.II/1/projects/aspect/source/mesh_deformation/interface.cc:1414
#18 0x000055555a37d47e in aspect::Simulator<3>::setup_dofs (this=0x7fffffff9e30) at /home/bangerth/p/deal.II/1/projects/aspect/source/simulator/core.cc:1446
#19 0x000055555a37c827 in aspect::Simulator<3>::run (this=0x7fffffff9e30) at /home/bangerth/p/deal.II/1/projects/aspect/source/simulator/core.cc:1952
#20 0x0000555558a66afb in run_simulator<3> (
    raw_input_as_string="# Test GMG with initial mesh deformation\n#\n# This test is a copy of\n\ninclude $ASPECT_SOURCE_DIR/tests/box_initial_mesh_deformation_ascii_data.prm\n\nset Dimension = 3\n\nsubsection Solver parameters\n  sub"..., 
    input_as_string="# Test GMG with initial mesh deformation\n#\n# This test is a copy of\n\ninclude /home/bangerth/p/deal.II/1/projects/aspect/tests/box_initial_mesh_deformation_ascii_data.prm\n\nset Dimension = 3\n\nsubsection"..., output_xml=false, output_plugin_graph=false, validate_only=false)
    at /home/bangerth/p/deal.II/1/projects/aspect/source/main.cc:598
#21 0x0000555558a39562 in main (argc=2, argv=0x7fffffffdd68) at /home/bangerth/p/deal.II/1/projects/aspect/source/main.cc:790

bangerth avatar Jul 12 '22 02:07 bangerth

@zjiaqi2018 We are getting this error in compute_mesh_displacements_gmg() but only with a recent deal.II master. Any idea why? Does it crash on your system as well?

tjhei avatar Jul 12 '22 09:07 tjhei

@peterrum The matrix-free code started triggering this crash (a floating point exception) in the last 3 days or so. Do you happen to have an idea what changed inside deal.II?

tjhei avatar Jul 12 '22 15:07 tjhei

My guess is that one of the following PRs caused the problem:

  • https://github.com/dealii/dealii/pull/14085
  • https://github.com/dealii/dealii/pull/14090
  • https://github.com/dealii/dealii/pull/14119

Going by the date ("3 days"), it would be the last one. My guess is that one of the PRs had the side effect that some of the lanes are not filled. Could anyone find out which one is the guilty one?
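To illustrate what "lanes are not filled" means, here is a minimal sketch with a plain `std::array` standing in for a vectorized array. All names (`Batch`, `pad_unfilled_lanes`, `all_lanes_finite`) are hypothetical and not deal.II API. When a batch holds fewer cells than it has lanes, the remaining lanes contain whatever was in memory, and lane-wise arithmetic on them can produce inf or NaN. One common remedy, shown here as an assumption and not as what deal.II does, is to copy a valid lane into the unused ones.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

constexpr std::size_t n_lanes = 4;

// Hypothetical stand-in for a SIMD batch of four doubles.
using Batch = std::array<double, n_lanes>;

// Overwrite the lanes [n_filled, n_lanes) with a copy of lane 0.
// Requires n_filled >= 1 so that lane 0 holds valid data.
void pad_unfilled_lanes(Batch &batch, const std::size_t n_filled)
{
  for (std::size_t lane = n_filled; lane < n_lanes; ++lane)
    batch[lane] = batch[0];
}

// True if every lane holds a finite number (neither inf nor NaN).
bool all_lanes_finite(const Batch &batch)
{
  for (const double v : batch)
    if (!std::isfinite(v))
      return false;
  return true;
}
```

After padding, arithmetic applied to the whole batch stays finite as long as the valid lanes are finite, and the results in the padded lanes are simply ignored.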

peterrum avatar Jul 12 '22 16:07 peterrum

I updated to the current deal.II master, but I could not reproduce the error.

zjiaqi2018 avatar Jul 12 '22 17:07 zjiaqi2018

Please ping me once you have a suspect commit. The ASPECT messages might get lost among other emails from GitHub.

peterrum avatar Jul 13 '22 07:07 peterrum

It is not from the last few days (sorry, that was a wrong assumption, we just started testing all tests in the last couple of days/weeks). Here is what git bisect gave me:

$ git bisect bad
d361c9cee84e39b3b519971790afa1a39d4bbb0c is the first bad commit
commit d361c9cee84e39b3b519971790afa1a39d4bbb0c
Author: Martin Kronbichler <[email protected]>
Date:   Tue Jun 7 23:16:58 2022 +0200

    Avoid saving and setting refinement flags in distributed Tria

 source/distributed/tria.cc | 19 ++++++-------------
 1 file changed, 6 insertions(+), 13 deletions(-)

tjhei avatar Jul 16 '22 20:07 tjhei

And this doesn't make much sense to me, sadly. With this diff

diff --git a/source/distributed/tria.cc b/source/distributed/tria.cc
index 41ed862418..d0c8978f12 100644
--- a/source/distributed/tria.cc
+++ b/source/distributed/tria.cc
@@ -2738,17 +2738,24 @@ namespace parallel
     bool
     Triangulation<dim, spacedim>::prepare_coarsening_and_refinement()
     {
-      bool         mesh_changed = false;
-      unsigned int loop_counter = 0;
-      unsigned int n_changes    = 0;
+      bool              mesh_changed = false;
+      unsigned int      loop_counter = 0;
+      unsigned int      n_changes    = 0;
+      std::vector<bool> flags_before[2];
+      this->save_coarsen_flags(flags_before[0]);
+      this->save_refine_flags(flags_before[1]);
+
       do
         {
+          std::cout << "1. n_changes= " << n_changes << std::endl;
           n_changes += this->dealii::Triangulation<dim, spacedim>::
                          prepare_coarsening_and_refinement();
+          std::cout << "2. n_changes= " << n_changes << std::endl;
           this->update_periodic_face_map();
           // enforce 2:1 mesh balance over periodic boundaries
           mesh_changed = enforce_mesh_balance_over_periodic_boundaries(*this);
           n_changes += mesh_changed;
+          std::cout << "3. n_changes= " << n_changes << std::endl;
 
           // We can't be sure that we won't run into a situation where we can
           // not reconcile mesh smoothing and balancing of periodic faces. As
@@ -2764,6 +2771,13 @@ namespace parallel
         }
       while (mesh_changed);
 
+      std::vector<bool> flags_after[2];
+      this->save_coarsen_flags(flags_after[0]);
+      this->save_refine_flags(flags_after[1]);
+      bool bla = ((flags_before[0] != flags_after[0]) ||
+                  (flags_before[1] != flags_after[1]));
+      std::cout << "bla= " << bla << " n_changes=" << n_changes << std::endl;
+
       // report if we observed changes in any of the sub-functions
       return n_changes > 0;
     }

I get

Vectorization over 4 doubles = 256 bits (AVX), VECTORIZATION_LEVEL=2                                                
1. n_changes= 0                                                                                                     
2. n_changes= 0                                                                                                     
3. n_changes= 0                                                                                                     
bla= 0 n_changes=0                                                                                                  
-----------------------------------------------------------------------------                                       
-- For information on how to cite ASPECT, see:                                                                      
--   https://aspect.geodynamics.org/citing.html?ver=2.5.0-pre&mf=1&sha=6ca85efad&src=code                           
-----------------------------------------------------------------------------                                       
1. n_changes= 0                                                                                                     
2. n_changes= 0                                                                                                     
3. n_changes= 0                                                                                                     
bla= 0 n_changes=0                                                                                                  
1. n_changes= 0                                                                                                     
2. n_changes= 0                                                                                                     
3. n_changes= 0                                                                                                     
bla= 0 n_changes=0
1. n_changes= 0
2. n_changes= 0
3. n_changes= 0
bla= 0 n_changes=0
Number of active cells: 8 (on 2 levels)
Number of degrees of freedom: 527 (375+27+125)

Number of mesh deformation degrees of freedom: 81
Floating point exception (core dumped)

tjhei avatar Jul 16 '22 20:07 tjhei

valgrind finds something:

==3952306== Invalid write of size 8
==3952306==    at 0x130CB538: std::__detail::_List_node_header::_M_init() (stl_list.h:151)
==3952306==    by 0x130CB51F: std::__detail::_List_node_header::_List_node_header() (stl_list.h:110)
==3952306==    by 0x21F15D4B: std::__cxx11::_List_base<std::pair<bool, dealii::AlignedVector<double> >, std::allocator<std::pair<bool, dealii::AlignedVector<double> > > >::_List_impl::_List_impl() (stl_list.h:377)
==3952306==    by 0x21EF9CF3: std::__cxx11::_List_base<std::pair<bool, dealii::AlignedVector<double> >, std::allocator<std::pair<bool, dealii::AlignedVector<double> > > >::_List_base() (stl_list.h:456)
==3952306==    by 0x21EF9D13: std::__cxx11::list<std::pair<bool, dealii::AlignedVector<double> >, std::allocator<std::pair<bool, dealii::AlignedVector<double> > > >::list() (stl_list.h:669)
==3952306==    by 0x21ED62BC: dealii::MatrixFree<3, double, dealii::VectorizedArray<double, 4ul> >::MatrixFree() (matrix_free.templates.h:85)
==3952306==    by 0x39FDC57: aspect::MeshDeformation::MeshDeformationHandler<3>::compute_mesh_displacements_gmg() (interface.cc:964)
==3952306==    by 0x39F83BD: aspect::MeshDeformation::MeshDeformationHandler<3>::setup_dofs() (interface.cc:1414)
==3952306==    by 0x256C39A: aspect::Simulator<3>::setup_dofs() (core.cc:1448)
==3952306==    by 0x256B732: aspect::Simulator<3>::run() (core.cc:1954)
==3952306==    by 0x3D69114: void run_simulator<3>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool, bool) (main.cc:598)
==3952306==    by 0x3D1F1EB: main (main.cc:790)
==3952306==  Address 0x347f67d8 is 0 bytes after a block of size 1,592 alloc'd
==3952306==    at 0x571EE63: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==3952306==    by 0x39FDC45: aspect::MeshDeformation::MeshDeformationHandler<3>::compute_mesh_displacements_gmg() (interface.cc:964)
==3952306==    by 0x39F83BD: aspect::MeshDeformation::MeshDeformationHandler<3>::setup_dofs() (interface.cc:1414)
==3952306==    by 0x256C39A: aspect::Simulator<3>::setup_dofs() (core.cc:1448)
==3952306==    by 0x256B732: aspect::Simulator<3>::run() (core.cc:1954)
==3952306==    by 0x3D69114: void run_simulator<3>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool, bool) (main.cc:598)
==3952306==    by 0x3D1F1EB: main (main.cc:790)

tjhei avatar Jul 17 '22 13:07 tjhei

I haven't figured out what the exact problem is, so I am going back to the FPE. Does this make any sense, @kronbichler @peterrum? This is at /home/heister/deal-git/include/deal.II/matrix_free/evaluation_kernels.h:2210:

fe_eval.begin_gradients() + c * dim * n_q_points,

Here c=1, dim=3, n_q_points=8, so the pointer is fe_eval.begin_gradients()+24. But the first 25 entries of fe_eval.begin_gradients() are:

  {<dealii::VectorizedArrayBase<dealii::VectorizedArray<double, 4>, 4>> = {<No data fields>}, data = {6875.0000000000018, 6875.0000000000018, 6875.0000000000018, 6875.0000000000018}},
  {<dealii::VectorizedArrayBase<dealii::VectorizedArray<double, 4>, 4>> = {<No data fields>}, data = {25657.849302036037, 25657.849302036037, 25657.849302036037, 25657.849302036037}},
  {<dealii::VectorizedArrayBase<dealii::VectorizedArray<double, 4>, 4>> = {<No data fields>}, data = {6875.0000000000018, 6875.0000000000018, 6875.0000000000018, 6875.0000000000018}},
  {<dealii::VectorizedArrayBase<dealii::VectorizedArray<double, 4>, 4>> = {<No data fields>}, data = {25657.849302036037, 25657.849302036037, 25657.849302036037, 25657.849302036037}},
 ....
  {<dealii::VectorizedArrayBase<dealii::VectorizedArray<double, 4>, 4>> = {<No data fields>}, data = {-1842.1506979639698, -1842.1506979639698, -1842.1506979639698, -1842.1506979639698}},
  {<dealii::VectorizedArrayBase<dealii::VectorizedArray<double, 4>, 4>> = {<No data fields>}, data = {-6875.0000000000018, -6875.0000000000018, -6875.0000000000018, -6875.0000000000018}},
  {<dealii::VectorizedArrayBase<dealii::VectorizedArray<double, 4>, 4>> = {<No data fields>}, data = {-6875.0000000000018, -6875.0000000000018, -6875.0000000000018, -6875.0000000000018}},
  {<dealii::VectorizedArrayBase<dealii::VectorizedArray<double, 4>, 4>> = {<No data fields>}, data = {-25657.849302036037, -25657.849302036037, -25657.849302036037, -25657.849302036037}},
  {<dealii::VectorizedArrayBase<dealii::VectorizedArray<double, 4>, 4>> = {<No data fields>}, data = {inf, 4.6585704078368084e+296, 5.3292598126531783e+272, 7.8246293389290778e+233}}}

The last one is clearly garbage.
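For anyone who wants to locate such an entry without reading the debugger dump by eye, here is a minimal sketch. It assumes the gradient data can be viewed as a sequence of four-lane entries; `Lanes4`, `gradient_offset`, and `first_non_finite_entry` are hypothetical names for this illustration and not deal.II functions. The offset formula is the expression `c * dim * n_q_points` quoted above.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for dealii::VectorizedArray<double, 4>.
using Lanes4 = std::array<double, 4>;

// Start of component c in the gradient array, following the expression
// c * dim * n_q_points used in the kernel call.
std::size_t gradient_offset(const std::size_t c,
                            const std::size_t dim,
                            const std::size_t n_q_points)
{
  return c * dim * n_q_points;
}

// Index of the first entry that has a non-finite lane (inf or NaN),
// or data.size() if every lane of every entry is finite.
std::size_t first_non_finite_entry(const std::vector<Lanes4> &data)
{
  for (std::size_t i = 0; i < data.size(); ++i)
    for (const double v : data[i])
      if (!std::isfinite(v))
        return i;
  return data.size();
}
```

For the numbers in this comment, `gradient_offset(1, 3, 8)` is 24, which is exactly the index of the entry containing `inf`: the first entry handed to the kernel for c=1 is the bad one.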

tjhei avatar Jul 18 '22 20:07 tjhei

We have dim=3, n_q_points_1d=2, n_components=3. That should give us 72 entries in fe_eval.begin_gradients() with good data. The fact that the 25th entry is wrong must mean that the function that writes to this data, which is either https://github.com/dealii/dealii/blob/9a5cb94940efef067756d420e356c23335872008/include/deal.II/matrix_free/fe_evaluation.h#L4989-L4990 or https://github.com/dealii/dealii/blob/9a5cb94940efef067756d420e356c23335872008/include/deal.II/matrix_free/fe_evaluation.h#L4968-L4970 (non-Cartesian or Cartesian mesh case), is not doing what it is supposed to do for comp=1. Alternatively, some other code could overwrite the content of that array with invalid data. We are not super protective regarding the pointers and overlapping arrays in that part of the code: one large array is allocated somewhere, and the arrays for values, gradients, etc. each get a different slice of it. So if any other function writes out of bounds, memory checkers will not detect it as long as the write stays within that one large array. The reason for that design is performance.
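The entry count and the shared-array hazard can be sketched in a few lines. This is an illustration under assumptions, not the actual deal.II layout: `SharedScratch` is a hypothetical struct that places the values slice first and the gradients slice directly after it in one allocation.

```cpp
#include <cstddef>
#include <vector>

// Number of quadrature points of a tensor-product rule: n_q_points_1d^dim.
std::size_t n_q_points(const std::size_t dim, const std::size_t n_q_points_1d)
{
  std::size_t n = 1;
  for (std::size_t d = 0; d < dim; ++d)
    n *= n_q_points_1d;
  return n;
}

// Gradient entries for all components: n_components * dim * n_q_points.
std::size_t n_gradient_entries(const std::size_t n_components,
                               const std::size_t dim,
                               const std::size_t n_q_points_1d)
{
  return n_components * dim * n_q_points(dim, n_q_points_1d);
}

// Hypothetical scratch layout: a single allocation whose first n_values
// entries are the values slice, followed by the gradients slice.
struct SharedScratch
{
  SharedScratch(const std::size_t n_vals, const std::size_t n_grads)
    : n_values(n_vals)
    , storage(n_vals + n_grads, 0.0)
  {}

  double *values() { return storage.data(); }
  double *gradients() { return storage.data() + n_values; }

  std::size_t         n_values;
  std::vector<double> storage;
};
```

With dim=3, n_q_points_1d=2, and n_components=3, `n_gradient_entries` gives 72. Writing one element past the end of the values slice, for example `scratch.values()[scratch.n_values] = x`, lands in `gradients()[0]`. Because the write stays inside the single allocation, a tool such as valgrind has no reason to flag it.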

The strange thing that I still do not get is why we only see this in ASPECT, and why the bisect result is so odd.

kronbichler avatar Jul 18 '22 21:07 kronbichler

This is the mesh: image

The FPE happens when we solve the initial vector Laplace problem with a MappingQ1Eulerian whose displacement vector is all zeros.

tjhei avatar Jul 20 '22 19:07 tjhei

I can no longer reproduce the problem on my machine and the tester also seems to be happy with the latest deal.II master. I am not sure what fixed the problem: https://github.com/dealii/dealii/pull/14154 https://github.com/dealii/dealii/pull/14153

tjhei avatar Jul 21 '22 16:07 tjhei

Just to be sure, @bangerth can you try with an updated deal.II as well?

tjhei avatar Jul 21 '22 16:07 tjhei

Yes, it works now! Yay!

bangerth avatar Jul 21 '22 20:07 bangerth

It looks like we are back to failing with deal.II master:

The following tests FAILED:
	210 - crameri_benchmark_1_gmg (Failed)
	290 - free_surface_VE_cylinder_2D_loading_fixed_elastic_dt_gmg (Failed)
	293 - free_surface_blob_gmg (Failed)
	298 - free_surface_iterated_stokes_gmg (Failed)
	316 - gmg_mesh_deform (Failed)
	317 - gmg_mesh_deform_adaptive (Failed)
	318 - gmg_mesh_deform_adaptive_bug (Failed)
	319 - gmg_mesh_deform_adaptive_bug2 (Failed)
	320 - gmg_mesh_deform_function (Failed)
	321 - gmg_mesh_deform_ghost_entries (Failed)
	322 - gmg_mesh_deform_prescribed (Failed)
	323 - gmg_mesh_deform_topo (Failed)

:-(

tjhei avatar Aug 05 '22 03:08 tjhei

Backtrace seems to be the same:

#0  dealii::VectorizedArray<double, 2ul>::operator+= (vec=..., this=<optimized out>)
    at /ssd/deal-git7/include/deal.II/base/vectorization.h:3339
#1  dealii::operator+<double, 2ul> (v=..., u=...) at /ssd/deal-git7/include/deal.II/base/vectorization.h:4730
#2  dealii::internal::EvaluatorTensorProduct<(dealii::internal::EvaluatorVariant)2, 2, 2, 2, dealii::VectorizedArray<double, 2ul>, dealii::VectorizedArray<double, 2ul> >::apply<0, true, false, 0, false> (shapes=0x55555a9e0040, 
    in=0x55555a9dc080, out=0x55555a9dc100)
    at /ssd/deal-git7/include/deal.II/matrix_free/tensor_product_kernels.h:1826
#3  0x0000555559039324 in dealii::internal::EvaluatorTensorProduct<(dealii::internal::EvaluatorVariant)2, 2, 2, 2, dealii::VectorizedArray<double, 2ul>, dealii::VectorizedArray<double, 2ul> >::values<0, true, false> (
    this=0x7ffffffe59f0, in=0x55555a9dc080, out=0x55555a9dc100)
    at /ssd/deal-git7/include/deal.II/matrix_free/tensor_product_kernels.h:1671
#4  0x000055555901c6ba in dealii::internal::FEEvaluationImplBasisChange<(dealii::internal::EvaluatorVariant)2, (dealii::internal::EvaluatorQuantity)0, 2, 2, 2, dealii::VectorizedArray<double, 2ul>, dealii::VectorizedArray<double, 2ul> >::do_forward (n_components=1, transformation_matrix=..., values_in=0x55555a9dc080, values_out=0x55555a9dc100, 
    basis_size_1_variable=4294967295, basis_size_2_variable=4294967295)
    at /ssd/deal-git7/include/deal.II/matrix_free/evaluation_kernels.h:1546
#5  0x000055555900b8a9 in dealii::internal::FEEvaluationImplTransformToCollocation<2, 1, 2, dealii::VectorizedArray<double, 2ul> >::evaluate (n_components=2, evaluation_flag=dealii::EvaluationFlags::gradients, 
    values_dofs=0x55555a9dc040, fe_eval=...)
    at /ssd/deal-git7/include/deal.II/matrix_free/evaluation_kernels.h:2158
#6  0x0000555559003894 in dealii::internal::FEEvaluationImplEvaluateSelector<2, dealii::VectorizedArray<double, 2ul> >::run<1, 2> (n_components=2, evaluation_flag=dealii::EvaluationFlags::gradients, values_dofs=0x55555a9dc040, 
    fe_eval=...) at /ssd/deal-git7/include/deal.II/matrix_free/evaluation_kernels.h:2301
#7  0x0000555558f4c89d in dealii::SelectEvaluator<2, 1, 2, dealii::VectorizedArray<double, 2ul> >::evaluate (
    n_components=2, evaluation_flag=dealii::EvaluationFlags::gradients, values_dofs=0x55555a9dc040, eval=...)
    at /ssd/deal-git7/include/deal.II/matrix_free/evaluation_selector.h:81
#8  0x0000555558f40056 in dealii::FEEvaluation<2, 1, 2, 2, double, dealii::VectorizedArray<double, 2ul> >::evaluate
    (this=0x7ffffffeb660, values_array=0x55555a9dc040, evaluation_flag=dealii::EvaluationFlags::gradients)
    at /ssd/deal-git7/include/deal.II/matrix_free/fe_evaluation.h:7853
#9  0x0000555558f33c0e in dealii::FEEvaluation<2, 1, 2, 2, double, dealii::VectorizedArray<double, 2ul> >::evaluate
    (this=0x7ffffffeb660, evaluation_flags=dealii::EvaluationFlags::gradients)
    at /ssd/deal-git7/include/deal.II/matrix_free/fe_evaluation.h:7786
#10 0x0000555558f4cf3c in dealii::MatrixFreeOperators::LaplaceOperator<2, 1, 2, 2, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::VectorizedArray<double, 2ul> >::do_operation_on_cell (
    this=0x55555ab3c2f0, phi=..., cell=0) at /ssd/deal-git7/include/deal.II/matrix_free/operators.h:2274
#11 0x0000555558f410d7 in dealii::MatrixFreeOperators::LaplaceOperator<2, 1, 2, 2, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::VectorizedArray<double, 2ul> >::local_diagonal_cell (
    this=0x55555ab3c2f0, data=warning: RTTI symbol not found for class 'dealii::MatrixFree<2, double, dealii::VectorizedArray<double, 2ul> >'
..., dst=..., cell_range={...})
    at /ssd/deal-git7/include/deal.II/matrix_free/operators.h:2388
#12 0x0000555559074d68 in dealii::internal::MFWorker<dealii::MatrixFree<2, double, dealii::VectorizedArray<double, 2ul> >, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::MatrixFreeOperators::LaplaceOperator<2, 1, 2, 2, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::VectorizedArray<double, 2ul> >, true>::process_range (this=0x7ffffffebca0, fu=
    @0x7ffffffebcb8: (void (dealii::MatrixFreeOperators::LaplaceOperator<2, 1, 2, 2, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::VectorizedArray<double, 2> >::*)(const dealii::MatrixFreeOperators::LaplaceOperator<2, 1, 2, 2, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::VectorizedArray<double, 2> > * const, const dealii::MatrixFree<2, double, dealii::VectorizedArray<double, 2> > &, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host> &, const dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host> &, const std::pair<unsigned int, unsigned int> &)) 0x555558f40f3a <dealii::MatrixFreeOperators::LaplaceOperator<2, 1, 2, 2, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::VectorizedArray<double, 2ul> >::local_diagonal_cell(dealii::MatrixFree<2, double, dealii::VectorizedArray<double, 2ul> > const&, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>&, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host> const&, std::pair<unsigned int, unsigned int> const&) const>, ptr=std::vector of length 3, capacity 4 = {...}, 
    data=std::vector of length 2, capacity 6 = {...}, range_index=0) at /ssd/deal-git7/include/deal.II/matrix_free/matrix_free.h:4481
#13 0x0000555559070587 in dealii::internal::MFWorker<dealii::MatrixFree<2, double, dealii::VectorizedArray<double, 2ul> >, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::MatrixFreeOperators::LaplaceOperator<2, 1, 2, 2, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::VectorizedArray<double, 2ul> >, true>::cell (this=0x7ffffffebca0, range_index=0) at /ssd/deal-git7/include/deal.II/matrix_free/matrix_free.h:4443
#14 0x00007ffff0407455 in dealii::internal::MatrixFreeFunctions::TaskInfo::loop (this=0x55555acb4c48, funct=warning: RTTI symbol not found for class 'dealii::internal::MFWorker<dealii::MatrixFree<2, double, dealii::VectorizedArray<double, 2ul> >, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::MatrixFreeOperators::LaplaceOperator<2, 1, 2, 2, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::VectorizedArray<double, 2ul> >, true>'
...) at ../source/matrix_free/task_info.cc:623
#15 0x0000555558f4141e in dealii::MatrixFree<2, double, dealii::VectorizedArray<double, 2ul> >::cell_loop<dealii::MatrixFreeOperators::LaplaceOperator<2, 1, 2, 2, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::VectorizedArray<double, 2ul> >, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host> > (this=0x55555acb49b0, function_pointer=(void (dealii::MatrixFreeOperators::LaplaceOperator<2, 1, 2, 2, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::VectorizedArray<double, 2> >::*)(const dealii::MatrixFreeOperators::LaplaceOperator<2, 1, 2, 2, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::VectorizedArray<double, 2> > * const, const dealii::MatrixFree<2, double, dealii::VectorizedArray<double, 2> > &, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host> &, const dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host> &, const std::pair<unsigned int, unsigned int> &)) 0x555558f40f3a <dealii::MatrixFreeOperators::LaplaceOperator<2, 1, 2, 2, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::VectorizedArray<double, 2ul> >::local_diagonal_cell(dealii::MatrixFree<2, double, dealii::VectorizedArray<double, 2ul> > const&, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>&, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host> const&, std::pair<unsigned int, unsigned int> const&) const>, owning_class=0x55555ab3c2f0, dst=..., src=..., zero_dst_vector=false) at /ssd/deal-git7/include/deal.II/matrix_free/matrix_free.h:4855
#16 0x0000555558f34962 in dealii::MatrixFreeOperators::LaplaceOperator<2, 1, 2, 2, dealii::LinearAlgebra::distributed::Vector<double, dealii::MemorySpace::Host>, dealii::VectorizedArray<double, 2ul> >::compute_diagonal (this=0x55555ab3c2f0) at /ssd/deal-git7/include/deal.II/matrix_free/operators.h:2195
#17 0x0000555558ef6fdd in aspect::MeshDeformation::MeshDeformationHandler<2>::compute_mesh_displacements_gmg (this=0x55555a963c40) at /ssd/aspect-git-clean/source/mesh_deformation/interface.cc:1134
#18 0x0000555558edbfcd in aspect::MeshDeformation::MeshDeformationHandler<2>::setup_dofs (this=0x55555a963c40) at /ssd/aspect-git-clean/source/mesh_deformation/interface.cc:1414
#19 0x00005555579b0494 in aspect::Simulator<2>::setup_dofs (this=0x7fffffff9c70) at /ssd/aspect-git-clean/source/simulator/core.cc:1456
#20 0x00005555579af83d in aspect::Simulator<2>::run (this=0x7fffffff9c70) at /ssd/aspect-git-clean/source/simulator/core.cc:1962
#21 0x00005555592f4f66 in run_simulator<2> (raw_input_as_string="# Test GMG with mesh deformation with adaptive mesh refinement\n# MPI: 4\n#\n# This test fails with dealii::MGLevelGlobalTransfer Dimension 2 not equal to 0.\n\n\n# This test is based on\ninclude $ASPECT_SOU"..., input_as_string="# Test GMG with mesh deformation with adaptive mesh refinement\n# MPI: 4\n#\n# This test fails with dealii::MGLevelGlobalTransfer Dimension 2 not equal to 0.\n\n\n# This test is based on\ninclude /ssd/aspect"..., output_xml=false, output_plugin_graph=false, validate_only=false) at /ssd/aspect-git-clean/source/main.cc:598
#22 0x00005555592aba70 in main (argc=2, argv=0x7fffffffdb28) at /ssd/aspect-git-clean/source/main.cc:785

tjhei avatar Aug 05 '22 03:08 tjhei

@zjiaqi2018 Can you reproduce this on your machine as well? If yes, can you try to make a minimal deal.II test that (hopefully) shows the same issue?

tjhei avatar Aug 05 '22 03:08 tjhei

Yes, I can reproduce. Sure, I will try to make a deal.II test.

The following tests FAILED:
        210 - crameri_benchmark_1_gmg (Failed)
        290 - free_surface_VE_cylinder_2D_loading_fixed_elastic_dt_gmg (Failed)
        293 - free_surface_blob_gmg (Failed)
        298 - free_surface_iterated_stokes_gmg (Failed)
        316 - gmg_mesh_deform (Failed)
        317 - gmg_mesh_deform_adaptive (Failed)
        318 - gmg_mesh_deform_adaptive_bug (Failed)
        319 - gmg_mesh_deform_adaptive_bug2 (Failed)
        320 - gmg_mesh_deform_function (Failed)
        321 - gmg_mesh_deform_ghost_entries (Failed)
        322 - gmg_mesh_deform_prescribed (Failed)
        323 - gmg_mesh_deform_topo (Failed)

zjiaqi2018 avatar Aug 05 '22 04:08 zjiaqi2018

This line produces NaN: the data of values_dofs is initially {1, 1}, but becomes {nan, nan} after adding c * n_dofs.

https://github.com/dealii/dealii/blob/f8d87ffe952726ec8e78249e307441238abda94f/include/deal.II/matrix_free/evaluation_kernels.h#L2160

zjiaqi2018 avatar Aug 05 '22 06:08 zjiaqi2018

Do we know which deal.II commit caused this?

bangerth avatar Aug 05 '22 17:08 bangerth

For reference, nothing in this function has changed this year, so it must be some upstream or downstream function.

bangerth avatar Aug 05 '22 17:08 bangerth

Scroll up in this issue: I did a git bisect last time. The problem is that we did not fill invalid values with NaN until recently. This means some random stuff influences whether we crash or not (before, allocating a large vector containing only false hid the bug, and that is what git bisect found).

tjhei avatar Aug 06 '22 03:08 tjhei
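
To illustrate the point about NaN-filled entries, here is a minimal standalone sketch (plain C++, no deal.II). It assumes a typical x86-64 or ARM64 build without fast-math. `checked_sum` and `SumResult` are hypothetical names made up for this example, and whether deal.II uses signaling NaNs for this purpose is an assumption here. The sketch only inspects the FE_INVALID flag; with floating point traps enabled (for example via glibc's `feenableexcept`), the same addition would presumably be delivered as SIGFPE, as in the backtraces above.

```cpp
#include <cfenv>
#include <limits>
#include <vector>

// Hypothetical result type for this sketch: the sum, plus whether any
// addition raised the "invalid operation" floating point flag.
struct SumResult
{
  double sum;
  bool   invalid_raised;
};

// Hypothetical helper: adds up all entries of a buffer and reports whether
// FE_INVALID was raised along the way. Reading an entry that was filled
// with a signaling NaN and using it in arithmetic raises that flag.
SumResult checked_sum(const std::vector<double> &buffer)
{
  std::feclearexcept(FE_ALL_EXCEPT);

  // volatile keeps the compiler from folding the additions away
  volatile double sum = 0.0;
  for (const double v : buffer)
    sum = sum + v;

  const bool invalid = std::fetestexcept(FE_INVALID) != 0;
  return {sum, invalid};
}
```

A buffer holding ordinary values (say, all 1.0) sums without raising the flag, whereas a buffer filled with `std::numeric_limits<double>::signaling_NaN()` raises it on the first addition. If such entries instead contain leftover memory that happens to be valid numbers, nothing is flagged, which is consistent with the bug staying hidden depending on unrelated allocations.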

At least on my machine it is fixed with https://github.com/dealii/dealii/pull/14224

tjhei avatar Aug 26 '22 20:08 tjhei

... and github actions are also happy! :fireworks:

tjhei avatar Aug 28 '22 14:08 tjhei