BOUT-dev icon indicating copy to clipboard operation
BOUT-dev copied to clipboard

next builds much slower then master

Open dschwoerer opened this issue 4 years ago • 7 comments

~next~ master (EDIT: PH) takes 12 to 22 minutes to build e.g. 1 2 while next takes 21 to 33 minutes 3 4

This coincides with a significantly increased size of libbout++.a from ~400 MB to ~800 MB. This results in a full build taking roughly 14 GB of disk space, with test-compile-examples that roughly doubles again.

Git bisect blames fe3eb2be0e9c0b7eef4615d273e889ce702dbaaf which increased the size from 537821428 to 805149572 bytes.

Is this an issue?

800 MB seems huge, that is a size similar to a Linux distro live installer, and the size seems to be still increasing ...

dschwoerer avatar Mar 20 '20 15:03 dschwoerer

I'd noticed the build times, but not the library increase. That's pretty big! Even 400MB seems like a lot.

I suspect I messed up forwarding the parameter pack. Suspect it should look like:

  template <class S, class... Args>
  int push(S&& format, Args&&... args) {
    return push(fmt::format(std::foward<S>(format), std::forward<Args>(args)...));
  }

Thanks for the bisect, that's very helpful!

ZedThree avatar Mar 20 '20 15:03 ZedThree

Also I include fmt/format.h everywhere, when fmt/core.h should be sufficient almost everywhere. I've pushed a branch to see if this helps on Travis.

But neither changing the header nor forwarding the arguments makes any measurable difference on my machine :/

ZedThree avatar Mar 20 '20 17:03 ZedThree

Doesn't look like that made it any faster :disappointed: https://travis-ci.org/github/boutproject/BOUT-dev/builds/664940917

ZedThree avatar Mar 20 '20 17:03 ZedThree

Interesting article on compile times: https://cor3ntin.github.io/posts/compiletime/

I thought I had made an attempt at using ccache on Travis, but I can't find it now. It might be a branch on my work desktop that I currently don't have access to.

I'll also try getting gold working again, and maybe switch to ninja for the CMake build.

ZedThree avatar Mar 26 '20 10:03 ZedThree

Doesn't look like that made it any faster disappointed https://travis-ci.org/github/boutproject/BOUT-dev/builds/664940917

You missed on reference ;-) Now it helped a lot.

I have looked at https://github.com/aras-p/ClangBuildAnalyzer and -ftime-trace and did some additional changes to reduce the compile time. However, the can lead to undefined references at the linking stage, so maybe need some more work.

This is from 7305701bed7ac28436dc8607c1a0ac08775fb512:

**** Time summary:
Compilation (213 times):
  Parsing (frontend):         1028.2 s
  Codegen & opts (backend):   1009.3 s

**** Files that took longest to parse (compiler frontend):
 23487 ms: ../../bout-dev/cython//src/mesh/index_derivs.o
  9887 ms: ../../bout-dev/cython//tests/unit/include/bout/test_region.o
  8766 ms: ../../bout-dev/cython//tests/unit/solver/test_solver.o
  8179 ms: ../../bout-dev/cython//tests/unit/src/test_bout++.o
  8175 ms: ../../bout-dev/cython//tests/unit/field/test_field3d.o
  8045 ms: ../../bout-dev/cython//tests/unit/field/test_fieldgroup.o
  7840 ms: ../../bout-dev/cython//tests/unit/include/bout/test_petsc_indexer.o
  7840 ms: ../../bout-dev/cython//tests/unit/field/test_field_factory.o
  7792 ms: ../../bout-dev/cython//tests/unit/include/test_derivs.o
  7709 ms: ../../bout-dev/cython//tests/unit/field/test_fieldperp.o

**** Files that took longest to codegen (compiler backend):
 66353 ms: ../../bout-dev/cython//tests/unit/include/bout/test_region.o
 46222 ms: ../../bout-dev/cython//tests/unit/field/test_field_factory.o
 45434 ms: ../../bout-dev/cython//src/mesh/index_derivs.o
 32789 ms: ../../bout-dev/cython//tests/unit/field/test_field3d.o
 29824 ms: ../../bout-dev/cython//tests/unit/field/test_fieldperp.o
 26045 ms: ../../bout-dev/cython//src/fmt/format.o
 25957 ms: ../../bout-dev/cython//tests/unit/sys/test_options.o
 24754 ms: ../../bout-dev/cython//tests/unit/include/bout/test_stencil.o
 20979 ms: ../../bout-dev/cython//tests/unit/field/test_field2d.o
 20638 ms: ../../bout-dev/cython//tests/unit/solver/test_solver.o

**** Templates that took longest to instantiate:
 16410 ms: mpark::detail::move_constructor<mpark::detail::traits<bool, int, dou... (166 times, avg 98 ms)
 16131 ms: mpark::detail::constructor<mpark::detail::traits<bool, int, double, ... (166 times, avg 97 ms)
 15601 ms: mpark::detail::destructor<mpark::detail::traits<bool, int, double, s... (166 times, avg 93 ms)
 15576 ms: mpark::detail::destructor<mpark::detail::traits<bool, int, double, s... (166 times, avg 93 ms)
 15316 ms: mpark::detail::visitation::alt::visit_alt<mpark::detail::dtor, mpark... (166 times, avg 92 ms)
 14536 ms: mpark::detail::visitation::base::dispatcher<true, void>::dispatch<0,... (166 times, avg 87 ms)
 11213 ms: std::map<std::__cxx11::basic_string<char>, Options::AttributeType, s... (167 times, avg 67 ms)
  9737 ms: std::map<std::__cxx11::basic_string<char>, double, std::less<std::__... (171 times, avg 56 ms)
  9623 ms: mpark::detail::visitation::alt::visit_alt_at<(lambda at /home/dave/s... (166 times, avg 57 ms)
  9271 ms: std::_Rb_tree<std::__cxx11::basic_string<char>, std::pair<const std:... (171 times, avg 54 ms)
  8907 ms: mpark::variant<bool, int, double, std::__cxx11::basic_string<char>, ... (166 times, avg 53 ms)
  8906 ms: std::_Rb_tree<std::__cxx11::basic_string<char>, std::pair<const std:... (171 times, avg 52 ms)
  8349 ms: mpark::detail::visitation::base::dispatcher<true, void>::dispatch_at... (166 times, avg 50 ms)
  7797 ms: mpark::detail::impl<bool, int, double, std::__cxx11::basic_string<ch... (166 times, avg 46 ms)
  7048 ms: std::_Rb_tree<std::__cxx11::basic_string<char>, std::pair<const std:... (171 times, avg 41 ms)
  6769 ms: std::_Rb_tree<std::__cxx11::basic_string<char>, std::pair<const std:... (166 times, avg 40 ms)
  6295 ms: mpark::detail::traits<bool, int, double, std::__cxx11::basic_string<... (166 times, avg 37 ms)
  5857 ms: std::map<CELL_LOC, std::shared_ptr<Coordinates>, std::less<CELL_LOC>... (152 times, avg 38 ms)
  5765 ms: Region<SpecificInd<IND_TYPE::IND_2D> >::Region (43 times, avg 134 ms)
  5751 ms: mpark::variant<bool, int, double, std::__cxx11::basic_string<char> > (166 times, avg 34 ms)
  5631 ms: std::_Rb_tree<CELL_LOC, std::pair<const CELL_LOC, std::shared_ptr<Co... (152 times, avg 37 ms)
  5559 ms: mpark::detail::destructor<mpark::detail::traits<bool, int, double, s... (166 times, avg 33 ms)
  5477 ms: mpark::detail::visitation::alt::visit_alt<mpark::detail::dtor, mpark... (166 times, avg 32 ms)
  5324 ms: std::map<std::__cxx11::basic_string<char>, double, std::less<std::__... (172 times, avg 30 ms)
  5280 ms: mpark::detail::visitation::base::dispatcher<true, void>::dispatch<0,... (166 times, avg 31 ms)
  5220 ms: MsgStack::push<char [28], std::__cxx11::basic_string<char>, int, con... (183 times, avg 28 ms)
  5005 ms: mpark::detail::impl<bool, int, double, std::__cxx11::basic_string<ch... (166 times, avg 30 ms)
  4950 ms: fmt::v6::format<char [28], const std::__cxx11::basic_string<char> &,... (183 times, avg 27 ms)
  4546 ms: std::_Rb_tree<std::__cxx11::basic_string<char>, std::pair<const std:... (166 times, avg 27 ms)
  4482 ms: multioutbuf<char, std::char_traits<char> >::add (175 times, avg 25 ms)

**** Template sets that took longest to instantiate:
 59421 ms: fmt::v6::format<$> (5862 times, avg 10 ms)
 52643 ms: BoutException::BoutException<$> (5051 times, avg 10 ms)
 46077 ms: std::__and_<$> (28802 times, avg 1 ms)
 28038 ms: fmt::v6::format_arg_store<$> (3810 times, avg 7 ms)
 27843 ms: std::unique_ptr<$> (1204 times, avg 23 ms)
 24801 ms: std::__uniq_ptr_impl<$> (1204 times, avg 20 ms)
 23707 ms: fmt::v6::internal::encode_types<$> (3438 times, avg 6 ms)
 23296 ms: std::vector<$> (5150 times, avg 4 ms)
 22579 ms: mpark::detail::visitation::alt::visit_alt<$> (364 times, avg 62 ms)
 21413 ms: mpark::detail::visitation::base::dispatcher<$>::dispatch<$> (364 times, avg 58 ms)
 21198 ms: fmt::v6::internal::make_args_checked<$> (5450 times, avg 3 ms)
 21136 ms: mpark::detail::destructor<$>::destroy (332 times, avg 63 ms)
 20155 ms: std::__or_<$> (20850 times, avg 0 ms)
 18941 ms: std::map<$> (2336 times, avg 8 ms)
 18259 ms: produceCombinations<$>::produceCombinations<$> (34 times, avg 537 ms)
 18227 ms: processSet<$> (34 times, avg 536 ms)
 18148 ms: addItemToDeferredFunction<$> (120 times, avg 151 ms)
 17495 ms: std::tuple<$> (2041 times, avg 8 ms)
 17479 ms: std::pair<$> (3878 times, avg 4 ms)
 17185 ms: std::is_convertible<$> (16969 times, avg 1 ms)
 17116 ms: std::_Rb_tree<$>::~_Rb_tree (1505 times, avg 11 ms)
 16728 ms: std::_Rb_tree<$>::_M_erase (1520 times, avg 11 ms)
 16410 ms: mpark::detail::move_constructor<$>::move_constructor (166 times, avg 98 ms)
 16308 ms: std::vector<$>::push_back (773 times, avg 21 ms)
 16193 ms: mpark::detail::constructor<$>::generic_construct<$> (168 times, avg 96 ms)
 15604 ms: mpark::detail::destructor<$>::~destructor (168 times, avg 92 ms)
 14681 ms: mpark::variant<$> (333 times, avg 44 ms)
 14563 ms: std::vector<$>::emplace_back<$> (665 times, avg 21 ms)
 14151 ms: std::_Rb_tree<$>::_M_get_insert_hint_unique_pos (437 times, avg 32 ms)
 13601 ms: std::vector<$>::_M_realloc_insert<$> (786 times, avg 17 ms)

**** Functions that took longest to compile:
  5778 ms: _GLOBAL__sub_I_test_region.cxx (include/bout/test_region.cxx)
  4487 ms: _GLOBAL__sub_I_test_field_factory.cxx (field/test_field_factory.cxx)
  4271 ms: Coordinates::geometry(bool, bool) (coordinates.cxx)
  3885 ms: _GLOBAL__sub_I_test_field3d.cxx (field/test_field3d.cxx)
  3330 ms: _GLOBAL__sub_I_test_options.cxx (sys/test_options.cxx)
  2809 ms: _GLOBAL__sub_I_test_fieldperp.cxx (field/test_fieldperp.cxx)
  2210 ms: ShiftedMetricTest_CalcParallelSlices_Test::TestBody() (mesh/parallel/test_shiftedmetric.cxx)
  2096 ms: _GLOBAL__sub_I_test_field2d.cxx (field/test_field2d.cxx)
  1838 ms: BoutMesh::load() (boutmesh.cxx)
  1683 ms: _GLOBAL__sub_I_test_utils.cxx (sys/test_utils.cxx)
  1660 ms: main (test_multigrid_laplace.cxx)
  1340 ms: _GLOBAL__sub_I_test_vector2d.cxx (field/test_vector2d.cxx)
  1228 ms: _GLOBAL__sub_I_test_vector3d.cxx (field/test_vector3d.cxx)
  1144 ms: _GLOBAL__sub_I_test_solver.cxx (solver/test_solver.cxx)
  1120 ms: main (test_naulin_laplace.cxx)
  1106 ms: MultigridAlg::pGMRES(double*, double*, int, int) (multigrid_alg.cxx)
  1094 ms: _GLOBAL__sub_I_test_expressionparser.cxx (sys/test_expressionparser.cxx)
  1092 ms: RegionTest_regionPeriodicShift_Test::TestBody() (include/bout/test_region.cxx)
  1062 ms: _GLOBAL__sub_I_test_stencil.cxx (include/bout/test_stencil.cxx)
  1044 ms: LaplaceMultigrid::solve(FieldPerp const&, FieldPerp const&) (multigrid_laplace.cxx)
  1005 ms: _GLOBAL__sub_I_test_derivs.cxx (include/test_derivs.cxx)
   975 ms: Coordinates::Coordinates(Mesh*, Options*, CELL_LOC, Coordinates cons... (coordinates.cxx)
   935 ms: Mesh::createDefaultRegions() (mesh.cxx)
   924 ms: BoutMesh::addBoundaryRegions() (boutmesh.cxx)
   900 ms: FieldFactory::FieldFactory(Mesh*, Options*) (field_factory.cxx)
   872 ms: _GLOBAL__sub_I_test_gridfromoptions.cxx (mesh/data/test_gridfromoptions.cxx)
   865 ms: Vector2D const V_dot_Grad<Vector2D, Vector2D, Vector2D const>(Vector... (vecops.cxx)
   860 ms: SolverTest_AddMonitorCheckFrequencies_Test::TestBody() (solver/test_solver.cxx)
   816 ms: bracket(Field3D const&, Field3D const&, BRACKET_METHOD, CELL_LOC, So... (difops.cxx)
   806 ms: testing::internal::ExecDeathTest::AssumeRole() (../../externalpackages/googletest/googletest/src/gtest-all.cc)

**** Function sets that took longest to compile / optimize:
 31338 ms: testing::internal::TypeParameterizedTest<$>::Register(char const*, t... (644 times, avg 48 ms)
 16401 ms: void fmt::v6::internal::basic_writer<$>::write_padded<$>(fmt::v6::ba... (315 times, avg 52 ms)
  5797 ms: void DerivativeStore<$>::registerDerivative<$>(std::function<$>, enu... (288 times, avg 20 ms)
  5468 ms: testing::internal::TestFactoryImpl<$>::CreateTest() (1471 times, avg 3 ms)
  4674 ms: Field3D bout::derivatives::index::standardDerivative<$>(Field3D cons... (66 times, avg 70 ms)
  4052 ms: void DerivativeType<$>::upwindOrFlux<$>(Field2D const&, Field2D cons... (176 times, avg 23 ms)
  3983 ms: Region<$>::createRegionIndices(int, int, int, int, int, int, int, int) (112 times, avg 35 ms)
  3833 ms: void DerivativeType<$>::upwindOrFlux<$>(Field3D const&, Field3D cons... (176 times, avg 21 ms)
  3694 ms: testing::internal::SuiteApiResolver<$>::GetSetUpCaseOrSuite(char con... (338 times, avg 10 ms)
  3635 ms: void std::vector<$>::_M_range_insert<$>(__gnu_cxx::__normal_iterator... (108 times, avg 33 ms)
  3411 ms: testing::internal::SuiteApiResolver<$>::GetTearDownCaseOrSuite(char ... (337 times, avg 10 ms)
  3321 ms: testing::internal::TestFactoryImpl<$>::~TestFactoryImpl() (1465 times, avg 2 ms)
  3245 ms: Field2D const interp_to<$>(Field2D const&, CELL_LOC, std::__cxx11::b... (12 times, avg 270 ms)
  3202 ms: Field3D bout::derivatives::index::flowDerivative<$>(Field3D const&, ... (40 times, avg 80 ms)
  3156 ms: void __gnu_cxx::new_allocator<$>::construct<$>(Coordinates*, Mesh*&,... (39 times, avg 80 ms)
  3076 ms: Region<$>::getContiguousBlocks(int) const (111 times, avg 27 ms)
  2790 ms: DerivativeStore<$>::setDefaults() (20 times, avg 139 ms)
  2773 ms: std::_Rb_tree<$>::_M_get_insert_unique_pos(std::__cxx11::basic_strin... (113 times, avg 24 ms)
  2671 ms: Field3D const interp_to<$>(Field3D const&, CELL_LOC, std::__cxx11::b... (10 times, avg 267 ms)
  2636 ms: BoutException::BoutException(std::__cxx11::basic_string<$>) (134 times, avg 19 ms)
  2587 ms: void std::__introsort_loop<$>(__gnu_cxx::__normal_iterator<$>, __gnu... (42 times, avg 61 ms)
  2572 ms: void DerivativeType<$>::standard<$>(Field2D const&, Field2D&, std::_... (112 times, avg 22 ms)
  2477 ms: std::_Rb_tree_iterator<$> std::_Rb_tree<$>::_M_emplace_hint_unique<$... (137 times, avg 18 ms)
  2418 ms: std::_Rb_tree<$>::equal_range(std::__cxx11::basic_string<$> const&) (110 times, avg 21 ms)
  2349 ms: void DerivativeType<$>::standard<$>(Field3D const&, Field3D&, std::_... (112 times, avg 20 ms)
  2287 ms: std::_Rb_tree<$>::_M_erase(std::_Rb_tree_node<$>*) (207 times, avg 11 ms)
  1922 ms: std::pair<$> std::_Rb_tree<$>::_M_emplace_unique<$>(std::pair<$>&&) (97 times, avg 19 ms)
  1866 ms: Region<$>::Region(int, int, int, int, int, int, int, int, int) (59 times, avg 31 ms)
  1849 ms: void std::__final_insertion_sort<$>(__gnu_cxx::__normal_iterator<$>,... (40 times, avg 46 ms)
  1746 ms: Field2D bout::derivatives::index::standardDerivative<$>(Field2D cons... (34 times, avg 51 ms)

*** Expensive headers:
115947 ms: ../../include/bout/mesh.hxx (included 73 times, avg 1588 ms), included via:
  generated_fieldops.o  (3483 ms)
  dataformat.o  (3482 ms)
  boundary_region.o  (3475 ms)
  surfaceiter.o surfaceiter.hxx  (3472 ms)
  globalfield.o globalfield.hxx  (3456 ms)
  gyro_average.o  (3450 ms)
  ...

112310 ms: /usr/include/openmpi-x86_64/mpi.h (included 346 times, avg 324 ms), included via:
  fuu.o fuu.hxx bout.hxx boutcomm.hxx  (1364 ms)
  test_yupdown_weights.o bout.hxx boutcomm.hxx  (1333 ms)
  test_multigrid_laplace.o bout.hxx boutcomm.hxx  (1260 ms)
  wave.o physicsmodel.hxx bout.hxx boutcomm.hxx  (1257 ms)
  test_laplace.o bout.hxx boutcomm.hxx  (1244 ms)
  smoothing.o mesh.hxx  (1240 ms)
  ...

110329 ms: ../../externalpackages/googletest/googletest/include/gtest/gtest.h (included 116 times, avg 951 ms), included via:
  test_solver.o  (2899 ms)
  test_monitor.o  (2694 ms)
  test_initialprofiles.o  (2435 ms)
  test_traits.o  (2391 ms)
  test_bout++.o  (2250 ms)
  test_mask.o  (2242 ms)
  ...

109041 ms: ../../../include/bout.hxx (included 33 times, avg 3304 ms), included via:
  test_naulin_laplace.o  (4196 ms)
  test_griddata.o  (4086 ms)
  test_laplace.o  (3963 ms)
  test_invpar.o  (3948 ms)
  test_yupdown_weights.o  (3913 ms)
  test_fieldfactory.o  (3881 ms)
  ...

83231 ms: ../../include/bout/deriv_store.hxx (included 79 times, avg 1053 ms), included via:
  test_region.o mesh.hxx  (2461 ms)
  test_deriv_store.o  (2257 ms)
  index_derivs.o index_derivs.hxx  (2243 ms)
  boundary_standard.o mesh.hxx  (2061 ms)
  boundary_region.o mesh.hxx  (2039 ms)
  globalfield.o globalfield.hxx mesh.hxx  (2029 ms)
  ...

73175 ms: ../../include/options.hxx (included 82 times, avg 892 ms), included via:
  optionsreader.o optionsreader.hxx  (2687 ms)
  test_gridfromoptions.o  (2395 ms)
  test_region.o mesh.hxx deriv_store.hxx  (2015 ms)
  test_generic_factory.o generic_factory.hxx  (1815 ms)
  test_deriv_store.o deriv_store.hxx  (1779 ms)
  fft_fftw.o  (1706 ms)
  ...

58226 ms: ../../include/field.hxx (included 86 times, avg 677 ms), included via:
  test_where.o  (2278 ms)
  snb.o snb.hxx invert_parderiv.hxx field3d.hxx  (2263 ms)
  field.o  (2078 ms)
  type_name.o field2d.hxx  (1784 ms)
  coordinates.o coordinates.hxx paralleltransform.hxx field3d.hxx  (1496 ms)
  test_traits.o  (1481 ms)
  ...

56448 ms: test_extras.hxx (included 40 times, avg 1411 ms), included via:
  test_petsc_setters.o  (4531 ms)
  test_petsc_matrix.o  (4186 ms)
  test_petsc_vector.o  (4040 ms)
  test_stencil.o  (3965 ms)
  test_fieldgroup.o  (2770 ms)
  test_boutexception.o  (2707 ms)
  ...

31739 ms: ../../include/field2d.hxx (included 55 times, avg 577 ms), included via:
  type_name.o  (1889 ms)
  derivs.o derivs.hxx  (1129 ms)
  test_boundary_factory.o boundary_factory.hxx boundary_op.hxx  (1128 ms)
  boundary_factory.o boundary_factory.hxx boundary_op.hxx  (1106 ms)
  test_type_name.o  (1086 ms)
  vector2d.o vector2d.hxx  (1081 ms)
  ...

30000 ms: ../../include/field3d.hxx (included 85 times, avg 352 ms), included via:
  snb.o snb.hxx invert_parderiv.hxx  (2316 ms)
  coordinates.o coordinates.hxx paralleltransform.hxx  (1554 ms)
  vector3d.o vector3d.hxx  (1239 ms)
  test_coordinates.o coordinates.hxx paralleltransform.hxx  (1206 ms)
  test_fieldgroup.o test_extras.hxx  (1155 ms)
  test_boutexception.o test_extras.hxx  (1072 ms)
  ...

dschwoerer avatar Mar 29 '20 22:03 dschwoerer

You missed on reference ;-) Now it helped a lot.

Thanks, not sure why I missed those! :+1:

I have looked at https://github.com/aras-p/ClangBuildAnalyzer and -ftime-trace and did some additional changes to reduce the compile time.

You beat me to it, I had to upgrade my OS to get clang 9.0 first :smile: This was really useful, so thank you! :tada:

Breaking up index_derivs.cxx into multiple files would be useful, and is also necessary to fix some issues with the IBM XL compiler.

Pulling the templates out of options.hxx is a good idea, but I'm not keen on the split header. Seeing as Options can only be used with the types in the Options::ValueType variant, it makes sense to just move the templates into options.cxx and explicitly instantiate them for those types there. Basically what you've done with options_full.hxx, but do it all in options.cxx.

I don't think it makes as much sense to do the same for Array, because we're more likely to want to use it for other types too.

This is also more evidence that maybe Mesh does too much. At least, we should put some effort into making mesh.hxx cheaper to include, or just not include it in as many places.

I'll make a PR from fd9cbc8, as it's clear that's definitely useful.

ZedThree avatar Mar 30 '20 10:03 ZedThree

Thanks both! This clang output is really handy. I definitely agree that Mesh needs to be cut down, but also mpi.h being included in so many places is a sign that we aren't abstracting the communication as much as we could be.

bendudson avatar Mar 30 '20 10:03 bendudson