cppcheck icon indicating copy to clipboard operation
cppcheck copied to clipboard

avoid returning `SmallVector` since it has sizable overhead

Open firewave opened this issue 3 years ago • 4 comments

Scanning mame_regtest with --enable=all --inconclusive and DISABLE_VALUEFLOW=1:

Clang 14 2,082,292,106 -> 2,073,152,473 Clang 14 1,829,061,568 -> 1,766,615,436 (with Boost 1.74)

firewave avatar Sep 29 '22 22:09 firewave

This seems to make the function a lot more complicated. How much actual time improvement in terms of seconds do we get for this?

We should just be using NRVO to avoid moving/copying the vector on return.

pfultz2 avatar Sep 30 '22 01:09 pfultz2

This seems to make the function a lot more complicated.

It was already made more complicated by the fact that the braced initializer was not usable with boost::small_vector. Fortunately this is the only function we had to complicate and there's no more need at all in the code for SmallVector.

How much actual time improvement in terms of seconds do we get for this?

This depends on how complex the code is. In case of the callgrind selfcheck it reduces Ir from 85,245,183,560 to 81,162,048,956 which is much more than the previous change did (86,630,832,504 -> 85,414,145,406) - but this is on top of that. It appears this change saves about 2% of the run time in actual time which is quite good given that the Ir count dropped by 5% (they usually don't correlate that much). But that is all without valueflow. And with Boost - so by default we won't see any sizable impact - we also won't see it in daca.

We should just be using NRVO to avoid moving/copying the vector on return.

The problem is with boost::small_vector. It appears you are not supposed to copy/move/return this but only work on an object. It is not explicitly stated but the example in the documentation implies this: https://www.boost.org/doc/libs/1_80_0/doc/html/boost/container/small_vector_base.html. The Ir saved by this changes comes from eliminating the assignment/move-construction methods of it.

1,075,004,357 ( 1.26%)  /usr/include/boost/container/detail/copy_move_algo.hpp:void boost::container::copy_assign_range_alloc_n<boost::container::small_vector_allocator<ReferenceToken, boost::container::new_allocator<void>, void>, boost::move_iterator<ReferenceToken*>, ReferenceToken*>(boost::container::small_vector_allocator<ReferenceToken, boost::container::new_allocator<void>, void>&, boost::move_iterator<ReferenceToken*>, boost::container::allocator_traits<boost::container::small_vector_allocator<ReferenceToken, boost::container::new_allocator<void>, void> >::size_type, ReferenceToken*, boost::container::allocator_traits<boost::container::small_vector_allocator<ReferenceToken, boost::container::new_allocator<void>, void> >::size_type) [/home/runner/work/cppcheck/cppcheck/cppcheck]`
...
784,529,847 ( 0.92%)  /usr/include/boost/container/vector.hpp:void boost::container::vector<ReferenceToken, boost::container::small_vector_allocator<ReferenceToken, boost::container::new_allocator<void>, void>, void>::assign<boost::move_iterator<ReferenceToken*> >(boost::move_iterator<ReferenceToken*>, boost::move_iterator<ReferenceToken*>, boost::move_detail::disable_if_or<void, boost::move_detail::is_same<boost::move_detail::integral_constant<unsigned int, 1u>, boost::move_detail::integral_constant<unsigned int, 0u> >, boost::move_detail::is_convertible<boost::move_iterator<ReferenceToken*>, unsigned long>, boost::container::dtl::is_input_iterator<boost::move_iterator<ReferenceToken*>, has_iterator_category<boost::move_iterator<ReferenceToken*> >::value>, boost::move_detail::bool_<false> >::type*) [/home/runner/work/cppcheck/cppcheck/cppcheck]

firewave avatar Sep 30 '22 09:09 firewave

On a side note.

I would love to do more cycle- and time-based profiling as well. But I cannot use perf since I do not have a bare-metal Linux machine (will use my old one for that when I ever get a new one). Visual Studio does time-based but spends a lot of time in function that have no information making it strange. I don't trust llvm-mca since it's instruction data does not reflect what I get from callgrind afterward. And I haven't founda time-based profiler for Linux yet.

Time-based would be great since I have the feeling we are losing time in our thread/process spawning/waiting code. I would like to apply the simplification from LCppC but there's still too many tests lacking for -j.

firewave avatar Sep 30 '22 09:09 firewave

We should just be using NRVO to avoid moving/copying the vector on return.

Interestingly using the std::vector as object or parameter is better code than directly returning it even without Boost being involved. Might be related to the braced initializer behaving differently.

But these are things I try to look into (it's a big list of things which grows all the time) and report it upstream. Unfortunately the quality of my reports is often lacking (as is my knowledge) and it's also not that straight forward to reproduce these issues in isolated/reduced cases. There's also a lack of discussion on several of these tickets since even though LLVM (which is the project I mainly report these issues for at the moment) is a big project there's not even a handful of people involved in that particular area which are also (obviously) pressed for time. So there's a certain lack of discussion/feedback on these as well.

firewave avatar Sep 30 '22 11:09 firewave

And I haven't founda time-based profiler for Linux yet.

@firewave Would Orbit help at all? "the top-down view to analyze the call tree of your application and the relative time spent in each function." doc

Off topic, feel to mark as such. They have been recently applying clang-tidy and iwyu.

JohnsterID avatar Dec 21 '22 00:12 JohnsterID

This has been superseded by #5442.

firewave avatar Sep 12 '23 22:09 firewave