xsimd
xsimd copied to clipboard
Multiple constructor calls for one object initialization
Once again, as I pointed out in #775, a single xsimd::batch
object initialization ends up calling multiple constructors:
I took the liberty of putting
std::cout
statements in thebatch
constructor code, like so:
/**
* Create a batch with all element initialized to \c val.
*/
template <class T, class A>
inline batch<T, A>::batch(T val) noexcept
: types::simd_register<T, A>(kernel::broadcast<A>(val, A {}))
{
std::cout << "broadcasting constructor\n";
}
/**
* Create a batch with elements initialized from \c data.
* It is an error to have `data.size() != size.
*/
template <class T, class A>
inline batch<T, A>::batch(std::initializer_list<T> data) noexcept
: batch(data.begin(), detail::make_index_sequence<size>())
{
details::check_batch_init(*this, data);
std::cout << "intializer list constructor\n";
}
And then ran a little test program:
#include <iostream>
#include <xsimd/xsimd.hpp>
int main() {
xsimd::batch<double, xsimd::avx2> b(2.5);
std::cout << "==========================\n";
xsimd::batch<double, xsimd::avx2> c{2.5, 5, 7.5, 10};
std::cout << "==========================\n";
xsimd::batch<double, xsimd::avx2> d = {3.5, 7, 10.5, 14};
}
And this was the output:
register_type constructor
broadcasting constructor
==========================
register_type constructor
pointer-to-data constructor
intializer list constructor
==========================
register_type constructor
pointer-to-data constructor
intializer list constructor
We should fix this. Apart from the obvious gain in performance in debug builds, we should also see a small performance increase in Release builds (in my experiments, I got the same behavior even in Release mode).
Where does the "pointer-to-data constructor" output come from?
Also I guess this issue should be solved when we merge the variadic constructor implementation. Could you check if you have the same kind of issue on this branch?
Where does the "pointer-to-data constructor" output come from?
I put std::cout
statements on every single constructor. I only reproduced two of them here, as examples. If you want, I could push this code in a branch on my fork and link it here.
Also I guess this issue should be solved when we merge the variadic constructor implementation. Could you check if you have the same kind of issue on this branch?
Will do.
Also I guess this issue should be solved when we merge the variadic constructor implementation. Could you check if you have the same kind of issue on this branch?
Will do.
It seems that the branch isn't ready for testing yet. I'm getting compilation errors currently. I'll try once again once the branch is a bit more mature.
(in my experiments, I got the same behavior even in Release mode).
@wermos worth nothing that might actually be because of your std::cout
statements - I suspect that in Release builds a completely delegating constructor (i.e. one with no extra code in the function body) will be completely folded in most situations.
You are definitely correct about debug perf here, though. When I step through the ctor init chain using MSVC it steps through each call individually.
@wermos worth nothing that might actually be because of your
std::cout
statements - I suspect that in Release builds a completely delegating constructor (i.e. one with no extra code in the function body) will be completely folded in most situations.
Ah yes, you're right. In Release mode, none of the unnecessary constructor calls happened, once I removed the print statements (I checked using gdb
.)
Also I guess this issue should be solved when we merge the variadic constructor implementation. Could you check if you have the same kind of issue on this branch?
Will do.
It seems that the branch isn't ready for testing yet. I'm getting compilation errors currently. I'll try once again once the branch is a bit more mature.
I redid the process with the exact same test program, and here is the result:
register type constructor
single element constructor
==========================
register type constructor
variadic template constructor
==========================
register type constructor
variadic template constructor
It appears that this constructor:
template <class T, class A>
inline batch<T, A>::batch(register_type reg) noexcept
: types::simd_register<T, A>({ reg })
{
// std::cout << "register type constructor\n";
}
always runs before any other constructor.
Back on this one. I understand that debug builds are slower with delegating constructors, but that's the case for any delegating constructor, right? And delegating constructor avoid code duplication. Actually, the following trace extracted from above
register_type constructor pointer-to-data constructor intializer list constructor
Makes a lot of sense to me:
- access contiguous memory from intiializer_list
- load that into a register type
- store that register type as an object member.
Those are three different steps and I'm glad I can rely on a different constructor to acheive each step, and don't need to duplicate the logic for doing so.
Closing as it makes no sense to me to optimize debug build at the expense of code duplication.
it makes no sense to me to optimize debug build at the expense of code duplication.
This is true in the general case, but a simd library is, by it's very nature, performance-oriented. In my experience your evaluation is misguided - debug build performance is always worth thinking about! 😢