xtensor
xtensor copied to clipboard
Avoid implicit conversions for bitwise operators
This PR partially addresses #2707.
When using XSimd, the following code will compile, however, the resulting code is inefficient if the return type of operator|(const char&, const char&)
is int
instead of char
. [C++ allows implicitly converting/promoting char
to int
.]
(https://en.cppreference.com/w/cpp/language/implicit_conversion).
#include <xtensor/xfixed.hpp>
void xtensor_or(xt::xtensor_fixed<char, xt::xshape<16>>& b1,
const xt::xtensor_fixed<char, xt::xshape<16>>& b2) {
b1 |= b2;
}
The generated assembly code performs the operation using 32-bit integers and contains many pack and unpack instructions around it. Supplying explict return types in the functors instead of 'auto' avoids those pack and unpack instructions and has a single 'vpor' instruction (when using avx2) that or's all 16 values at once.
Although this improvement works for small integral values like char
and unsigned short
, using bool
still results in inefficient code since xt_simd::simd_condition<bool, bool>::value
is false. xsimd::detail::simd_condition
explicitly is false when both types are bool
, for unknown reasons.
I have only applied the improvement to bitwise operators, since it's probably not safe for other operators. For example, adding characters may overflow so there the implicit conversion to int
is useful. On the other side, adding int
s themselves could also overflow and they are not promoted to other types. Perhaps this optimization is possible for all operators.
Thanks. This is not my expertise, but I do already have some questions.
@tdegeus the checks on this appear to be stuck burning up compute time.