std-simd
                                
                                 std-simd copied to clipboard
                                
                                    std-simd copied to clipboard
                            
                            
                            
                        WIP: clang build fixes and workarounds
This is an initial list of changes needed to make it build-able by clang. The list of fixes is not full and PR is here to make a notification about early findings, which may be useful anyway.
My hello world example, which I use to make initial build possible have only 3 remaining errors:
In file included from /std-simd/experimental/simd:59:
/std-simd/experimental/bits/simd_builtin.h:1144:9: error: incomplete type 'std::experimental::parallelism_v2::_CommonImplX86' named in nested name specifier
        return __make_dependent_t<_TV, _CommonImpl>::_S_blend(
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/std-simd/experimental/bits/simd_builtin.h:953:8: note: forward declaration of 'std::experimental::parallelism_v2::_CommonImplX86'
struct _CommonImplX86;
       ^
/std-simd/experimental/bits/simd_builtin.h:1171:11: error: incomplete type 'std::experimental::parallelism_v2::_CommonImplX86' named in nested name specifier
          return __make_dependent_t<_TV, _CommonImpl>::_S_blend(
                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/std-simd/experimental/bits/simd_builtin.h:953:8: note: forward declaration of 'std::experimental::parallelism_v2::_CommonImplX86'
struct _CommonImplX86;
       ^
In file included from my_test.cpp:16:
In file included from /std-simd/experimental/simd:62:
/std-simd/experimental/bits/simd_x86.h:710:4: error: incomplete type 'std::experimental::parallelism_v2::_MaskImplX86Mixin' named in nested name specifier
                        __make_dependent_t<_Tp, _MaskImplX86Mixin>::__to_bits(
                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/std-simd/experimental/bits/simd_x86.h:347:8: note: forward declaration of 'std::experimental::parallelism_v2::_MaskImplX86Mixin'
struct _MaskImplX86Mixin;
       ^
/std-simd/experimental/bits/simd_x86.h:4250:3: warning: 'memcpy' will always overflow; destination buffer has size 16, but size argument is 32 [-Wfortify-source]
                __builtin_memcpy(&__a, static_cast<const char*>(__mem) + 16, 32);
                ^
1 warning and 3 errors generated.
Are there ideas on how to address that?
My hello world example, which I use to make initial build possible have only 3 remaining errors:
__make_dependent_t was a clever hack to make the second template argument a dependent type in this context. Maybe I should find a better fix. But that probably requires a lot of code motion and less architecture abstraction into different files. So I'd like to avoid that. Let me find out whether clang is actually correct to reject this code.
https://godbolt.org/z/NeTaNs apparently only Clang rejects this pattern. That doesn't prove Clang is wrong. Not sure where to start in the standard :wink:
Try this:
@@ -405,7 +405,12 @@ __is_neon_abi()                                             
⸱                                                                                
 // }}}                                                                          
 // __make_dependent_t {{{                                                       
-template <typename, typename _Up> using __make_dependent_t = _Up;               
+template <typename, typename _Up> struct __make_dependent                       
+{                                                                               
+  using type = _Up;                                                             
+};                                                                              
+template <typename _Tp, typename _Up>                                           
+using __make_dependent_t = typename __make_dependent<_Tp, _Up>::type;           
⸱                                                                                
 // }}}                                                                          
 // ^^^ ---- type traits ---- ^^^
Try this:
@@ -405,7 +405,12 @@ __is_neon_abi() ⸱ // }}} // __make_dependent_t {{{ -template <typename, typename _Up> using __make_dependent_t = _Up; +template <typename, typename _Up> struct __make_dependent +{ + using type = _Up; +}; +template <typename _Tp, typename _Up> +using __make_dependent_t = typename __make_dependent<_Tp, _Up>::type; ⸱ // }}} // ^^^ ---- type traits ---- ^^^
That made this thing compile:
#include <experimental/simd>
int main()
{
  std::experimental::fixed_size_simd<int, 5> a;
  std::experimental::fixed_size_simd<float, 32> b;
  return 0;
}
Basic arithmetic ops look working.
Next broken thing is here:
// _S_signmask, _S_absmask{{{
template <typename _V, typename = _VectorTraits<_V>>
static inline constexpr _V _S_signmask = __xor(_V() + 1, _V() - 1);
clang does not consider it a constant expression https://godbolt.org/z/RpWrqw
clang does not consider it a constant expression https://godbolt.org/z/RpWrqw
That's an unfortunate restriction in Clang. The better solution would be if Clang were to support constant expressions involving [[gnu::vector_size(N)]] objects. (Note that I want to propose constexpr simd for the C++23 inclusion [1].) Until that happens we should define a macro to turn this kind of constexpr into a const.
[1] https://github.com/mattkretz/std-simd-feedback/issues/14
stdx::abs is now working for integers, but is returning zeros for FP for unknown reason.
stdx::fabs works as expected with FP.
stdx::sin has compilation issues
stdx::absis now working for integers, but is returning zeros for FP for unknown reason.
Interesting. Maybe you can check whether the _S_absmask constants are correct?
stdx::fabsworks as expected with FP.
This is, IIRC, still just a loop over std::fabs.
_S_allbits and _S_signmask are fine, but _S_absmask is not
__andnot(_S_signmask<_V>, _S_allbits<_V>); // does not work, I have no idea why
auto a =_S_signmask<_V>;
auto b = _S_allbits<_V>;
__andnot(a, b); // does work
Last unresolved error for stdx::sin() to compile:
/std-simd/experimental/bits/simd_builtin.h:1663:12: error: call to '__and' is ambiguous
    return __and(__x._M_data, __y._M_data);
           ^~~~~
/std-simd/experimental/bits/simd.h:4569:14: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::_SimdImplBuiltin<std::experimental::parallelism_v2::simd_abi::_VecBuiltin<8> >::__bit_and<int, 2>' requested here
      _Impl::__bit_and(__data(__x), __data(__y)));
             ^
/std-simd/experimental/bits/simd_math.h:576:48: note: in instantiation of member function 'std::experimental::parallelism_v2::operator&' requested here
      const auto __need_sin = (__f._M_quadrant & 1) == 0;
                                               ^
/std-simd/experimental/bits/simd_fixed_size.h:1632:37: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::sin<double, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16> >' requested here
  _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, sin)
                                    ^
/std-simd/experimental/bits/simd_math.h:558:46: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::_SimdImplFixedSize<4>::__sin<double, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16> >' requested here
    return {__private_init, _Abi::_SimdImpl::__sin(__data(__x))};
                                             ^
/std-simd/experimental/bits/simd_math.h:564:6: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::sin<double, std::experimental::parallelism_v2::simd_abi::_Fixed<4> >' requested here
            sin(static_simd_cast<rebind_simd_t<double, _V>>(__x)));
            ^
/std-simd/experimental/bits/simd_fixed_size.h:1632:37: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::sin<float, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16> >' requested here
  _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, sin)
                                    ^
/std-simd/experimental/bits/simd_math.h:558:46: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::_SimdImplFixedSize<32>::__sin<float, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16> >' requested here
    return {__private_init, _Abi::_SimdImpl::__sin(__data(__x))};
                                             ^
my_test.cpp:69:39: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::sin<float, std::experimental::parallelism_v2::simd_abi::_Fixed<32> >' requested here
  print_simd("stdx::sin( fa )", stdx::sin( fa ) );
                                      ^
/std-simd/experimental/bits/simd.h:1615:1: note: candidate function [with _Tp = __attribute__((__vector_size__(2 * sizeof(int)))) int, _TVT = std::experimental::parallelism_v2::_VectorTraitsImpl<__attribute__((__vector_size__(2 * sizeof(int)))) int, void>, _Dummy = <>]
__and(_Tp __a, typename _TVT::type __b, _Dummy...) noexcept
^
/std-simd/experimental/bits/simd.h:1626:1: note: candidate function [with _Tp = __attribute__((__vector_size__(2 * sizeof(int)))) int, $1 = __attribute__((__vector_size__(2 * sizeof(int)))) int]
__and(_Tp __a, _Tp __b) noexcept
^
_S_allbitsand_S_signmaskare fine, but_S_absmaskis not
This smells like a compiler bug: https://godbolt.org/z/SYU0Fp
Last unresolved error for stdx::sin() to compile:
Interesting. Clang and GCC disagree on how overload resolution works here.
Ok, my stdx::sin example works, and provides reasonable numbers.
Got a number of similar warnings though:
In file included from /std-simd/experimental/simd:62:
/std-simd/experimental/bits/simd_x86.h:2452:25: warning: '__builtin_is_constant_evaluated' will always evaluate to 'true' in a manifestly constant-evaluated expresnstant-evaluated]
    else if constexpr (!__builtin_is_constant_evaluated() && sizeof(__x) == 8) // {{{
                        ^
Tried to compile tests. Running via regular make quickly breaks with errors, thus I've applied direct approach to see wider picture for time-being.
for testfile in *.cpp; do
    $CXX -std=c++17 -Ivirtest -D__remove_cvref_t=std::remove_cvref_t \
        -Wno-everything \
        -D_GLIBCXX_SIMD_ABI=__sse $testfile
done
Those tests compiled:
- abi.cpp
- abi_fixed_size.cpp
- bitset_conversions.cpp
- mask.cpp
- mask_conversions.cpp
- proposed.cpp
- sfinae.cpp
- simd.cpp
- specialmath.cpp
- uninit.cpp
Those tests require __make_array workaround:
- where.cpp
Internal compiler error
- casts.cpp - internal compiler error
- integer_operators.cpp - internal compiler error
- loadstore.cpp - internal compiler error
- math.cpp - internal compiler error
Other reasons:
- experimental.cpp - no member named 'apply' in 'where_expression' (tests compilation issue?)
- operators.cpp - some issues with __vector_bitcast<_Tp>(!__x._M_data);
2020.05.15: Updated based on __andnot fix
2020.05.19: Updated based on __make_array workaround and long double issue resolved
BTW, just so you're aware: I'm currently working on integrating this repo into libstdc++ and the relevant repo would be mattkretz/gcc with the mkretz/simd2 branch. However, I regularly force-push into that branch, so don't switch to working there. But at some point I'll make a patch drop back into this repo. It's fine if we continue working like this. I just want to let you know that I probably won't be able to merge this PR as is, since I'll obviously try to get as many bug fixes as possible into the GCC branch ASAP.
Then I'll continue trying to build it here. But I need some insightful help.
Here is an error I get from tests
./../experimental/bits/simd_builtin.h:2724:14: error: no matching function for call to '__andnot'
      return __andnot(__x._M_data, _Abi::template __implicit_mask<_Tp>());
             ^~~~~~~~
...
./../experimental/bits/simd.h:1743:1: note: candidate template ignored: deduced conflicting types for parameter '_Tp' ('__attribute__((__vector_size__(4 * sizeof(int)))) int' (vector of 4 'int' values) vs. 'std::experimental::parallelism_v2::_SimdWrapper<int, 3, void>')
__andnot(_Tp __a, _Tp __b) noexcept
^
./../experimental/bits/simd.h:1717:1: note: candidate template ignored: requirement '!integral_constant<bool, true>::value' was not satisfied [with _Tp = __attribute__((__vector_size__(4 * sizeof(int)))) int, _TVT = std::experimental::parallelism_v2::_VectorTraitsImpl<__attribute__((__vector_size__(4 * sizeof(int)))) int, void>]
__andnot(_Tp __a, typename _TVT::type __b) noexcept
^
I have reproduced it with the following example:
  using fixed_i32_3 = stdx::simd<int, stdx::simd_abi::_VecBuiltin<12>>;
  fixed_i32_3 i3 = 1;
  where(!(i3 > 0), i3) = 2;
Update: resolved.
I am observing strange behavior with long double so far.
Here ABI is deduced as scalar:
void foo(stdx::simd_mask<long double> a);
But that thing is deduced to _VecBuiltin(16) and it fails :
void foo() {
    stdx::simd_mask<long double>();
}
Looks like that:
my_test_2.cpp:19:10: error: call to implicitly-deleted default constructor of 'stdx::simd_mask<long double>'
    stdx::simd_mask<long double>();
         ^
/std-simd/experimental/bits/simd.h:4061:3: note: explicitly defaulted function was implicitly deleted here
  simd_mask() = default;
  ^
/std-simd/experimental/bits/simd.h:4038:19: note: default constructor of 'simd_mask<long double, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>>' is implicitly deleted because base class '_SimdTraits<long double, _VecBuiltin<16>>::_MaskBase' (aka 'std::experimental::parallelism_v2::_UnsupportedBase') has a deleted default constructor
class simd_mask : public _SimdTraits<_Tp, _Abi>::_MaskBase
                  ^
/std-simd/experimental/bits/simd.h:546:3: note: '_UnsupportedBase' has been explicitly marked deleted here
  _UnsupportedBase() = delete;
  ^
1 error generated.
I am observing strange behavior with
long doubleso far.
The error you see from constructing an object of type simd<long double, _VecBuiltin<16>> is expected. That's the result from using an unsupported ABI tag. The question is really why simd_abi::compatible<long double> would not be scalar.
https://github.com/VcDevel/std-simd/blob/master/experimental/bits/simd.h#L2304-L2307 would choose _VecBuiltin<16> if sizeof(long double) <= 8. That'd be strange for Linux.
At the moment I see a weirdest behavior. In the same compilation unit sizeof(long double) is some times evaluated to 8 and some times to 16, and this mess destroys the logic for long double. Checking with Erich, if it is a known compiler bug.
Ok, long double issue turned to be a problem of my custom compiler build. Resolved that on my side.
@mattkretz, do you happen to think of __make_array workaround?
Next thing I have, is absence of unary not operator in clang. https://godbolt.org/z/r6eeZr
This fires in operators test. Goes down to: simd_builtin.h
  template <typename _Tp, size_t _Np>
  _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember<_Tp>
  __negate(_SimdWrapper<_Tp, _Np> __x) noexcept
  {
    return __vector_bitcast<_Tp>(!__x._M_data);
  }
I feel like I've fixed/workarounded all known compilation problems up to Haswell.
Skylake have some missing builtins, e.g. __builtin_ia32_blendmb_512_mask
Now I need to run tests.
Direct compilation of tests does not work well - compiler fails on instantiation of huge test template functions: virtest/vir/test.h:1000:12: instantiating function definition {and here is a function declarartion 200KB long}
Is there a way to run the test system only for one instruction set, e.g. SSE and/or one given data type e.g. double?
Sorry, I dropped out for a few days because of other work + a public holiday here. First thing I'll do is to get my gcc branch merged back here for convenience. That'll be a patch drop, but that should help do duplicate less work.