std-simd WIP: clang build fixes and workarounds

This is an initial list of changes needed to make it build-able by clang. The list of fixes is not full and PR is here to make a notification about early findings, which may be useful anyway.

May 07 '20 14:05 iburyl

My hello world example, which I use to make initial build possible have only 3 remaining errors:

In file included from /std-simd/experimental/simd:59:
/std-simd/experimental/bits/simd_builtin.h:1144:9: error: incomplete type 'std::experimental::parallelism_v2::_CommonImplX86' named in nested name specifier
        return __make_dependent_t<_TV, _CommonImpl>::_S_blend(
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/std-simd/experimental/bits/simd_builtin.h:953:8: note: forward declaration of 'std::experimental::parallelism_v2::_CommonImplX86'
struct _CommonImplX86;
       ^
/std-simd/experimental/bits/simd_builtin.h:1171:11: error: incomplete type 'std::experimental::parallelism_v2::_CommonImplX86' named in nested name specifier
          return __make_dependent_t<_TV, _CommonImpl>::_S_blend(
                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/std-simd/experimental/bits/simd_builtin.h:953:8: note: forward declaration of 'std::experimental::parallelism_v2::_CommonImplX86'
struct _CommonImplX86;
       ^
In file included from my_test.cpp:16:
In file included from /std-simd/experimental/simd:62:
/std-simd/experimental/bits/simd_x86.h:710:4: error: incomplete type 'std::experimental::parallelism_v2::_MaskImplX86Mixin' named in nested name specifier
                        __make_dependent_t<_Tp, _MaskImplX86Mixin>::__to_bits(
                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/std-simd/experimental/bits/simd_x86.h:347:8: note: forward declaration of 'std::experimental::parallelism_v2::_MaskImplX86Mixin'
struct _MaskImplX86Mixin;
       ^
/std-simd/experimental/bits/simd_x86.h:4250:3: warning: 'memcpy' will always overflow; destination buffer has size 16, but size argument is 32 [-Wfortify-source]
                __builtin_memcpy(&__a, static_cast<const char*>(__mem) + 16, 32);
                ^
1 warning and 3 errors generated.

Are there ideas on how to address that?

May 07 '20 14:05 iburyl

My hello world example, which I use to make initial build possible have only 3 remaining errors:

__make_dependent_t was a clever hack to make the second template argument a dependent type in this context. Maybe I should find a better fix. But that probably requires a lot of code motion and less architecture abstraction into different files. So I'd like to avoid that. Let me find out whether clang is actually correct to reject this code.

May 07 '20 15:05 mattkretz

https://godbolt.org/z/NeTaNs apparently only Clang rejects this pattern. That doesn't prove Clang is wrong. Not sure where to start in the standard :wink:

May 07 '20 15:05 mattkretz

Try this:

@@ -405,7 +405,12 @@ __is_neon_abi()                                             
⸱                                                                                
 // }}}                                                                          
 // __make_dependent_t {{{                                                       
-template <typename, typename _Up> using __make_dependent_t = _Up;               
+template <typename, typename _Up> struct __make_dependent                       
+{                                                                               
+  using type = _Up;                                                             
+};                                                                              
+template <typename _Tp, typename _Up>                                           
+using __make_dependent_t = typename __make_dependent<_Tp, _Up>::type;           
⸱                                                                                
 // }}}                                                                          
 // ^^^ ---- type traits ---- ^^^

May 07 '20 15:05 mattkretz

Try this:

@@ -405,7 +405,12 @@ __is_neon_abi()                                             
⸱                                                                                
 // }}}                                                                          
 // __make_dependent_t {{{                                                       
-template <typename, typename _Up> using __make_dependent_t = _Up;               
+template <typename, typename _Up> struct __make_dependent                       
+{                                                                               
+  using type = _Up;                                                             
+};                                                                              
+template <typename _Tp, typename _Up>                                           
+using __make_dependent_t = typename __make_dependent<_Tp, _Up>::type;           
⸱                                                                                
 // }}}                                                                          
 // ^^^ ---- type traits ---- ^^^

That made this thing compile:

#include <experimental/simd>
int main()
{
  std::experimental::fixed_size_simd<int, 5> a;
  std::experimental::fixed_size_simd<float, 32> b;

  return 0;
}

May 07 '20 15:05 iburyl

Basic arithmetic ops look working.

Next broken thing is here:

// _S_signmask, _S_absmask{{{
template <typename _V, typename = _VectorTraits<_V>>
static inline constexpr _V _S_signmask = __xor(_V() + 1, _V() - 1);

clang does not consider it a constant expression https://godbolt.org/z/RpWrqw

May 08 '20 12:05 iburyl

clang does not consider it a constant expression https://godbolt.org/z/RpWrqw

That's an unfortunate restriction in Clang. The better solution would be if Clang were to support constant expressions involving [[gnu::vector_size(N)]] objects. (Note that I want to propose constexpr simd for the C++23 inclusion [1].) Until that happens we should define a macro to turn this kind of constexpr into a const.

[1] https://github.com/mattkretz/std-simd-feedback/issues/14

May 08 '20 12:05 mattkretz

stdx::abs is now working for integers, but is returning zeros for FP for unknown reason.

stdx::fabs works as expected with FP.

stdx::sin has compilation issues

May 08 '20 15:05 iburyl

stdx::abs is now working for integers, but is returning zeros for FP for unknown reason.

Interesting. Maybe you can check whether the _S_absmask constants are correct?

stdx::fabs works as expected with FP.

This is, IIRC, still just a loop over std::fabs.

May 08 '20 18:05 mattkretz

_S_allbits and _S_signmask are fine, but _S_absmask is not

__andnot(_S_signmask<_V>, _S_allbits<_V>); // does not work, I have no idea why
auto a =_S_signmask<_V>;
auto b = _S_allbits<_V>;
__andnot(a, b); // does work

May 08 '20 20:05 iburyl

Last unresolved error for stdx::sin() to compile:

/std-simd/experimental/bits/simd_builtin.h:1663:12: error: call to '__and' is ambiguous
    return __and(__x._M_data, __y._M_data);
           ^~~~~
/std-simd/experimental/bits/simd.h:4569:14: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::_SimdImplBuiltin<std::experimental::parallelism_v2::simd_abi::_VecBuiltin<8> >::__bit_and<int, 2>' requested here
      _Impl::__bit_and(__data(__x), __data(__y)));
             ^
/std-simd/experimental/bits/simd_math.h:576:48: note: in instantiation of member function 'std::experimental::parallelism_v2::operator&' requested here
      const auto __need_sin = (__f._M_quadrant & 1) == 0;
                                               ^
/std-simd/experimental/bits/simd_fixed_size.h:1632:37: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::sin<double, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16> >' requested here
  _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, sin)
                                    ^
/std-simd/experimental/bits/simd_math.h:558:46: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::_SimdImplFixedSize<4>::__sin<double, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16> >' requested here
    return {__private_init, _Abi::_SimdImpl::__sin(__data(__x))};
                                             ^
/std-simd/experimental/bits/simd_math.h:564:6: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::sin<double, std::experimental::parallelism_v2::simd_abi::_Fixed<4> >' requested here
            sin(static_simd_cast<rebind_simd_t<double, _V>>(__x)));
            ^
/std-simd/experimental/bits/simd_fixed_size.h:1632:37: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::sin<float, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16> >' requested here
  _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, sin)
                                    ^
/std-simd/experimental/bits/simd_math.h:558:46: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::_SimdImplFixedSize<32>::__sin<float, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16> >' requested here
    return {__private_init, _Abi::_SimdImpl::__sin(__data(__x))};
                                             ^
my_test.cpp:69:39: note: in instantiation of function template specialization 'std::experimental::parallelism_v2::sin<float, std::experimental::parallelism_v2::simd_abi::_Fixed<32> >' requested here
  print_simd("stdx::sin( fa )", stdx::sin( fa ) );
                                      ^
/std-simd/experimental/bits/simd.h:1615:1: note: candidate function [with _Tp = __attribute__((__vector_size__(2 * sizeof(int)))) int, _TVT = std::experimental::parallelism_v2::_VectorTraitsImpl<__attribute__((__vector_size__(2 * sizeof(int)))) int, void>, _Dummy = <>]
__and(_Tp __a, typename _TVT::type __b, _Dummy...) noexcept
^
/std-simd/experimental/bits/simd.h:1626:1: note: candidate function [with _Tp = __attribute__((__vector_size__(2 * sizeof(int)))) int, $1 = __attribute__((__vector_size__(2 * sizeof(int)))) int]
__and(_Tp __a, _Tp __b) noexcept
^

May 08 '20 20:05 iburyl

_S_allbits and _S_signmask are fine, but _S_absmask is not

This smells like a compiler bug: https://godbolt.org/z/SYU0Fp

May 08 '20 21:05 mattkretz

Last unresolved error for stdx::sin() to compile:

Interesting. Clang and GCC disagree on how overload resolution works here.

May 08 '20 21:05 mattkretz

Ok, my stdx::sin example works, and provides reasonable numbers.

Got a number of similar warnings though:

In file included from /std-simd/experimental/simd:62:
/std-simd/experimental/bits/simd_x86.h:2452:25: warning: '__builtin_is_constant_evaluated' will always evaluate to 'true' in a manifestly constant-evaluated expresnstant-evaluated]
    else if constexpr (!__builtin_is_constant_evaluated() && sizeof(__x) == 8) // {{{
                        ^

May 13 '20 11:05 iburyl

Tried to compile tests. Running via regular make quickly breaks with errors, thus I've applied direct approach to see wider picture for time-being.

for testfile in *.cpp; do
    $CXX -std=c++17 -Ivirtest -D__remove_cvref_t=std::remove_cvref_t \
        -Wno-everything \
        -D_GLIBCXX_SIMD_ABI=__sse $testfile
done

Those tests compiled:

abi.cpp
abi_fixed_size.cpp
bitset_conversions.cpp
mask.cpp
mask_conversions.cpp
proposed.cpp
sfinae.cpp
simd.cpp
specialmath.cpp
uninit.cpp

Those tests require __make_array workaround:

where.cpp

Internal compiler error

casts.cpp - internal compiler error
integer_operators.cpp - internal compiler error
loadstore.cpp - internal compiler error
math.cpp - internal compiler error

Other reasons:

experimental.cpp - no member named 'apply' in 'where_expression' (tests compilation issue?)
operators.cpp - some issues with __vector_bitcast<_Tp>(!__x._M_data);

2020.05.15: Updated based on __andnot fix 2020.05.19: Updated based on __make_array workaround and long double issue resolved

May 13 '20 14:05 iburyl

BTW, just so you're aware: I'm currently working on integrating this repo into libstdc++ and the relevant repo would be mattkretz/gcc with the mkretz/simd2 branch. However, I regularly force-push into that branch, so don't switch to working there. But at some point I'll make a patch drop back into this repo. It's fine if we continue working like this. I just want to let you know that I probably won't be able to merge this PR as is, since I'll obviously try to get as many bug fixes as possible into the GCC branch ASAP.

May 14 '20 06:05 mattkretz

Then I'll continue trying to build it here. But I need some insightful help.

Here is an error I get from tests

./../experimental/bits/simd_builtin.h:2724:14: error: no matching function for call to '__andnot'
      return __andnot(__x._M_data, _Abi::template __implicit_mask<_Tp>());
             ^~~~~~~~
...
./../experimental/bits/simd.h:1743:1: note: candidate template ignored: deduced conflicting types for parameter '_Tp' ('__attribute__((__vector_size__(4 * sizeof(int)))) int' (vector of 4 'int' values) vs. 'std::experimental::parallelism_v2::_SimdWrapper<int, 3, void>')
__andnot(_Tp __a, _Tp __b) noexcept
^
./../experimental/bits/simd.h:1717:1: note: candidate template ignored: requirement '!integral_constant<bool, true>::value' was not satisfied [with _Tp = __attribute__((__vector_size__(4 * sizeof(int)))) int, _TVT = std::experimental::parallelism_v2::_VectorTraitsImpl<__attribute__((__vector_size__(4 * sizeof(int)))) int, void>]
__andnot(_Tp __a, typename _TVT::type __b) noexcept
^

I have reproduced it with the following example:

  using fixed_i32_3 = stdx::simd<int, stdx::simd_abi::_VecBuiltin<12>>;
  fixed_i32_3 i3 = 1;
  where(!(i3 > 0), i3) = 2;

Update: resolved.

May 14 '20 11:05 iburyl

I am observing strange behavior with long double so far.

Here ABI is deduced as scalar:

void foo(stdx::simd_mask<long double> a);

But that thing is deduced to _VecBuiltin(16) and it fails :

void foo() {
    stdx::simd_mask<long double>();
}

Looks like that:

my_test_2.cpp:19:10: error: call to implicitly-deleted default constructor of 'stdx::simd_mask<long double>'
    stdx::simd_mask<long double>();
         ^
/std-simd/experimental/bits/simd.h:4061:3: note: explicitly defaulted function was implicitly deleted here
  simd_mask() = default;
  ^
/std-simd/experimental/bits/simd.h:4038:19: note: default constructor of 'simd_mask<long double, std::experimental::parallelism_v2::simd_abi::_VecBuiltin<16>>' is implicitly deleted because base class '_SimdTraits<long double, _VecBuiltin<16>>::_MaskBase' (aka 'std::experimental::parallelism_v2::_UnsupportedBase') has a deleted default constructor
class simd_mask : public _SimdTraits<_Tp, _Abi>::_MaskBase
                  ^
/std-simd/experimental/bits/simd.h:546:3: note: '_UnsupportedBase' has been explicitly marked deleted here
  _UnsupportedBase() = delete;
  ^
1 error generated.

May 15 '20 14:05 iburyl

I am observing strange behavior with long double so far.

The error you see from constructing an object of type simd<long double, _VecBuiltin<16>> is expected. That's the result from using an unsupported ABI tag. The question is really why simd_abi::compatible<long double> would not be scalar. https://github.com/VcDevel/std-simd/blob/master/experimental/bits/simd.h#L2304-L2307 would choose _VecBuiltin<16> if sizeof(long double) <= 8. That'd be strange for Linux.

May 15 '20 14:05 mattkretz

At the moment I see a weirdest behavior. In the same compilation unit sizeof(long double) is some times evaluated to 8 and some times to 16, and this mess destroys the logic for long double. Checking with Erich, if it is a known compiler bug.

May 16 '20 19:05 iburyl

Ok, long double issue turned to be a problem of my custom compiler build. Resolved that on my side.

May 18 '20 14:05 iburyl

@mattkretz, do you happen to think of __make_array workaround?

May 18 '20 15:05 iburyl

Next thing I have, is absence of unary not operator in clang. https://godbolt.org/z/r6eeZr

This fires in operators test. Goes down to: simd_builtin.h

  template <typename _Tp, size_t _Np>
  _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember<_Tp>
  __negate(_SimdWrapper<_Tp, _Np> __x) noexcept
  {
    return __vector_bitcast<_Tp>(!__x._M_data);
  }

May 19 '20 09:05 iburyl

I feel like I've fixed/workarounded all known compilation problems up to Haswell. Skylake have some missing builtins, e.g. __builtin_ia32_blendmb_512_mask

Now I need to run tests.

Direct compilation of tests does not work well - compiler fails on instantiation of huge test template functions: virtest/vir/test.h:1000:12: instantiating function definition {and here is a function declarartion 200KB long}

Is there a way to run the test system only for one instruction set, e.g. SSE and/or one given data type e.g. double?

May 20 '20 08:05 iburyl

Sorry, I dropped out for a few days because of other work + a public holiday here. First thing I'll do is to get my gcc branch merged back here for convenience. That'll be a patch drop, but that should help do duplicate less work.

May 25 '20 08:05 mattkretz

std-simd std-simd copied to clipboard

WIP: clang build fixes and workarounds

std-simd
std-simd copied to clipboard