libsimdpp

Dev

[Open] ThomasRetornaz opened this issue 6 years ago • 4 comments

First pull request related to issue #107

  • Add all_of, any_of, copy, copy_n, count, count_if, equal, fill, find, find_if, find_if_not, lexicographical_compare, max, max_element, min, min_element, none_of, reduce, replace, replace_if, transform, transform_reduce "STL"-like algorithms

  • Provide non-regression tests (validated on Visual Studio 2017 and GCC) and documentation

  • Please pay attention to the workaround I made around mask types. I may have missed something, and a better approach may exist

  • I think a preliminary refactoring could be to move some useful unary/binary predicates into a dedicated header

Other fixes and/or proposals

  • Fix the TestData& operator=(const TestData& other) assignment operator

  • Reduce warnings (of course, the last commit could be dropped)

ThomasRetornaz, Mar 16 '18 06:03

Many thanks for the PR! I really like it :-)

Thanks!

I think we could rewrite most of the algorithms to avoid non-SIMD operations in the prologue and epilogue entirely. We could do a single unaligned load that overlaps with the main aligned SIMD body, do the computations, and then do an unaligned store that also overlaps with the main aligned SIMD body. This would be faster in most cases, as scalar code is several times slower than SIMD.
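A minimal sketch of the overlapping prologue/epilogue idea, written against a hypothetical four-lane vector emulated with std::array so it compiles without libsimdpp. The names Vec, load_u, store_u, op1, op and transform_overlap are illustrative and not libsimdpp API. The head and tail are each one unaligned vector that overlaps the aligned body, so elements in the overlap are simply written twice with the same value.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 128-bit register of four uint32_t lanes
// (in libsimdpp this role would be played by a vector type such as uint32<4>).
constexpr std::size_t W = 4;
using Vec = std::array<std::uint32_t, W>;

// Unaligned load/store stand-ins: memcpy is valid for any address.
inline Vec load_u(const std::uint32_t* p) {
    Vec v;
    std::memcpy(v.data(), p, sizeof(v));
    return v;
}
inline void store_u(std::uint32_t* p, const Vec& v) {
    std::memcpy(p, v.data(), sizeof(v));
}

// Example element-wise operation: x * 2 + 1 on every lane.
inline std::uint32_t op1(std::uint32_t x) { return x * 2 + 1; }
inline Vec op(Vec v) {
    for (auto& x : v) x = op1(x);
    return v;
}

// Element-wise transform with no scalar prologue/epilogue when n >= W.
// `in` and `out` must either be identical (in-place) or not overlap at all.
void transform_overlap(const std::uint32_t* in, std::uint32_t* out, std::size_t n) {
    if (n < W) {  // too short for a single vector: scalar fallback
        for (std::size_t j = 0; j < n; ++j) out[j] = op1(in[j]);
        return;
    }
    // Compute the unaligned head [0, W) and tail [n - W, n) from the original
    // input before anything is stored, so in-place use stays correct.
    const Vec head = op(load_u(in));
    const Vec tail = op(load_u(in + n - W));

    // First index at which `out` is aligned to the vector size (16 bytes).
    // It is always < W, so the head covers every element before it.
    constexpr std::size_t kAlign = W * sizeof(std::uint32_t);
    const std::size_t mis = reinterpret_cast<std::uintptr_t>(out) % kAlign;
    std::size_t i = (mis == 0) ? 0 : (kAlign - mis) / sizeof(std::uint32_t);

    // Main body: full vectors whose stores are aligned. Real SIMD code would
    // use an aligned store here; the emulation uses memcpy either way.
    for (; i + W <= n; i += W) store_u(out + i, op(load_u(in + i)));

    // Overlapping head/tail stores: elements the body already wrote are
    // rewritten with identical values; the rest of [0, n) is filled in.
    store_u(out, head);
    store_u(out + n - W, tail);
}
```

Computing head and tail before the body is stored is what keeps the in-place case correct. For reductions such as count or reduce, the overlapping lanes would instead have to be masked out so that they are not counted twice.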

I may be missing something, but as I noted above, we also need a scalar prologue if the data length is too small to fit in a SIMD register (e.g., 7 uints); that way we could transparently use the simdpp:: functions everywhere. For the epilogue, I think I understand the overlap concept, but could you give me some hints on how to achieve it? Anyway, I will try on my side.

ThomasRetornaz, Mar 25 '18 08:03

Yes, if the total length is less than the width of the register, then the scalar part is needed. My point was that if you have, say, a range of 15 uint16 elements to process, it's faster to process two overlapping 8-element vectors than one full vector plus a 7-element scalar prologue/epilogue.
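The 15-element case can be sketched with a hypothetical eight-lane uint16_t vector, again emulated with std::array; OpCount, vec_step, transform_overlap_tail and transform_split are illustrative names, not libsimdpp API. After the full vectors, one extra vector is placed at [n - W, n) so that it overlaps the last full one, and a scalar loop remains only when n < W.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 128-bit register of eight uint16_t lanes
// (in libsimdpp terms, something like uint16<8>).
constexpr std::size_t W = 8;
using Vec = std::array<std::uint16_t, W>;

// Counters so the example can show how much work each strategy does.
struct OpCount {
    std::size_t vector_ops = 0;
    std::size_t scalar_ops = 0;
};

inline std::uint16_t op1(std::uint16_t x) { return static_cast<std::uint16_t>(x + 100); }

// One "vector operation": load W lanes, apply op1 to each, store W lanes.
inline void vec_step(const std::uint16_t* in, std::uint16_t* out, OpCount& c) {
    Vec v;
    std::memcpy(v.data(), in, sizeof(v));
    for (auto& x : v) x = op1(x);
    std::memcpy(out, v.data(), sizeof(v));
    ++c.vector_ops;
}

// Full vectors, then one final vector overlapping the last full one.
// Out-of-place only: in-place use would apply op1 twice to overlapped elements.
void transform_overlap_tail(const std::uint16_t* in, std::uint16_t* out,
                            std::size_t n, OpCount& c) {
    if (n < W) {  // shorter than one register: scalar code is unavoidable
        for (std::size_t j = 0; j < n; ++j) { out[j] = op1(in[j]); ++c.scalar_ops; }
        return;
    }
    std::size_t i = 0;
    for (; i + W <= n; i += W) vec_step(in + i, out + i, c);
    if (i < n) vec_step(in + n - W, out + n - W, c);  // overlapping tail
}

// Conventional split for comparison: full vectors, then a scalar remainder.
void transform_split(const std::uint16_t* in, std::uint16_t* out,
                     std::size_t n, OpCount& c) {
    std::size_t i = 0;
    for (; i + W <= n; i += W) vec_step(in + i, out + i, c);
    for (; i < n; ++i) { out[i] = op1(in[i]); ++c.scalar_ops; }
}
```

For n = 15, transform_overlap_tail performs two vector steps and no scalar steps, while transform_split performs one vector step followed by seven scalar steps; both produce the same output.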

p12tic, Mar 31 '18 09:03

What is the purpose of having SIMDPP_NOEXECPT and of changing inline to a custom macro?

Cazadorro, Apr 11 '18 17:04

  • SIMDPP_NOEXECPT: added for portability reasons. MSVC compilers older than MSVC 2015 don't support the noexcept keyword.
  • Inline macro: this library uses SIMDPP_INL, a compiler-dependent alias that forces inlining (https://github.com/p12tic/libsimdpp/blob/master/simdpp/setup_arch.h#L371).
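A sketch of how such portability macros are commonly written. EXAMPLE_NOEXCEPT and EXAMPLE_INL are hypothetical names and are not libsimdpp's actual definitions (those live in the setup_arch.h file linked above). It relies on MSVC 2015 reporting _MSC_VER 1900 and on the MSVC-specific __forceinline and GCC/Clang always_inline spellings.

```cpp
// noexcept is only available from MSVC 2015 (_MSC_VER 1900) onward,
// so older MSVC versions get an empty expansion.
#if defined(_MSC_VER) && _MSC_VER < 1900
#define EXAMPLE_NOEXCEPT
#else
#define EXAMPLE_NOEXCEPT noexcept
#endif

// Compiler-specific "force inline" spelling, falling back to plain inline.
#if defined(_MSC_VER)
#define EXAMPLE_INL __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_INL inline __attribute__((always_inline))
#else
#define EXAMPLE_INL inline
#endif

// Usage: one declaration that compiles on all of the compilers above.
EXAMPLE_INL int add_one(int x) EXAMPLE_NOEXCEPT { return x + 1; }
```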

Regards, TR

ThomasRetornaz, Apr 17 '18 08:04