Arbor SIMD Library Refactoring

Open anstaf opened this issue 4 years ago • 5 comments

Motivation

Eventually factor out Arbor SIMD into a separate project and make it useful for the users outside of Arbor.

Current state

Arbor SIMD provides an API that is a variation of the std::experimental::simd API. It is neither a subset nor a superset of std::experimental::simd.

Distinctive features of Arbor SIMD:

  • gather/scatter support.
  • SVE backend.

Observations

  • According to N4808, std::experimental::simd provides explicit conversions from/to the underlying type:
        explicit operator implementation-defined () const;
        explicit simd(const implementation-defined &);   
    
  • SVE fundamentally doesn't fit the std::experimental::simd backend model.

Proposal

Let us split Arbor SIMD into two libraries:

  • arbor-simd-indirect. It will depend on std::experimental::simd and provide a gather/scatter API in the form of free functions that accept std::experimental::simd parameters. Based on the compilation target, scalar type, and width, it will dispatch to the appropriate intrinsic, using static_casts to wrap and unwrap the simd values. The library will not support SVE.

  • arbor-simd-sve. It will depend on std::experimental::simd and arbor-simd-indirect and provide a SIMD API adapted to SVE. This API will consist of makers that return vectors (such as arbsve::broadcast(42) or arbsve::copy_from(ptr)) and functions that accept vectors. If the compilation target has SVE intrinsics, the implementation will forward directly to them; otherwise it will fall back to std::experimental::simd + arbor-simd-indirect.
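
To make the shape of the gather/scatter free functions more concrete, here is a minimal sketch of a portable fallback. Everything in it is an assumption for illustration: the namespace simd_indirect, the function names gather and scatter, and the choice of an int index vector are hypothetical and not part of Arbor or of N4808. It uses only the generator constructor, operator[] and size() from std::experimental::simd, and it needs a standard library that ships <experimental/simd> (for example libstdc++ from GCC 11). A real backend would specialize on target, scalar type and width and call the matching intrinsic instead of looping over lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <experimental/simd>

namespace stdx = std::experimental;

// Hypothetical namespace and function names, for illustration only.
namespace simd_indirect {

// Gather: result[lane] = base[index[lane]] for every lane.
// V is the requested std::experimental::simd type; the index vector must
// have the same number of lanes.
template <typename V, typename IAbi>
V gather(const typename V::value_type* base, const stdx::simd<int, IAbi>& index) {
    static_assert(V::size() == stdx::simd<int, IAbi>::size(),
                  "value and index vectors must have the same lane count");
    assert(base != nullptr);
    // Portable fallback: the generator constructor is called once per lane
    // with a std::integral_constant lane number.
    return V([&](auto lane) { return base[index[lane]]; });
}

// Scatter: base[index[lane]] = value[lane] for every lane.
// If two lanes share an index, the higher lane wins in this fallback.
template <typename T, typename Abi, typename IAbi>
void scatter(const stdx::simd<T, Abi>& value, T* base, const stdx::simd<int, IAbi>& index) {
    static_assert(stdx::simd<T, Abi>::size() == stdx::simd<int, IAbi>::size(),
                  "value and index vectors must have the same lane count");
    assert(base != nullptr);
    for (std::size_t lane = 0; lane < value.size(); ++lane) {
        base[index[lane]] = value[lane];
    }
}

} // namespace simd_indirect
```

A call would look like simd_indirect::gather<stdx::fixed_size_simd<double, 4>>(data, idx), with idx a stdx::fixed_size_simd<int, 4>. Passing the result type explicitly is one possible design choice; deducing it from the index vector's ABI would be another.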

anstaf avatar Nov 19 '21 13:11 anstaf

Thanks for the proposal @antonf.

I feel that refactoring to use std::experimental::simd is impractical while not part of the standard.

  • it is available only in GCC 11, while the minimum compiler version required by Arbor is GCC 8 (and Clang)
  • we need to understand the performance tradeoffs, and check support for features such as AVX512 in the std::experimental implementation. For this we would have to run performance benchmarks.

As a rule in Arbor, we have implemented future standard library features internally, and replaced them once mature implementations are available in our minimum compiler versions. Given this, I think it is too early to refactor the SIMD library to be based around std::experimental.

bcumming avatar Nov 29 '21 13:11 bcumming

I certainly like the idea of splitting out the SVE side; it's really incompatible with the rest of the API.

Regarding std::experimental::simd:

  • We could still factor our SIMD library into something that conforms to the std::experimental::simd interface, plus an additional component that supports the gather/scatter/constraint semantics, with a view to swapping over to the standard implementation in the future.
  • N4808, §9.7.7 provides cmath overloads for SIMD values; we can provide our own implementations with consistent numerics across back-ends under e.g. arb::math, both for SIMD and scalar values. Our optimized implementations, though, use low-level intrinsics rather than just the arithmetic operations provided by std::experimental::simd.

For our implementations of e.g. expm1, exprelr etc., which rely on decomposing values into mantissa and exponent, we could implement a set of architecture-specific low-level operations that are then used within our generic implementations, or stick to writing things in terms of standard decomposition functions and arithmetic. The former would allow us to (mostly) maintain the performance; the latter could well be slower, but might allow us an implementation that is more easily made robust (proper support for subnormal numbers, etc.).
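
As a rough illustration of the second option, here is a scalar sketch of exp written only in terms of arithmetic and the standard scaling function std::ldexp. It is not Arbor's implementation: the namespace math_sketch, the function name exp_sketch and the degree-12 Taylor polynomial are assumptions chosen for readability, and the split constants for ln 2 are the ones commonly used in fdlibm-style code. The idea is the usual range reduction x = n·ln2 + r with |r| ≤ ln2/2, followed by a polynomial for exp(r) and a final scaling by 2^n.

```cpp
#include <cmath>

// Hypothetical namespace and function name, for illustration only.
namespace math_sketch {

// exp(x) via range reduction: x = n*ln2 + r, exp(x) = 2^n * exp(r).
// Assumes a finite x of moderate size (roughly |x| < 700); NaN, infinities
// and overflow are deliberately not handled in this sketch.
inline double exp_sketch(double x) {
    // ln2 split into a high and a low part so that n*ln2_hi is exact for
    // the small integers n that occur here.
    constexpr double ln2_hi  = 6.93147180369123816490e-01;
    constexpr double ln2_lo  = 1.90821492927058770002e-10;
    constexpr double inv_ln2 = 1.44269504088896338700e+00;

    const double n = std::nearbyint(x * inv_ln2);       // nearest integer to x/ln2
    const double r = (x - n * ln2_hi) - n * ln2_lo;     // reduced argument

    // Degree-12 Taylor polynomial of exp(r), evaluated in Horner form:
    // 1 + r*(1 + r/2*(1 + r/3*(... (1 + r/12)))).
    double p = 1.0;
    for (int k = 12; k >= 1; --k) {
        p = 1.0 + p * r / k;
    }

    // Scale by 2^n using the standard library instead of exponent-bit tricks.
    return std::ldexp(p, static_cast<int>(n));
}

} // namespace math_sketch
```

A SIMD version would apply the same steps lane-wise. The final scaling step is where the two options differ: std::ldexp (or a SIMD equivalent) versus architecture-specific manipulation of the exponent bits.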

halfflat avatar Dec 14 '21 02:12 halfflat

Hi, just happened across this issue while searching. Have you seen https://github.com/google/highway ? It's a C++ wrapper over intrinsics that supports SVE, RISC-V, AVX-512 and others. Would be happy to discuss if you're interested.

jan-wassenberg avatar Apr 05 '22 08:04 jan-wassenberg

Hi @jan-wassenberg,

thanks for the suggestion. Highway looks pretty interesting, but it's unlikely we'll change our SIMD backend soon without a pressing need (RISC-V might pose such a need in the future). Just out of curiosity, how does Highway compare to VC2 (https://github.com/vectorclass/version2)?

Just to note our requirements (mostly in terms of performant operations, since this is the motivator), which apply not only to Highway but to any other choice as well:

  • scatter store/gather load
  • fast approximate mathematical functions: exp, pow, sqrt, log
  • to a lesser degree: sin, cos, ...

thorstenhater avatar Nov 03 '22 07:11 thorstenhater

Hi @thorstenhater , got it. Yes, RISC-V looks to be gathering momentum.

> how does highway compare to VC2

I very much respect Agner's work but he is clear that no instruction sets other than x86 will be supported.

> Just to note our requirements

Good to know. We have all of those except pow, and can help add that or other math functions if required. (For pow it really depends how much accuracy you want. A simple version can use log+exp already.)
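
For reference, the simple log+exp version looks roughly like the scalar sketch below. The namespace and function name are hypothetical and this is not Highway's API. It is only valid for x > 0, and the rounding error in y·log(x) is amplified by the magnitude of that product, so accuracy drops for large results.

```cpp
#include <cmath>

// Hypothetical namespace and function name, for illustration only.
namespace pow_sketch {

// pow(x, y) = exp(y * log(x)), valid for x > 0 only.
// Negative bases, zero and special values need separate handling.
inline double pow_via_log_exp(double x, double y) {
    return std::exp(y * std::log(x));
}

} // namespace pow_sketch
```

For example, pow_sketch::pow_via_log_exp(2.0, 10.0) is close to, but not guaranteed to be exactly, 1024.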

jan-wassenberg avatar Nov 03 '22 09:11 jan-wassenberg