simde
add optimized implementations using RISC-V vector intrinsics
(I don't plan on doing this myself, but I wanted to start the conversation to see who is interested in doing this)
What
Use RISC-V vector intrinsics to provide optimized implementations of the existing intrinsics (X86, ARM Neon, MIPS MSA, WASM, etc.) already in SIMD Everywhere.
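To make the idea concrete, here is a minimal sketch of what one such implementation might look like: a lane-wise 32-bit add in the spirit of SSE2's `_mm_add_epi32`, with an RVV path and a portable fallback. The names `sketch_m128i`, `sketch_mm_add_epi32`, and `SKETCH_USE_RVV` are hypothetical and are not SIMDe's actual types or macros. The `__riscv_*` calls are the prefixed intrinsics as I understand them from the v0.12 draft, and the `12000` version value is my reading of how `__riscv_v_intrinsic` encodes v0.12.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: __riscv_v_intrinsic encodes v0.12 as 12000. */
#if defined(__riscv_v_intrinsic) && (__riscv_v_intrinsic >= 12000)
#  include <riscv_vector.h>
#  define SKETCH_USE_RVV 1
#else
#  define SKETCH_USE_RVV 0
#endif

/* Hypothetical stand-in for a 128-bit integer vector type such as __m128i. */
typedef struct {
  int32_t i32[4];
} sketch_m128i;

/* Lane-wise wrapping 32-bit add, in the spirit of SSE2's _mm_add_epi32. */
static sketch_m128i sketch_mm_add_epi32(sketch_m128i a, sketch_m128i b) {
  sketch_m128i r;
#if SKETCH_USE_RVV
  /* Request 4 lanes of 32 bits. This assumes VLEN >= 128, which the full
   * V extension guarantees; smaller Zve* configurations would need a loop. */
  size_t vl = __riscv_vsetvl_e32m1(4);
  vint32m1_t va = __riscv_vle32_v_i32m1(a.i32, vl);
  vint32m1_t vb = __riscv_vle32_v_i32m1(b.i32, vl);
  __riscv_vse32_v_i32m1(r.i32, __riscv_vadd_vv_i32m1(va, vb, vl), vl);
#else
  /* Portable fallback: add as unsigned to get wrapping without signed-overflow
   * UB; the conversion back relies on the usual two's-complement behaviour. */
  for (int i = 0; i < 4; i++) {
    r.i32[i] = (int32_t)((uint32_t)a.i32[i] + (uint32_t)b.i32[i]);
  }
#endif
  return r;
}
```

Going through memory for the load and store keeps the sketch independent of the vector register width; a real implementation would more likely keep values in vector registers between operations.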
Existing work
- SSE: https://github.com/FeddrickAquino/sse2rvv (assumes VLEN of 128 bits).
- NEON: https://doi.org/10.48550/arXiv.2309.16509 (requires clang 17; SIMDe source code pending)
- NEON: https://github.com/howjmay/neon2rvv
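Since at least one of these projects assumes a fixed VLEN of 128 bits, it may be useful to check that assumption at run time. The following is only a sketch: `sketch_rvv_vlen_bits` and `sketch_vlen_is_exactly_128` are hypothetical helper names, and the `11000` guard reflects my understanding that the `__riscv_`-prefixed intrinsics arrived with v0.11.

```c
#include <stddef.h>

/* Assumption: the __riscv_-prefixed names exist from intrinsics v0.11 (11000). */
#if defined(__riscv_v_intrinsic) && (__riscv_v_intrinsic >= 11000)
#  include <riscv_vector.h>
#  define SKETCH_HAVE_RVV 1
#else
#  define SKETCH_HAVE_RVV 0
#endif

/* Returns the vector register length (VLEN) in bits,
 * or 0 when built without RVV intrinsics. */
static size_t sketch_rvv_vlen_bits(void) {
#if SKETCH_HAVE_RVV
  /* VLMAX for 8-bit elements at LMUL=1 is VLEN / 8, so multiply back by 8. */
  return __riscv_vsetvlmax_e8m1() * 8;
#else
  return 0;
#endif
}

/* Nonzero only when the hardware matches a fixed 128-bit assumption exactly. */
static int sketch_vlen_is_exactly_128(void) {
  return sketch_rvv_vlen_bits() == 128;
}
```

An implementation that only needs VLEN to be at least 128 bits could compare with `>=` instead and set the vector length explicitly for each operation.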
When to start
The vector extensions themselves were ratified in 2021. The intrinsics for using them from C/C++ are nearly ratified (see below), so we can start accepting contributions now.
Hopefully we will have a ratified specification by the end of this year. (source)
Takes 45 days as the public review period (till mid-December) (source)
Recent draft: https://github.com/riscv-non-isa/rvv-intrinsic-doc/releases/download/draft-20231014-c10de5388709b000ecc4becb0d9ee16baa0141a9/v-intrinsic-spec.pdf (latest drafts)
https://github.com/riscv-non-isa/rvv-intrinsic-doc
Which compilers to test?
Upcoming LLVM 17 and GCC trunk support v0.12, which is expected to be identical to the to-be-frozen intrinsic specification.
Clang 16 and GCC 13 support v0.11, which does not have tuple-type segment load/store intrinsics, fixed-point intrinsics with a rounding mode parameter, or floating-point intrinsics with a rounding mode parameter.
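One way to tell these compiler generations apart at compile time is the `__riscv_v_intrinsic` test macro. The sketch below assumes the version is encoded as major * 1,000,000 + minor * 1,000 + revision, so v0.11 is 11000 and v0.12 is 12000; please verify this against the draft before relying on it. The function name `sketch_rvv_intrinsics_level` is hypothetical.

```c
/* Reports which generation of the RVV intrinsics the compiler advertises.
 * Assumption: __riscv_v_intrinsic is major*1000000 + minor*1000 + revision. */
static const char *sketch_rvv_intrinsics_level(void) {
#if !defined(__riscv_v_intrinsic)
  return "none";      /* not targeting RISC-V with vectors enabled */
#elif __riscv_v_intrinsic >= 12000
  return "v0.12+";    /* tuple types and rounding mode parameters expected */
#elif __riscv_v_intrinsic >= 11000
  return "v0.11";     /* prefixed names, but no tuple segment load/store */
#else
  return "pre-v0.11"; /* older drafts without the __riscv_ prefix */
#endif
}
```

A header could use the same comparisons to enable RVV code paths only for v0.12 or later and fall back to the portable implementations otherwise.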
Benchmarking
Maybe autovectorization is good enough. Hand-written implementations should be compared with it both by instruction count and by real-world performance.
Please share any suggestions for publicly available RISC-V Vector 1.0 systems.
https://riscv.org/risc-v-developer-boards/details/
https://www.riscfive.com/risc-v-development-boards/ lists some boards with the V extension, but I can't find a public declaration that any of them follow the 1.0 version of the vector extension.
According to https://doi.org/10.48550/arXiv.2210.08882 , the following cores implement v1.0 of the RISC-V Vector Extension: SiFive X280, Andes NX27V, Atrevido 220. Notably for the riscfive.com list of dev boards, the XuanTie 910 core is RVV version 0.7.1.