ompi
add RVV support for MPI_OP
This patch implements the MPI_OP reduction operations using the RISC-V Vector (RVV) extension.
Performance results:
- On older compilers (without RVV auto-vectorization):
  - RVV-optimized ops are 1.5-4x faster than the C implementations.
- With GCC 14+ (RVV auto-vectorization enabled):
  - 2-buff RVV ops perform similarly to auto-vectorized C code.
  - 3-buff RVV ops are still 1.5-4x faster.
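
For readers unfamiliar with the 2-buff / 3-buff distinction, the sketch below illustrates the two shapes for an int32 sum. It is not the code from this patch: the function names `sum_int32_2buf` and `sum_int32_3buf` are hypothetical, the use of LMUL=1 is an arbitrary choice, and the RVV path assumes a toolchain that provides the `__riscv_`-prefixed intrinsics from `<riscv_vector.h>`. Without `__riscv_vector` defined, it falls back to a plain C loop, which is the kind of code an auto-vectorizing compiler would be expected to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

/* Hypothetical 2-buffer sum: inout[i] = inout[i] + in[i]. */
static void sum_int32_2buf(const int32_t *in, int32_t *inout, size_t count)
{
    assert(count == 0 || (in != NULL && inout != NULL));
#if defined(__riscv_vector)
    /* Strip-mined loop: vsetvl picks how many elements this pass handles. */
    while (count > 0) {
        size_t vl = __riscv_vsetvl_e32m1(count);
        vint32m1_t a = __riscv_vle32_v_i32m1(in, vl);
        vint32m1_t b = __riscv_vle32_v_i32m1(inout, vl);
        __riscv_vse32_v_i32m1(inout, __riscv_vadd_vv_i32m1(a, b, vl), vl);
        in += vl;
        inout += vl;
        count -= vl;
    }
#else
    /* Scalar fallback for builds without the vector extension. */
    for (size_t i = 0; i < count; i++) {
        inout[i] += in[i];
    }
#endif
}

/* Hypothetical 3-buffer sum: out[i] = in1[i] + in2[i]. */
static void sum_int32_3buf(const int32_t *in1, const int32_t *in2,
                           int32_t *out, size_t count)
{
    assert(count == 0 || (in1 != NULL && in2 != NULL && out != NULL));
#if defined(__riscv_vector)
    while (count > 0) {
        size_t vl = __riscv_vsetvl_e32m1(count);
        vint32m1_t a = __riscv_vle32_v_i32m1(in1, vl);
        vint32m1_t b = __riscv_vle32_v_i32m1(in2, vl);
        __riscv_vse32_v_i32m1(out, __riscv_vadd_vv_i32m1(a, b, vl), vl);
        in1 += vl;
        in2 += vl;
        out += vl;
        count -= vl;
    }
#else
    for (size_t i = 0; i < count; i++) {
        out[i] = in1[i] + in2[i];
    }
#endif
}
```

The 2-buffer form accumulates into one of its inputs, while the 3-buffer form writes to a separate output and leaves both inputs untouched. Calling `sum_int32_3buf(a, b, c, n)` followed by `sum_int32_2buf(a, b, n)` should leave `c` and `b` holding the same element-wise sums.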