Investigate the status of the SLP vectorizer
As per "Super-Node SLP: Optimized Vectorization for Code Sequences Containing Operators and Their Inverse Elements", the following benchmarks may contain interesting code for SLP vectorization opportunities:
- 433.milc
- 453.povray
- 454.calculix
We should extract the interesting kernels and iterate on those if there are regressions compared to AArch64.
The algorithm described in the paper is implemented only in icx (Intel's LLVM-based compiler). From our measurements, it does not bring a significant improvement over the current SLP vectorizer: the paper was published almost four years ago, and SLP has evolved significantly since then. Non-power-of-2 vectorization may give much better results.

Currently, the SLP vectorizer is disabled for the RISC-V target in trunk clang/llvm, mostly because the cost model for SiFive target cores is not yet implemented in the TTI interface. Without it, turning on the SLP vectorizer may cause performance regressions. We can turn the SLP vectorizer on for SiFive cores after enabling the SiFive cost model in TTI.