Optimized `Lt128` operator for RVV
This pull request adds an optimized implementation of the `Lt128` operator for RVV targets. The new implementation was generated by a program synthesizer.
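For reviewers who want the intended semantics at hand, here is a minimal scalar sketch of what `Lt128` computes, based on Highway's documented behavior: each adjacent pair of 64-bit lanes is treated as one unsigned 128-bit integer (least-significant half in the lower lane), and both mask lanes of the pair receive the same comparison result. The name `Lt128Reference` and the use of `std::vector` are assumptions made for illustration. This is not the synthesized RVV code from this pull request.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar reference for the Lt128 semantics (hypothetical helper, not the
// RVV implementation in this PR). Lanes are u64; lanes (i, i + 1) with even i
// form one unsigned 128-bit number whose low 64 bits are in lane i.
std::vector<bool> Lt128Reference(const std::vector<uint64_t>& a,
                                 const std::vector<uint64_t>& b) {
  // Inputs must have the same, even number of 64-bit lanes.
  assert(a.size() == b.size() && a.size() % 2 == 0);
  std::vector<bool> mask(a.size());
  for (size_t i = 0; i < a.size(); i += 2) {
    const uint64_t a_lo = a[i], a_hi = a[i + 1];
    const uint64_t b_lo = b[i], b_hi = b[i + 1];
    // The upper halves decide unless they are equal; then the lower halves do.
    const bool lt = (a_hi < b_hi) || (a_hi == b_hi && a_lo < b_lo);
    // Both mask lanes of a pair carry the same result.
    mask[i] = lt;
    mask[i + 1] = lt;
  }
  return mask;
}
```

A reference like this can serve as an oracle when checking an optimized vector implementation against random inputs.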
The main computations use a fractional LMUL of 1/8, which is usually more efficient than vector groups (LMUL > 1) and can outperform full vector registers (LMUL = 1) on some microarchitectures.
The compilation result is available on Compiler Explorer: https://lt128.godbolt.org/z/xEK6v4f6f