Consider adding hardware optimization levels
x86_64 has many possible features, but in shipped CPUs they usually appear together. For convenience, consider adding something like this:
https://www.phoronix.com/scan.php?page=news_item&px=LLVM-Clang-12-Microarch-Levels
Where the levels are:
- x86-64: CMOV, CMPXCHG8B, FPU, FXSR, MMX, OSFXSR, SCE, SSE, SSE2
- x86-64-v2: (close to Nehalem) CMPXCHG16B, LAHF-SAHF, POPCNT, SSE3, SSE4.1, SSE4.2, SSSE3
- x86-64-v3: (close to Haswell) AVX, AVX2, BMI1, BMI2, F16C, FMA, LZCNT, MOVBE, XSAVE
- x86-64-v4: AVX512F, AVX512BW, AVX512CD, AVX512DQ, AVX512VL
This would make picking a reasonable set of features much easier. Ideally this would also exist for ARM and other architectures, and there would be a catch-all annotation that compiles for every level that applies to the architecture. Something like #[multiversion-broad], which would use all of these reasonable levels on whatever architecture is being compiled for.
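To illustrate how the levels group features, here is a minimal sketch that reports the highest level whose features are detected at runtime, using only the standard library's `is_x86_feature_detected!` macro. The function name `detected_level` is hypothetical and not part of any crate. The check is partial: LAHF-SAHF and MOVBE are skipped on the assumption that a detection string for them may not be available in every toolchain.

```rust
/// Returns the highest x86-64 microarchitecture level (1 to 4) whose checked
/// features are all detected on the running CPU.
/// Hypothetical helper for illustration; LAHF-SAHF and MOVBE are not checked.
#[cfg(target_arch = "x86_64")]
fn detected_level() -> u8 {
    // x86-64-v2 (close to Nehalem)
    let v2 = is_x86_feature_detected!("cmpxchg16b")
        && is_x86_feature_detected!("popcnt")
        && is_x86_feature_detected!("sse3")
        && is_x86_feature_detected!("ssse3")
        && is_x86_feature_detected!("sse4.1")
        && is_x86_feature_detected!("sse4.2");

    // x86-64-v3 (close to Haswell); requires everything in v2 as well
    let v3 = v2
        && is_x86_feature_detected!("avx")
        && is_x86_feature_detected!("avx2")
        && is_x86_feature_detected!("bmi1")
        && is_x86_feature_detected!("bmi2")
        && is_x86_feature_detected!("f16c")
        && is_x86_feature_detected!("fma")
        && is_x86_feature_detected!("lzcnt")
        && is_x86_feature_detected!("xsave");

    // x86-64-v4; requires everything in v3 as well
    let v4 = v3
        && is_x86_feature_detected!("avx512f")
        && is_x86_feature_detected!("avx512bw")
        && is_x86_feature_detected!("avx512cd")
        && is_x86_feature_detected!("avx512dq")
        && is_x86_feature_detected!("avx512vl");

    // The baseline level is assumed for any x86-64 CPU.
    if v4 {
        4
    } else if v3 {
        3
    } else if v2 {
        2
    } else {
        1
    }
}

/// On other architectures these levels do not apply, so report 0.
#[cfg(not(target_arch = "x86_64"))]
fn detected_level() -> u8 {
    0
}
```

Calling `detected_level()` on a Haswell-class machine would be expected to return 3, though the exact result depends on the CPU and on what the operating system has enabled.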
Great idea--I've seen these a few times and it definitely simplifies things. It may be convenient to provide automatic versions like this as well.
Sorry it's been a while--this has been added to master.
The new API actually allows selecting any target CPU, such as "x86_64/skylake". Currently, x86-64-v3 doesn't work because std::arch lacks detection of MOVBE. I've opened a PR to add that in https://github.com/rust-lang/stdarch/pull/1356.
Another new feature has been added that allows you to specify #[multiversion(targets = "simd")], which automatically selects from all SIMD instruction sets. On x86/x86-64 this corresponds to the microarchitecture levels.
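For readers unfamiliar with function multiversioning, the following is a hand-written sketch of the general technique: one function body compiled for more than one feature set, with a runtime check choosing between them. It uses only the standard library and is not the code this crate generates. The names `sum`, `sum_avx2`, and `sum_fallback` are hypothetical, and AVX2 plus FMA is just one example feature set.

```rust
/// Portable version, compiled with the build's baseline target features.
fn sum_fallback(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

/// Same body, but the compiler is allowed to use AVX2 and FMA here.
/// Calling this on a CPU without those features is undefined behavior,
/// which is why it is marked `unsafe`.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2,fma")]
unsafe fn sum_avx2(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

/// Dispatcher: picks a version at runtime based on detected CPU features.
fn sum(xs: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            // SAFETY: both required features were just detected on this CPU.
            return unsafe { sum_avx2(xs) };
        }
    }
    sum_fallback(xs)
}
```

Writing this by hand for every level and architecture quickly becomes repetitive, which is the motivation for having an attribute select the targets automatically.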