Use runtime CPU feature detection to select which SIMD instruction set to use

Open james7132 opened this issue 1 year ago • 0 comments
There are options like std::arch::is_x86_feature_detected! which can detect at runtime which instruction sets are available. Unfortunately, the checks cannot be done inside each function call due to the cost of feature detection.

One potential way around this is to do feature detection during initialization, and use a tagged pointer to store the features detected. As any SIMD-supporting platform is at least 32 bits wide, the backing allocation is at least 4-byte aligned, so the two lowest bits of every pointer to it are always zero. If the default block size is increased to somewhere between 8 and 64 bytes, and the allocation is aligned to the block size, the number of tag bits increases to between three and six. An example mapping for x86 may include:

  • 00 - Default, none detected.
  • 01 - SSE2 detected
  • 10 - SSE4.1 detected
  • 11 - AVX detected
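
As a rough sketch of how the detection step could produce one of these tags, assuming the mapping above: the names SimdTag and detect_simd_tag are hypothetical and not part of fixedbitset, and the check runs once at initialization rather than on every call.

```rust
/// Hypothetical 2-bit tag values, following the example mapping above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(usize)]
pub enum SimdTag {
    None = 0b00,
    Sse2 = 0b01,
    Sse41 = 0b10,
    Avx = 0b11,
}

/// Returns the most capable instruction set found at runtime.
/// Intended to be called once, when the bitset is created.
pub fn detect_simd_tag() -> SimdTag {
    // The detection macro only exists on x86 targets, so guard it with cfg.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        // Check from most to least capable and stop at the first hit.
        if std::arch::is_x86_feature_detected!("avx") {
            return SimdTag::Avx;
        }
        if std::arch::is_x86_feature_detected!("sse4.1") {
            return SimdTag::Sse41;
        }
        if std::arch::is_x86_feature_detected!("sse2") {
            return SimdTag::Sse2;
        }
    }
    // Non-x86 targets, or an x86 CPU with none of the features above.
    SimdTag::None
}
```

The result would be computed once and then packed into the low bits of the allocation pointer, so later calls only need to read the tag back.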

These bits can then be zeroed out on access in a branchless way, which should have a slight negative performance impact on point queries (contains, insert, etc.), but allows the most performant instructions to be used without explicitly compiling for a particular target feature set.
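
A minimal sketch of the tagged-pointer idea, assuming u32 blocks (4-byte aligned, so two free low bits). TaggedBlocks, TAG_MASK, and every method here are hypothetical names for illustration, not fixedbitset's actual layout or API.

```rust
/// Low bits that are always zero in a pointer to 4-byte-aligned blocks.
const TAG_MASK: usize = 0b11;

/// Hypothetical bitset storage whose pointer carries a 2-bit feature tag.
pub struct TaggedBlocks {
    tagged: *mut u32, // block pointer with the tag stored in the low bits
    len: usize,       // number of u32 blocks
}

impl TaggedBlocks {
    pub fn new(blocks: Vec<u32>, tag: usize) -> Self {
        assert!(tag <= TAG_MASK, "tag must fit in the free low bits");
        let boxed = blocks.into_boxed_slice();
        let len = boxed.len();
        let ptr = Box::into_raw(boxed) as *mut u32;
        debug_assert_eq!(ptr as usize & TAG_MASK, 0);
        // Offsetting by `tag` bytes sets the low bits, since they start at zero.
        let tagged = ptr.cast::<u8>().wrapping_add(tag).cast::<u32>();
        TaggedBlocks { tagged, len }
    }

    /// Reads the tag back out of the low bits.
    pub fn tag(&self) -> usize {
        self.tagged as usize & TAG_MASK
    }

    /// Clears the tag bits without branching; subtracting the tag gives the
    /// same address as `addr & !TAG_MASK`.
    fn ptr(&self) -> *mut u32 {
        self.tagged.cast::<u8>().wrapping_sub(self.tag()).cast::<u32>()
    }

    pub fn contains(&self, bit: usize) -> bool {
        let (block, offset) = (bit / 32, bit % 32);
        if block >= self.len {
            return false;
        }
        // Safety: block is in bounds and ptr() is the untagged allocation.
        unsafe { (*self.ptr().add(block) >> offset) & 1 == 1 }
    }

    pub fn insert(&mut self, bit: usize) {
        let (block, offset) = (bit / 32, bit % 32);
        assert!(block < self.len, "bit index out of bounds");
        // Safety: block is in bounds and ptr() is the untagged allocation.
        unsafe { *self.ptr().add(block) |= 1 << offset };
    }
}

impl Drop for TaggedBlocks {
    fn drop(&mut self) {
        // Rebuild the boxed slice from the untagged pointer so it is freed.
        let slice = std::ptr::slice_from_raw_parts_mut(self.ptr(), self.len);
        unsafe { drop(Box::from_raw(slice)) };
    }
}
```

Every point query pays for one extra mask-and-subtract before the load, which is the small cost described above; bulk operations could instead read tag() once and dispatch to the matching SIMD routine.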

james7132 avatar Mar 23 '24 01:03 james7132