
Use vectorclass library for an interface to the highest available SIMD instruction set

Open MarcusKlik opened this issue 8 years ago • 3 comments

The vector class library (VCL) allows for much faster C++ code by processing multiple data elements in parallel using SIMD instructions. The highest available SIMD instruction set is automatically selected (at runtime?) without rewriting code. We need to check which compiler flags are required in Makevars to enable the SIMD instruction sets.

MarcusKlik avatar Jul 03 '17 19:07 MarcusKlik

More information can be found on Agner Fog's pages. We need specific test methods to measure single column compression performance using the vectorclass library. Possible optimizations include:

  • faster bit-shifters for integer, Int64 and logical types
  • rounding from double to Int64 and converting back (POSIXct class)
  • calculation of max and min of an integer factor, e.g. for rescaling integer data blocks to zero-based values before compression (might lead to much higher compression ratios)
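
As a rough illustration of the last point, here is a minimal scalar sketch of rescaling an integer block to zero-based values. The names `RescaledBlock` and `rescale_to_zero_based` are hypothetical and not part of fst; the loops are written plainly so that they could later be swapped for a SIMD min/max reduction and compared in a benchmark.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type (not part of fst): a block rescaled to start at zero.
struct RescaledBlock {
    int32_t offset;                // minimum of the original block
    uint32_t range;                // max - min; determines the bits needed per value
    std::vector<uint32_t> values;  // original values minus the offset
};

// Scalar reference version; the min/max scan is the part a SIMD version would replace.
RescaledBlock rescale_to_zero_based(const int32_t* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    RescaledBlock out{0, 0, {}};
    if (n == 0) return out;

    int32_t lo = data[0];
    int32_t hi = data[0];
    for (std::size_t i = 1; i < n; ++i) {
        lo = std::min(lo, data[i]);
        hi = std::max(hi, data[i]);
    }

    // Subtract in unsigned arithmetic so that extreme inputs cannot overflow.
    out.offset = lo;
    out.range = static_cast<uint32_t>(hi) - static_cast<uint32_t>(lo);
    out.values.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        out.values[i] = static_cast<uint32_t>(data[i]) - static_cast<uint32_t>(lo);
    }
    return out;
}
```

For a block such as {105, 100, 103} this stores an offset of 100 and the values {5, 0, 3}, so only 3 bits per value are needed instead of 32 before the compressor even runs.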

MarcusKlik avatar Jul 03 '17 19:07 MarcusKlik

Compare the vectorclass performance to the Blaze library performance. To make SIMD work for fst, multiple versions of the SIMD code have to be compiled for different SIMD instruction sets (e.g. AVX2, SSE2) and a runtime dispatcher is needed to select the appropriate code for the user's CPU. Agner Fog's Optimizing software in C++ contains a few pages on creating such a dispatcher.
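
A minimal sketch of what such a dispatcher could look like, assuming GCC or Clang on x86 (it relies on the `__builtin_cpu_supports` builtin and falls back to generic code elsewhere). All names here (`SumFn`, `sum_avx2`, `sum_dispatch`, ...) are hypothetical. In a real build each kernel would live in its own translation unit compiled with the matching flag (e.g. `-mavx2`); here they all share one scalar body so the example stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical kernel signature: sum of a block of 32-bit integers.
using SumFn = int64_t (*)(const int32_t*, std::size_t);

// Shared scalar body standing in for the real, separately compiled kernels.
static int64_t sum_scalar(const int32_t* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    int64_t total = 0;
    for (std::size_t i = 0; i < n; ++i) total += p[i];
    return total;
}

// Placeholders: each would normally be built with a different SIMD flag.
static int64_t sum_generic(const int32_t* p, std::size_t n) { return sum_scalar(p, n); }
static int64_t sum_sse2(const int32_t* p, std::size_t n) { return sum_scalar(p, n); }
static int64_t sum_avx2(const int32_t* p, std::size_t n) { return sum_scalar(p, n); }

enum class SimdLevel { Generic, SSE2, AVX2 };

// Query the CPU once; compilers or targets without the builtin get Generic.
SimdLevel detect_simd_level() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return SimdLevel::AVX2;
    if (__builtin_cpu_supports("sse2")) return SimdLevel::SSE2;
#endif
    return SimdLevel::Generic;
}

// Map a detected level to the best kernel available for it.
SumFn select_sum(SimdLevel level) {
    switch (level) {
        case SimdLevel::AVX2: return &sum_avx2;
        case SimdLevel::SSE2: return &sum_sse2;
        default:              return &sum_generic;
    }
}

// Public entry point: the kernel is chosen on first call and then reused.
int64_t sum_dispatch(const int32_t* p, std::size_t n) {
    static const SumFn fn = select_sum(detect_simd_level());
    return fn(p, n);
}
```

Callers only ever see `sum_dispatch`; the function pointer is resolved once, so the per-call overhead is a single indirect call. The same pattern would apply to the bit-shifters and compression filters.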

Chances are that parts of fst's internal bit-shifters and compression filters will also compile to more efficient code when SIMD compiler options are enabled, even where they were not specifically designed for SIMD use (the compiler will optimize using SIMD). We can test that by using the same dispatcher strategy for compiling fst's algorithms and comparing performance.

MarcusKlik avatar Jul 16 '17 20:07 MarcusKlik

When a dispatcher is operational, the TurboPFor library for integer compression can be a nice (AVX2) enhancement for integer column compression (> 8 GByte/s decompression speed is claimed on the landing page).

Also, simdcomp seems extremely fast (uses SSE4.1).

MarcusKlik avatar Jul 16 '17 21:07 MarcusKlik