fst
Use vectorclass library for an interface to the highest available SIMD instruction set
VCL, the vector class library, is a tool that can make C++ code much faster by processing multiple data elements in parallel using SIMD instructions. The highest SIMD instruction set enabled at compile time is selected automatically without rewriting code; selecting one at runtime requires a dispatcher. We need to check which compiler flags are required to use the SIMD instruction sets (`Makevars`).
More information can be found on Agner Fog's pages. We need specific test methods to measure single column compression performance using the vectorclass library. Possible optimizations include:
- faster bit-shifters for `integer`, `Int64` and `logical` types
- rounding from `double` to `Int64` and converting back (`POSIXct` class)
- calculation of `max` and `min` of an `integer` factor, e.g. for rescaling `integer` data blocks to zero-based values before compression (might lead to much higher compression ratios)
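The min/max rescaling idea from the last bullet can be sketched in plain C++ as a scalar baseline to benchmark a vectorclass version against. The names `RescaledBlock`, `rescale_to_zero` and `bits_needed` are hypothetical and not part of fst; the sketch only assumes a block of 32-bit integers.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type, not part of fst: result of rescaling one block.
struct RescaledBlock {
    std::int32_t min_value;             // offset to add back when decoding
    std::uint32_t range;                // max - min of the block
    std::vector<std::uint32_t> values;  // zero-based values
};

// Find min and max in one pass, then subtract min from every element.
// The subtraction is done in unsigned arithmetic to avoid signed overflow.
RescaledBlock rescale_to_zero(const std::int32_t* data, std::size_t n) {
    RescaledBlock out{0, 0, {}};
    if (n == 0) return out;

    std::int32_t lo = data[0];
    std::int32_t hi = data[0];
    for (std::size_t i = 1; i < n; ++i) {
        lo = std::min(lo, data[i]);
        hi = std::max(hi, data[i]);
    }

    const std::uint32_t ulo = static_cast<std::uint32_t>(lo);
    out.min_value = lo;
    out.range = static_cast<std::uint32_t>(hi) - ulo;
    out.values.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        out.values[i] = static_cast<std::uint32_t>(data[i]) - ulo;
    }
    return out;
}

// Number of bits needed to store any value in [0, range].
unsigned bits_needed(std::uint32_t range) {
    unsigned bits = 0;
    while (range != 0) {
        ++bits;
        range >>= 1;
    }
    return bits;
}
```

A block such as `{1000, 1003, 1001, 1007}` rescales to `{0, 3, 1, 7}`, which needs 3 bits per value instead of 32. Whether that translates into a better ratio after the actual compressor has run is what the tests would have to show.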
Compare the vectorclass performance to that of the Blaze library. To make SIMD work for fst, the SIMD code has to be compiled multiple times, once per targeted instruction set (e.g. AVX2, SSE2), and a runtime dispatcher is needed to select the appropriate version for the user's CPU. Agner Fog's *Optimizing software in C++* contains a few pages on creating such a dispatcher.
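A minimal sketch of such a dispatcher, assuming GCC or Clang on x86 (it relies on the `__builtin_cpu_supports` builtin and the `target` function attribute) and falling back to the generic version everywhere else. The kernel and the names `sum_generic`, `sum_avx2`, `select_sum` and `sum_dispatched` are hypothetical; in a real build each variant would more likely live in its own translation unit compiled with its own flags, as Agner Fog describes.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: the dispatch path is only enabled for GCC/Clang on x86.
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define SKETCH_X86_DISPATCH 1
#else
#define SKETCH_X86_DISPATCH 0
#endif

using SumFn = std::int64_t (*)(const std::int32_t*, std::size_t);

// Baseline version, compiled with the default instruction set.
static std::int64_t sum_generic(const std::int32_t* data, std::size_t n) {
    std::int64_t total = 0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

#if SKETCH_X86_DISPATCH
// Same source, but the compiler may use AVX2 instructions in this function.
__attribute__((target("avx2")))
static std::int64_t sum_avx2(const std::int32_t* data, std::size_t n) {
    std::int64_t total = 0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}
#endif

// Query the CPU and return the best version it supports.
static SumFn select_sum() {
#if SKETCH_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return sum_avx2;
#endif
    return sum_generic;
}

// Public entry point: the selection runs once, on the first call.
std::int64_t sum_dispatched(const std::int32_t* data, std::size_t n) {
    static const SumFn impl = select_sum();
    return impl(data, n);
}
```

Callers only ever see `sum_dispatched`, so the rest of the code base does not need to know which instruction set ended up being used.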
Chances are that parts of fst's internal bit-shifters and compression filters will also be compiled more efficiently when SIMD compiler options are enabled, even when they are not specifically designed for SIMD use (the compiler will auto-vectorize where it can). We can test that by using the same dispatcher strategy for compiling fst's algorithms and comparing performance.
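To compare builds, a small timing harness along these lines could be compiled once per set of compiler flags and the results compared. This is a sketch only: `pack_bits` is a hypothetical stand-in for one of fst's bit-shifters (one bit per value, ignoring `NA` handling), not fst's actual code, and `best_time_ms` is an assumed helper name.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical kernel: packs the lowest bit of each value into 32-bit words.
std::vector<std::uint32_t> pack_bits(const std::vector<int>& values) {
    std::vector<std::uint32_t> out((values.size() + 31) / 32, 0u);
    for (std::size_t i = 0; i < values.size(); ++i) {
        const std::uint32_t bit = static_cast<std::uint32_t>(values[i]) & 1u;
        out[i / 32] |= bit << (i % 32);
    }
    return out;
}

// Runs the kernel `repeats` times and returns the fastest run in milliseconds.
// Taking the minimum reduces noise from other processes on the machine.
template <typename F>
double best_time_ms(F&& kernel, int repeats) {
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < repeats; ++r) {
        const auto start = std::chrono::steady_clock::now();
        kernel();
        const auto stop = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(stop - start).count();
        if (ms < best) best = ms;
    }
    return best;
}
```

The kernel's result should be consumed (for example stored or checksummed) inside the timed lambda, otherwise the optimizer may remove the work being measured.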