fastscancount
Leo's populate_hits_avx
I took Leo's PR and reverted the cache-size change, to isolate the effect of his new populate_hits_avx. (It is hard to reason about multiple changes at once.)
Before (master)...
Trial 1...
AVX2-based scancount
2.78173 cycles/element
2.41643 instructions/cycles
0.00283376 miss/element
Elems per millisecond:
fastscancount_avx2: 1.26256e+06
Trial 2...
AVX2-based scancount
2.75127 cycles/element
2.44319 instructions/cycles
0.00283133 miss/element
Elems per millisecond:
fastscancount_avx2: 1.26693e+06
After (merging this PR)...
Trial 1...
AVX2-based scancount
3.02782 cycles/element
2.36966 instructions/cycles
0.00274635 miss/element
Elems per millisecond:
fastscancount_avx2: 1.17293e+06
Trial 2...
AVX2-based scancount
3.02123 cycles/element
2.37482 instructions/cycles
0.00278034 miss/element
Elems per millisecond:
fastscancount_avx2: 1.17274e+06
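As you can see, I observe a performance regression with this PR: cycles/element rise from about 2.75 to 2.78 on master to about 3.02 to 3.03, and throughput drops from about 1.26e+06 to about 1.17e+06 elements per millisecond. The miss rate is essentially unchanged, and instructions per cycle dip only slightly (about 2.42 to 2.44 before, about 2.37 after).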
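To put a number on the regression, here is a quick sketch that averages the two trials on each side and reports the relative change. The figures are copied from the output above; the helper names (`mean`, `relative_change`) are mine and not part of fastscancount, and two trials per side is a small sample, so treat the result as a rough estimate.

```python
# Figures copied from the trial outputs in this comment.
before_cycles = [2.78173, 2.75127]          # master, cycles/element
after_cycles = [3.02782, 3.02123]           # with this PR, cycles/element
before_rate = [1.26256e+06, 1.26693e+06]    # master, elements per millisecond
after_rate = [1.17293e+06, 1.17274e+06]     # with this PR, elements per millisecond


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def relative_change(before, after):
    """Relative change of the mean as a fraction (positive means an increase)."""
    return (mean(after) - mean(before)) / mean(before)


cycles_change = relative_change(before_cycles, after_cycles)
rate_change = relative_change(before_rate, after_rate)

print(f"cycles/element: {cycles_change:+.1%}")
print(f"elems/ms:       {rate_change:+.1%}")
```

On these numbers that works out to roughly 9% more cycles per element and roughly 7% fewer elements per millisecond.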
cc @searchivarius