pisa icon indicating copy to clipboard operation
pisa copied to clipboard

Move codec implementations to cpp and make codec dependencies private

Open elshize opened this issue 4 years ago • 10 comments

Because the implementation of block-wise decode (and encode but this one is not as crucial) is moved out of the header, there is a potential that this will affect performance.

Personally, I don't think these are being inlined anyway but should be tested. I additionally added PISA_ENABLE_IPO cmake option in case we want to test link time optimization.

elshize avatar Jun 14 '21 18:06 elshize

Codecov Report

Merging #465 (f5945bb) into master (a28100d) will decrease coverage by 0.07%. The diff coverage is 50.42%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #465      +/-   ##
==========================================
- Coverage   93.26%   93.18%   -0.08%     
==========================================
  Files          92       86       -6     
  Lines        4926     4785     -141     
==========================================
- Hits         4594     4459     -135     
+ Misses        332      326       -6     
Impacted Files Coverage Δ
include/pisa/codec/block_codecs.hpp 86.61% <ø> (-1.78%) :arrow_down:
include/pisa/codec/varintgb.hpp 62.79% <ø> (ø)
include/pisa/forward_index.hpp 95.12% <ø> (ø)
include/pisa/codec/VarIntG8IU.h 50.84% <50.42%> (+0.84%) :arrow_up:
include/pisa/bit_vector.hpp 97.54% <0.00%> (-0.41%) :arrow_down:
include/pisa/block_posting_list.hpp 99.21% <0.00%> (+0.01%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update a28100d...f5945bb. Read the comment docs.

codecov[bot] avatar Jun 14 '21 19:06 codecov[bot]

@JMMackenzie do you think you'd be able to run a benchmark comparing this to master? I think one block codec should be enough to verify if there's any regression, unless by looking at the code you can see more potential regressions...

elshize avatar Jun 14 '21 19:06 elshize

CW09B with block_simdbp encoding. About 5000 queries of varying lengths.Based on the numbers below, it looks like we are paying a little for this.

AND k = 1000

Master: {"type": "block_simdbp", "query": "and", "avg": 10647.6, "q50": 3791, "q90": 28323, "q95": 43043, "q99": 93122}
This: {"type": "block_simdbp", "query": "and", "avg": 10747.1, "q50": 3743, "q90": 28466, "q95": 43279, "q99": 94556}

MaxScore k = 10

Master: {"type": "block_simdbp", "query": "maxscore", "avg": 19966.6, "q50": 10714, "q90": 51073, "q95": 69106, "q99": 115939}
This: {"type": "block_simdbp", "query": "maxscore", "avg": 24766.2, "q50": 14654, "q90": 62199, "q95": 81999, "q99": 126860}

MaxScore k = 1000

Master: {"type": "block_simdbp", "query": "maxscore", "avg": 35093.4, "q50": 24444, "q90": 79897, "q95": 104968, "q99": 170087}
This: {"type": "block_simdbp", "query": "maxscore", "avg": 39695.4, "q50": 29151, "q90": 88562, "q95": 114082, "q99": 176220}

JMMackenzie avatar Jun 15 '21 00:06 JMMackenzie

Thanks @JMMackenzie can you try passing -DPISA_ENABLE_IPO to cmake, recompile and run again? I'm curious if this helps.

elshize avatar Jun 15 '21 00:06 elshize

Actually, I see a bug there, I name the option with PISA_ but the check is without it (right below). You'd need to fix that. Whatever you do, please check if the lto flag is actually passed to the linker to make sure we compare the right thing.

elshize avatar Jun 15 '21 00:06 elshize

Actually, I see a bug there, I name the option with PISA_ but the check is without it (right below). You'd need to fix that. Whatever you do, please check if the lto flag is actually passed to the linker to make sure we compare the right thing.

Already saw and fixed that. Having some trouble with CMake now, working through it. Will report back.

JMMackenzie avatar Jun 15 '21 00:06 JMMackenzie

You'll also need:

cmake_policy(SET CMP0069 NEW) # Use IPO (LTO) when requested

elshize avatar Jun 15 '21 01:06 elshize

Doesn't seem to help too much:

AND k = 1000

Master: {"type": "block_simdbp", "query": "and", "avg": 10647.6, "q50": 3791, "q90": 28323, "q95": 43043, "q99": 93122}
This: {"type": "block_simdbp", "query": "and", "avg": 10747.1, "q50": 3743, "q90": 28466, "q95": 43279, "q99": 94556}
IPO: {"type": "block_simdbp", "query": "and", "avg": 10673.2, "q50": 3786, "q90": 28325, "q95": 42724, "q99": 93325}

MaxScore k = 10

Master: {"type": "block_simdbp", "query": "maxscore", "avg": 19966.6, "q50": 10714, "q90": 51073, "q95": 69106, "q99": 115939}
This: {"type": "block_simdbp", "query": "maxscore", "avg": 24766.2, "q50": 14654, "q90": 62199, "q95": 81999, "q99": 126860}
IPO: {"type": "block_simdbp", "query": "maxscore", "avg": 24060.2, "q50": 14279, "q90": 60437, "q95": 79765, "q99": 123746}


MaxScore k = 1000

Master: {"type": "block_simdbp", "query": "maxscore", "avg": 35093.4, "q50": 24444, "q90": 79897, "q95": 104968, "q99": 170087}
This: {"type": "block_simdbp", "query": "maxscore", "avg": 39695.4, "q50": 29151, "q90": 88562, "q95": 114082, "q99": 176220}
IPO: {"type": "block_simdbp", "query": "maxscore", "avg": 40741.5, "q50": 30760, "q90": 89645, "q95": 114450, "q99": 173881}

JMMackenzie avatar Jun 15 '21 02:06 JMMackenzie

Are we willing to take this performance hit? I'm happy to revisit these tests soon if we want to make sure...

JMMackenzie avatar Mar 14 '22 11:03 JMMackenzie

Are we willing to take this performance hit? I'm happy to revisit these tests soon if we want to make sure...

Possibly... but not entirely sure. Kind of want to take some time and think about it, mostly whether there's a different way of improving compilation but without the performance hit. Hope to have some time for it sometime this week.

elshize avatar Mar 14 '22 23:03 elshize