pisa icon indicating copy to clipboard operation
pisa copied to clipboard

Dynamic dispatch for block codecs

Open elshize opened this issue 11 months ago • 5 comments

Use dynamic dispatch for block codecs. The old class template block_freq_index is replaced by the BlockInvertedIndex class. This class contains a shared pointer to a BlockCodec object, which is a superclass of all block codec implementations.

Changelog-changed: BlockInvertedIndex replaces block_freq_index Changelog-performance: Expect a slight performance drop of all block codecs

elshize avatar Mar 23 '24 19:03 elshize

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 93.86%. Comparing base (277cbe9) to head (6178ca5).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #579      +/-   ##
==========================================
+ Coverage   93.27%   93.86%   +0.58%     
==========================================
  Files          89       80       -9     
  Lines        4448     3881     -567     
==========================================
- Hits         4149     3643     -506     
+ Misses        299      238      -61     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Mar 23 '24 19:03 codecov[bot]

Benchmarking on MS-Marco v2 passages with BM25:

main

k=10
{"type": "block_simdbp", "query": "wand", "avg": 12999.1, "q50": 5843, "q90": 32519, "q95": 47480, "q99": 95925}
{"type": "block_simdbp", "query": "maxscore", "avg": 9178.09, "q50": 3963, "q90": 23918, "q95": 33600, "q99": 66648}
{"type": "block_simdbp", "query": "ranked_or", "avg": 458066, "q50": 387151, "q90": 968641, "q95": 1.2303e+06, "q99": 1.8538e+06}

k=1000
{"type": "block_simdbp", "query": "wand", "avg": 48258.5, "q50": 24904, "q90": 110636, "q95": 175410, "q99": 379647}
{"type": "block_simdbp", "query": "maxscore", "avg": 31727.1, "q50": 17407, "q90": 69608, "q95": 112551, "q99": 243058}
{"type": "block_simdbp", "query": "ranked_or", "avg": 458130, "q50": 390417, "q90": 964131, "q95": 1.22383e+06, "q99": 1.84154e+06}

this

k=10
{"type": "block_simdbp", "query": "wand", "avg": 12784.1, "q50": 5622, "q90": 32250, "q95": 47708, "q99": 95753}
{"type": "block_simdbp", "query": "maxscore", "avg": 9132.16, "q50": 3899, "q90": 23733, "q95": 35401, "q99": 66252}
{"type": "block_simdbp", "query": "ranked_or", "avg": 445143, "q50": 372223, "q90": 941900, "q95": 1.20444e+06, "q99": 1.8235e+06}

k=1000
{"type": "block_simdbp", "query": "wand", "avg": 48205.2, "q50": 24919, "q90": 111024, "q95": 175868, "q99": 374888}
{"type": "block_simdbp", "query": "maxscore", "avg": 31675.2, "q50": 17268, "q90": 69635, "q95": 112292, "q99": 247485}
{"type": "block_simdbp", "query": "ranked_or", "avg": 450663, "q50": 374939, "q90": 955607, "q95": 1.21881e+06, "q99": 1.87744e+06}

JMMackenzie avatar Mar 26 '24 22:03 JMMackenzie

In other words, it looks to have no performance penalty? Seems kind of weird but I'll take it. We should double check that we're doing what we think we're doing I guess...

JMMackenzie avatar Mar 26 '24 23:03 JMMackenzie

I tested stream_vbyte as well with MaxScore k=10, found no regressions.

main
{"type": "block_streamvbyte", "query": "maxscore", "avg": 9534.42, "q50": 4137, "q90": 24831, "q95": 34820, "q99": 68842}

this
{"type": "block_streamvbyte", "query": "maxscore", "avg": 9465.44, "q50": 4100, "q90": 24682, "q95": 34673, "q99": 68516}

Edit: "main" is not really main, it's just queries vs queries_dynamic in this branch. I think this can be merged once clang-format is happy?

JMMackenzie avatar Apr 01 '24 23:04 JMMackenzie

@amallia FYI we did some testing here regarding dynamic dispatch for block codecs. Seems no performance regression.

I think we'll have to go for it. This will simplify the code, make the compilation faster, and so on. I'm planning to work on porting all the code to the new type-erased block codec as part of this PR. Let me know if you have any comment regarding the general idea; I'll let you both know when it's ready for review.

elshize avatar Apr 02 '24 00:04 elshize

@JMMackenzie Thanks! Before merging, I'll clean up the git history here. I might just squash all of it, but at least I want it clean and have some important info in it, including some performance disclaimers.

elshize avatar Jul 20 '24 14:07 elshize