Refactor query term weights
Currently, term weighting is handled within the cursor classes. In particular, the `ScoredCursor` class stores the query term weight, which is the weight assigned to a term at query time. It is usually 1.0 but can be set per term by the user. This weight can be retrieved from the cursor with the `query_weight()` function.
Each cursor applies this weighting behind the scenes. Instead of returning the raw ranking-function score for a document/term pair, it returns that score multiplied by the term weight. This happens silently inside the cursor, so the query algorithm itself needs no special handling. The same goes for the upper-bound scores, which are multiplied by the term weight before being stored. [See #467 for more information.]
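To make the "silent" weighting concrete, here is a minimal sketch of a cursor that folds the query weight into both the per-document score and the stored list upper bound. It is not PISA's actual `ScoredCursor`. The class name `WeightedScoredCursor`, the in-memory posting vector, and the `Scorer` callback are hypothetical simplifications for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a scored cursor (not PISA's class).
// The query weight is applied inside the cursor, so an algorithm that calls
// score() or max_score() never has to multiply by the weight itself.
class WeightedScoredCursor {
  public:
    // Ranking function: (docid, term frequency) -> unweighted score.
    using Scorer = std::function<float(std::uint32_t, std::uint32_t)>;

    WeightedScoredCursor(std::vector<std::pair<std::uint32_t, std::uint32_t>> postings,
                         Scorer scorer, float query_weight, float max_score)
        : m_postings(std::move(postings)),
          m_scorer(std::move(scorer)),
          m_query_weight(query_weight),
          m_max_score(query_weight * max_score) {}  // upper bound stored pre-weighted

    float query_weight() const { return m_query_weight; }
    float max_score() const { return m_max_score; }
    bool end() const { return m_pos >= m_postings.size(); }
    std::uint32_t docid() const { return m_postings[m_pos].first; }
    void next() { ++m_pos; }

    // Returns query_weight * rank_function(docid, freq) for the current posting.
    float score() const {
        auto const& [doc, freq] = m_postings[m_pos];
        return m_query_weight * m_scorer(doc, freq);
    }

  private:
    std::vector<std::pair<std::uint32_t, std::uint32_t>> m_postings;  // (docid, freq)
    Scorer m_scorer;
    float m_query_weight;
    float m_max_score;
    std::size_t m_pos = 0;
};
```

With a weight of 2.0 and a toy ranking function that returns the raw frequency, a posting with frequency 2 scores 4.0, and a list upper bound of 3.0 is stored as 6.0.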
The problem is that the `block_max` approaches receive the unweighted block-max score, and the weighting is applied directly by the algorithm. See the following example:
https://github.com/pisa-engine/pisa/blob/0efb7926d4928c8aae672ba4d7a1419891f2676d/include/pisa/query/algorithm/block_max_wand_query.hpp#L71
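The inconsistency can be modeled roughly as follows. This is a simplified sketch, not the code at the linked line. The struct `BlockMaxCursorCurrent` and the helper `current_block_bound` are hypothetical names. The list upper bound is already weighted, but the per-block upper bound is returned raw, so the algorithm has to remember to multiply.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the current split of responsibilities (not PISA code).
struct BlockMaxCursorCurrent {
    float query_weight;
    float max_score;                 // list upper bound: already weighted by the cursor
    std::vector<float> block_maxes;  // per-block upper bounds: NOT weighted
    std::size_t block = 0;

    // Returns the raw block upper bound; the caller must apply the weight.
    float block_max_score() const { return block_maxes[block]; }
};

// Algorithm-side upper bound over the current block of each cursor.
// If the multiplication below is forgotten, the bound is silently wrong.
float current_block_bound(std::vector<BlockMaxCursorCurrent> const& cursors) {
    float bound = 0.0F;
    for (auto const& cursor : cursors) {
        bound += cursor.query_weight * cursor.block_max_score();
    }
    return bound;
}
```

The weight therefore lives in two places: inside the cursor for `score()` and `max_score()`, and inside every block-max algorithm for the block bounds.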
I think the desired behavior would be for the `block_max_scored_cursor`'s `block_max` call to multiply by the `term_weight` before returning. That is, modifying the following line:
https://github.com/pisa-engine/pisa/blob/master/include/pisa/cursor/block_max_scored_cursor.hpp#L32
Then we would need to rework each of the `block_max` algorithms to remove their explicit weight multiplication, since it would be done inside the cursor. Everything should then work as expected.
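A rough sketch of the proposed change follows, under the same simplifying assumptions as above. `BlockMaxCursorProposed` and `current_block_bound` are hypothetical names, not PISA's API. The cursor applies the weight when returning the block upper bound, and the algorithm simply sums what the cursors report.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch of the proposed behavior (not PISA code).
class BlockMaxCursorProposed {
  public:
    BlockMaxCursorProposed(float query_weight, std::vector<float> block_maxes)
        : m_query_weight(query_weight), m_block_maxes(std::move(block_maxes)) {}

    // Weight applied inside the cursor, consistent with score() and max_score().
    float block_max_score() const { return m_query_weight * m_block_maxes[m_block]; }
    void next_block() { ++m_block; }

  private:
    float m_query_weight;
    std::vector<float> m_block_maxes;  // raw per-block upper bounds
    std::size_t m_block = 0;
};

// After the refactor, the algorithm no longer touches the query weight.
float current_block_bound(std::vector<BlockMaxCursorProposed> const& cursors) {
    float bound = 0.0F;
    for (auto const& cursor : cursors) {
        bound += cursor.block_max_score();
    }
    return bound;
}
```

This sketch multiplies on each call. An alternative would be to store pre-weighted block maxima, as is already done for the list upper bound. Which one is cheaper likely depends on how often block bounds are queried relative to how often cursors are constructed.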
The main point of this issue is to discuss whether tightly coupling the term and impact weights makes sense, and whether there is ever a case where we might not want this coupling. My expectation is that coupling will make things much simpler. If a user ever wanted to decouple them, we could implement additional `_unweighted` versions of each function and expose them through the cursor.
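If decoupling were ever needed, the `_unweighted` escape hatch might look like the sketch below. The class `DecoupledCursor` and its constructor are hypothetical. Precomputed raw scores stand in for a real ranking function. The weighted accessors stay the default, and each one is defined in terms of its `_unweighted` counterpart, so the two cannot drift apart.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical cursor exposing both weighted (default) and _unweighted accessors.
class DecoupledCursor {
  public:
    DecoupledCursor(float query_weight, std::vector<float> raw_scores,
                    std::vector<float> block_maxes, float max_score)
        : m_query_weight(query_weight),
          m_raw_scores(std::move(raw_scores)),
          m_block_maxes(std::move(block_maxes)),
          m_max_score(max_score) {}

    float query_weight() const { return m_query_weight; }

    // Default accessors: weighted, used by the query algorithms.
    float score() const { return m_query_weight * score_unweighted(); }
    float max_score() const { return m_query_weight * max_score_unweighted(); }
    float block_max_score() const { return m_query_weight * block_max_score_unweighted(); }

    // Opt-in accessors for callers that want to apply weights themselves.
    float score_unweighted() const { return m_raw_scores[m_pos]; }
    float max_score_unweighted() const { return m_max_score; }
    float block_max_score_unweighted() const { return m_block_maxes[m_block]; }

    void next() { ++m_pos; }
    void next_block() { ++m_block; }

  private:
    float m_query_weight;
    std::vector<float> m_raw_scores;   // raw rank-function output per posting
    std::vector<float> m_block_maxes;  // raw per-block upper bounds
    float m_max_score;                 // raw list upper bound
    std::size_t m_pos = 0;
    std::size_t m_block = 0;
};
```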
@elshize and @amallia, what do you think?