pisa icon indicating copy to clipboard operation
pisa copied to clipboard

MaxScore bug

Open amallia opened this issue 6 years ago • 6 comments

Describe the bug I suspect there is a bug in MaxScore.

https://github.com/pisa-engine/pisa/blob/f8c65c16342537237ce3bc877bec3cbf6d006172/include/pisa/query/algorithm/maxscore_query.hpp#L40-L67

non_essential_lists is unsigned and initialized to 0. In line 63 we do: size_t i = non_essential_lists - 1; . this could potentially result in UB.

I noticed this bug because I was running MaxScore in debug mode and I got this error:

evaluate_queries: /home/amallia/pisa/include/pisa/block_posting_list.hpp:126: void pisa::block_posting_list<BlockCodec, Profile>::document_enumerator::next_geq(uint64_t) [with BlockCodec = pisa::simdbp_block; bool Profile = false; uint64_t = long unsigned int]: Assertion `lower_bound >= m_cur_docid || position() == 0' failed.

next_geq is only called in line 67.

amallia avatar Jul 17 '19 15:07 amallia

I thought that it was well defined behavior ? If non_essential_lists is 0 and we remove 1, we will get i being the maximum value of a uint64_t. Then we check if i + 1 > 0 which is not true, as if will wrap back around to be 0, right? Or is the type of i + 1 in the for loop possibly compiler/architecture specific?

Are you sure this isn't an issue with cur_doc being too large or small?

JMMackenzie avatar Jul 17 '19 22:07 JMMackenzie

Yes, you are right. This makes sense.

Yes the issue is with cur_doc being too small, but I thought it was a problem that derived from another one.

amallia avatar Jul 17 '19 22:07 amallia

@amallia Can we close this one? Or is there still a problem?

JMMackenzie avatar Oct 14 '19 23:10 JMMackenzie

I guess there is still some investigation to be done. Do you have any recommendations?

amallia avatar Oct 15 '19 15:10 amallia

I'm happy to investigate but I'd like to reproduce it first, I can have a look later on and I'll report back once I have some further information.

JMMackenzie avatar Oct 15 '19 21:10 JMMackenzie

Okay so I have had a look. I have provided an example below to show what is happening:

About to call next_geq with cur_doc = 8098  --> Calling from maxscore.hpp
Calling NextGEQ with lower_bound = 8098 --> In next_geq call (block_posting_list.hpp)
m_cur_docid = 8186 and position = 2852
Assertion lower_bound >= m_cur_docid || position() == 0 failed.

So what is happening is that we enter this loop with some current document we are trying to score:

https://github.com/pisa-engine/pisa/blob/f8c65c16342537237ce3bc877bec3cbf6d006172/include/pisa/query/algorithm/maxscore_query.hpp#L62-L67

Since we do not know anything about where these cursors are "pointing", we are actually calling next_geq on an identifier which is lower than the current document in the given list. This is why the assertion fails.

However, I believe it should be well-defined behavior to call next_geq with a smaller identifier -- the outcome would be that nothing happens. Notice that this is the current behavior anyway:

https://github.com/pisa-engine/pisa/blob/f8c65c16342537237ce3bc877bec3cbf6d006172/include/pisa/block_posting_list.hpp#L124-L146

So I think we should refactor next_geq to explicitly allow this behavior (change the assertion). Also worth noting that this bug does not impact the correctness of the algorithm.

JMMackenzie avatar Oct 16 '19 00:10 JMMackenzie