Investigate the bottleneck in the search benchmark using toplev

Open fulmicoton opened this issue 4 years ago • 3 comments

Toplev looks very useful for identifying bottlenecks.

Let's use it and compare the results to our current champion, PISA. We want a separate report for intersection, union, and phrase queries.

fulmicoton avatar Mar 31 '21 01:03 fulmicoton

Can you elaborate on what you expect from this issue?

  • What is toplev? A web search didn't turn up anything obvious.
  • Which bottleneck is to be investigated?
  • What is PISA? I see it mentioned in the changelog; is that the same thing?

scampi avatar Aug 02 '21 22:08 scampi

@fulmicoton can you add a link for Toplev?

@scampi: the PISA project is here: https://github.com/pisa-engine/pisa

As for the bottleneck, it can be IO, CPU, or both. Take union term queries as an example: in the benchmark at https://tantivy-search.github.io/bench/ you will see that PISA is way faster. I believe this comes from their WAND algorithm using a better data structure to exclude documents that will not make it into the top 10, but this still needs to be confirmed.
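
To make the hypothesis concrete, here is a minimal Python sketch of WAND-style pruning for a top-k union query. It is an illustration only, not PISA's or tantivy's actual code: the function name `wand_top_k`, the in-memory postings layout, and the toy scores are all assumptions. The point it shows is that a document is fully scored only if the summed per-term score upper bounds can beat the current k-th best score.

```python
import heapq

def wand_top_k(postings, k):
    """Hypothetical helper: top-k union over postings with WAND-style pruning.

    postings maps term -> list of (doc_id, score), sorted by doc_id.
    Returns (top, scored): top is a list of (score, doc_id), best first,
    and scored counts how many documents were fully scored.
    """
    lists = [p for p in postings.values() if p]
    upper = [max(s for _, s in p) for p in lists]  # per-term score upper bound
    pos = [0] * len(lists)                         # one cursor per posting list
    heap = []                                      # min-heap of (score, doc_id)
    scored = 0
    while True:
        # Non-exhausted lists, ordered by the doc id under their cursor.
        active = sorted(
            (i for i in range(len(lists)) if pos[i] < len(lists[i])),
            key=lambda i: lists[i][pos[i]][0],
        )
        if not active:
            break
        threshold = heap[0][0] if len(heap) == k else float("-inf")
        # Pivot: first doc whose accumulated upper bound can beat the threshold.
        acc, pivot = 0.0, None
        for i in active:
            acc += upper[i]
            if acc > threshold:
                pivot = lists[i][pos[i]][0]
                break
        if pivot is None:
            break  # no remaining document can enter the top k
        if lists[active[0]][pos[active[0]]][0] == pivot:
            # Every list that may contain the pivot is positioned on it: score it.
            score = 0.0
            for i in active:
                if lists[i][pos[i]][0] == pivot:
                    score += lists[i][pos[i]][1]
                    pos[i] += 1
            scored += 1
            if len(heap) < k:
                heapq.heappush(heap, (score, pivot))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, pivot))
        else:
            # Docs before the pivot cannot beat the threshold: skip them unscored.
            for i in active:
                if lists[i][pos[i]][0] >= pivot:
                    break
                while pos[i] < len(lists[i]) and lists[i][pos[i]][0] < pivot:
                    pos[i] += 1
    return sorted(heap, key=lambda t: (-t[0], t[1])), scored

# Toy index (made-up scores): 7 distinct documents across 3 terms.
postings = {
    "a": [(1, 1.0), (2, 3.0), (5, 1.0), (8, 0.5)],
    "b": [(2, 2.0), (3, 0.5), (8, 0.5)],
    "c": [(4, 0.25), (5, 0.25), (9, 0.125)],
}
top, scored = wand_top_k(postings, k=2)
print(top, scored)
```

On this toy index, documents 4 and 9 are never scored because term "c" alone cannot reach the threshold. Whether this kind of skipping really explains the gap with PISA is exactly what the profiling should tell us.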

fmassot avatar Nov 24 '21 22:11 fmassot

@scampi for toplev, see the manual here: https://github.com/andikleen/pmu-tools/wiki/toplev-manual

fmassot avatar Dec 07 '21 14:12 fmassot