tantivy Document Score


I have been asked about the scoring algorithm that tantivy uses and realised that neither I nor the documentation has a canonical description of it apart from:

The larger the number, the more relevant the document to the search

https://docs.rs/tantivy/0.10.3/tantivy/type.Score.html

I think it would be great to add more information and run through an example query on an index, to show why a query returns results in a given order and how a user might debug specific queries.

Who do we expect to read this?

People building a full-text search engine are interested in efficiently storing and ranking documents against queries. The score of each document is arguably THE most important data type that we return to users in every query. I expect most users of tantivy will want to read about the Score type at one point or another.

2 types of users:

someone knowledgeable about building search engines who wants to confirm the validity of tantivy's scoring algorithm - expects to see tf/idf, BM25 and other known
someone for whom tantivy might be the first experience of building a search application, with little background in document scoring - wants answers to specific questions and some further reading material.

Questions these users want to answer:

[ ] Why are search results in this order? What is this score field? Why is it a float?
[ ] How does each subquery in the full query (e.g. q: "title:president AND (body:Obama OR body:barack) AND year:<2008") contribute to the final score of a document?
[ ] I want to boost a specific document, or I expected it to appear higher up in the set of results for a given query - how do I do that?
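To make these questions concrete, here is a minimal, hand-rolled sketch of how per-clause scores could add up to a document score and how a boost changes the order. It does not use tantivy's API: `Hit`, `combine` and `rank` are hypothetical names, the per-clause numbers are made up, and summing clause scores is an assumption modelled on the usual Lucene-style convention rather than a confirmed description of tantivy. The one thing taken from tantivy is that `Score` is a float (`f32`), which is why ranking is a sort on a floating-point value.

```rust
/// A search hit: a document id plus its relevance score.
/// Hypothetical type for illustration, not tantivy's own.
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    doc_id: u32,
    score: f32,
}

/// Combine `(clause_score, boost)` pairs into one document score.
/// Assumption: clause scores are multiplied by their boost and summed.
fn combine(clauses: &[(f32, f32)]) -> f32 {
    clauses.iter().map(|(score, boost)| score * boost).sum()
}

/// Order hits so that the highest score comes first.
fn rank(mut hits: Vec<Hit>) -> Vec<Hit> {
    hits.sort_by(|a, b| b.score.total_cmp(&a.score));
    hits
}

fn main() {
    // Made-up (title, body) clause scores for two documents.
    let doc1 = (1.5_f32, 0.5_f32);
    let doc2 = (1.0_f32, 0.9_f32);

    // Try the body clause unboosted, then with a boost of 2.
    for body_boost in [1.0_f32, 2.0] {
        let hits = rank(vec![
            Hit { doc_id: 1, score: combine(&[(doc1.0, 1.0), (doc1.1, body_boost)]) },
            Hit { doc_id: 2, score: combine(&[(doc2.0, 1.0), (doc2.1, body_boost)]) },
        ]);
        println!("body boost {body_boost}: {hits:?}");
    }
}
```

In this toy model, document 1 ranks first without a boost, and boosting the body clause moves document 2 to the top, because a larger share of its score comes from that clause.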

Provide further reading material

Give links to the tf-idf and BM25 Wikipedia pages and to the Query::explain method
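As a starting point for that reading, below is a small standalone sketch of the textbook Okapi BM25 formula for a single term. It is not copied from tantivy's source. The constants `K1 = 1.2` and `B = 0.75` are the commonly quoted defaults and are assumed here, the function names are hypothetical, and the corpus statistics in `main` are invented.

```rust
/// Term-frequency saturation parameter (assumed common default).
const K1: f32 = 1.2;
/// Length-normalisation parameter (assumed common default).
const B: f32 = 0.75;

/// Inverse document frequency: terms found in fewer documents weigh more.
fn idf(doc_count: u64, doc_freq: u64) -> f32 {
    let n = doc_count as f32;
    let df = doc_freq as f32;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

/// BM25 contribution of one query term to one document's score.
fn bm25_term_score(
    term_freq: u32,      // occurrences of the term in this document's field
    field_len: u32,      // number of tokens in this document's field
    avg_field_len: f32,  // average field length over the whole index
    doc_count: u64,      // number of documents in the index
    doc_freq: u64,       // number of documents containing the term
) -> f32 {
    let tf = term_freq as f32;
    // Documents longer than average are penalised, shorter ones rewarded.
    let norm = K1 * (1.0 - B + B * field_len as f32 / avg_field_len);
    idf(doc_count, doc_freq) * tf * (K1 + 1.0) / (tf + norm)
}

fn main() {
    // Invented statistics: 1000 documents, average field length 100 tokens.
    let rare = bm25_term_score(2, 100, 100.0, 1000, 10);
    let common = bm25_term_score(2, 100, 100.0, 1000, 900);
    println!("rare term: {rare:.4}, common term: {common:.4}");
}
```

With the same term frequency and document length, the rare term contributes more than the common one, and a longer document scores lower than a shorter one for the same term frequency. Those are the two effects a reader debugging a surprising result order would usually check first.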

If you do this ticket, you will learn:

The full life-cycle of a tantivy query from query to score per document
tantivy helper methods for debugging such queries
writing concise, yet informative documentation for power-users and amateurs at the same time

Nov 14 '19 00:11 petr-tik

Hey @jeffsmith82, thought you might find this ticket interesting.

I appreciate you may have been busy recently, so let us know if you have little bandwidth to do this.

Nov 14 '19 00:11 petr-tik

Uh, was this ever implemented @petr-tik?

Jan 29 '24 09:01 safwansamsudeen

I don't think this is properly documented

Jan 29 '24 11:01 PSeitz
