
Adding float64 support, document level boosting, and facet collector

Open AliFlux opened this issue 1 year ago • 1 comments

This PR adds several features that are present in the Tantivy core repo but not yet exposed in tantivy-py:

Conjunction by default parameter

By default, tantivy combines query terms with the OR operator rather than AND. To change this behavior, we can now set conjunction_by_default when parsing a query:

parse_query(text, fields, conjunction_by_default=True)
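The difference between the two defaults can be illustrated with a toy matcher. This is only a sketch of OR versus AND semantics over whitespace-separated terms. It is not tantivy's query parser, and the `docs` data and `match` helper are hypothetical names made up for the illustration.

```python
# Toy illustration of disjunction (OR) vs. conjunction (AND) defaults.
# Not tantivy code: `docs` and `match` are hypothetical.
docs = {
    1: "the old man and the sea",
    2: "the old curiosity shop",
    3: "sea of tranquility",
}


def match(query, conjunction_by_default=False):
    """Return sorted ids of docs matching the whitespace-separated terms."""
    terms = query.lower().split()
    # AND requires every term to be present; OR requires at least one.
    combine = all if conjunction_by_default else any
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if combine(term in text.split() for term in terms)
    )


or_hits = match("old sea")                                # OR semantics
and_hits = match("old sea", conjunction_by_default=True)  # AND semantics
print(or_hits)   # [1, 2, 3]
print(and_hits)  # [1]
```

With OR semantics any document containing either term matches. With AND semantics only the document containing both terms remains.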

Floating point support

A new add_float_field function is available for adding f64 fields.

Document level boosting #51

We can now give priority to certain documents using the new weight_by_field parameter:

searcher.search(query, limit, weight_by_field='popularity')

This uses TopDocs tweak_score internally; the callback is kept out of Python code for performance reasons.
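As a rough mental model of score tweaking, the sketch below re-ranks hits by combining each relevance score with a per-document field value. The PR description does not state how the two are combined, so the multiplication here is an assumption. The `hits` data and `rerank` helper are hypothetical and do not come from tantivy-py.

```python
# Conceptual sketch of document-level boosting. Not tantivy code.
# Hypothetical hits: (doc_id, relevance_score, popularity)
hits = [("a", 2.0, 1.0), ("b", 1.5, 3.0), ("c", 1.0, 0.5)]


def rerank(hits, limit):
    """Return the top `limit` (doc_id, tweaked_score) pairs."""
    # Assumption: final score = relevance * field value.
    tweaked = [(doc_id, score * weight) for doc_id, score, weight in hits]
    tweaked.sort(key=lambda pair: pair[1], reverse=True)
    return tweaked[:limit]


top = rerank(hits, limit=2)
print(top)  # [('b', 4.5), ('a', 2.0)]
```

Document "b" has a lower relevance score than "a" but ranks first once its popularity is factored in.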

Facet collector

We can now get the counts of available facets by specifying the count_facets_by_field parameter:

data = searcher.search(query, limit, count_facets_by_field='genre')

print(data.facet_counts)
# prints a dictionary where keys are facets, and values are counts
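To make the shape of the result concrete, here is a minimal pure-Python sketch of facet counting over a set of matched documents. It only mimics the "facet to count" dictionary described above. The `matched_docs` data, the facet paths, and the `count_facets` helper are hypothetical, not part of tantivy-py.

```python
from collections import Counter

# Conceptual sketch of facet counting. Not tantivy code.
# Hypothetical documents that matched some query.
matched_docs = [
    {"title": "Dune", "genre": "/fiction/scifi"},
    {"title": "Neuromancer", "genre": "/fiction/scifi"},
    {"title": "Emma", "genre": "/fiction/romance"},
]


def count_facets(docs, field):
    """Map each facet value found in `field` to its document count."""
    return dict(Counter(doc[field] for doc in docs))


facet_counts = count_facets(matched_docs, "genre")
print(facet_counts)  # {'/fiction/scifi': 2, '/fiction/romance': 1}
```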

AliFlux, Jul 13 '22 14:07