rank_bm25

Detect presence instead of frequency

Open kripper opened this issue 5 months ago • 0 comments

For our use case (identifying certificate types) we want to retrieve docs that contain certain keywords, without considering how many times a keyword appears in a given document. A keyword that repeats many times in a document shouldn't score higher than one that appears only once.

The score should be determined by the number of different keywords that appear in the text. Each keyword that appears should add its predefined keyword score to the total, once.

It is also desirable that keywords can consist of a single word or of multiple words separated by spaces (e.g. the keyword "certificate of origin" would have a bigger predefined score than the keyword "certificate").
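
To make the requested behaviour concrete, here is a minimal sketch in plain Python. It does not use rank_bm25, and the function name `presence_score` and the example keyword scores are made up for illustration. It assumes case-insensitive matching on whole words, and it counts each keyword at most once per document.

```python
import re


def presence_score(text, keyword_scores):
    """Sum the predefined score of every distinct keyword present in text.

    Hypothetical helper, not part of rank_bm25. A keyword counts once,
    no matter how often it repeats in the document.
    """
    # Lowercase and reduce the text to word tokens joined by single spaces,
    # padded with spaces so that keywords only match on whole-word boundaries.
    padded = " " + " ".join(re.findall(r"\w+", text.lower())) + " "
    total = 0.0
    for keyword, score in keyword_scores.items():
        # Normalize the keyword the same way; it may be one word or several.
        normalized = " ".join(re.findall(r"\w+", keyword.lower()))
        if normalized and f" {normalized} " in padded:
            total += score  # presence only: added once per keyword
    return total


# Example scores (assumed values, chosen only for illustration).
keyword_scores = {
    "certificate": 1.0,
    "certificate of origin": 3.0,
    "invoice": 2.0,
}

docs = [
    "Certificate of Origin. This certificate is issued once.",
    "certificate certificate certificate",
    "Commercial invoice",
]
ranked = sorted(docs, key=lambda d: presence_score(d, keyword_scores), reverse=True)
```

Note that in this sketch a document containing "certificate of origin" also matches the shorter keyword "certificate", so both scores are added. If that is not wanted, the longer keyword would have to suppress the shorter one.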

Does this implementation support this use case?

kripper, Feb 06 '24 08:02