openverse-api

Investigate use of the BM25 algorithm to search image titles (original #288)

Open • obulat opened this issue 4 years ago • 0 comments

This issue has been migrated from the CC Search API repository

Author: kgodey
Date: Sat Apr 27 2019
Labels: ✨ goal: improvement, 🏷 status: label work required, 🙅 status: discontinued

The similarity algorithm used to search titles was switched from BM25 to boolean in https://github.com/creativecommons/cccatalog-api/pull/281 so that titles containing repeated words would not be ranked higher.

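We should investigate switching back to BM25 and setting the k1 parameter to a low value just for the title field. A low k1 makes term frequency saturate quickly, so repeated words in a title add little to the score.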
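To illustrate why a low k1 might help, here is a minimal sketch of the term-frequency part of BM25 in its classic textbook form. The function name and the default field lengths are made up for illustration, and real search engines may compute the score slightly differently.

```python
def bm25_tf_weight(tf, k1, b=0.75, field_len=5, avg_field_len=5):
    """Term-frequency component of classic BM25 (IDF factor omitted).

    tf            -- how many times the query term occurs in the title
    k1            -- saturation parameter; lower values flatten the curve
    b             -- length-normalization parameter
    field_len     -- length of this title in terms (illustrative default)
    avg_field_len -- average title length in the index (illustrative default)
    """
    length_norm = 1 - b + b * (field_len / avg_field_len)
    return tf * (k1 + 1) / (tf + k1 * length_norm)


if __name__ == "__main__":
    # Compare how much a title that repeats the query term 3 times
    # gains over a title that contains it once, for several k1 values.
    for k1 in (1.2, 0.5, 0.1, 0.0):
        once = bm25_tf_weight(1, k1)
        thrice = bm25_tf_weight(3, k1)
        print(f"k1={k1}: tf=1 -> {once:.3f}, tf=3 -> {thrice:.3f}")
```

With k1 = 0 the weight no longer depends on how often the term occurs, which is close to the behaviour of boolean similarity. Small positive values keep a mild preference for repeated terms without letting them dominate the ranking.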

See https://github.com/creativecommons/cccatalog-api/pull/281#pullrequestreview-230833576 and BM25 algorithm docs for more info.
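As a rough sketch of what a per-field setting could look like, assuming the search backend is Elasticsearch: a custom BM25 similarity can be declared in the index settings and attached to a single field in the mapping. The similarity name `title_bm25`, the helper function, and the k1 value of 0.2 are hypothetical and not taken from the project.

```python
import json


def build_title_index_body(k1=0.2, b=0.75):
    """Build an index-creation body with a low-k1 BM25 similarity on `title`.

    The similarity name and parameter values are illustrative assumptions;
    the project's real mapping has more fields and may differ in structure.
    """
    return {
        "settings": {
            "index": {
                "similarity": {
                    # Custom similarity used only by the title field.
                    "title_bm25": {"type": "BM25", "k1": k1, "b": b}
                }
            }
        },
        "mappings": {
            "properties": {
                # Other fields keep the index default similarity.
                "title": {"type": "text", "similarity": "title_bm25"}
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_title_index_body(), indent=2))
```

Older Elasticsearch versions nest `properties` under a mapping type name, so the exact shape of the `mappings` section would need to match the version in use.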


Original Comments:

annatuma commented on Thu Jan 23 2020:

@aldenstpage I'm putting this in Q2 of the backlog, given that there are other search algorithm improvements scheduled for then. Please evaluate if this is a fit for community contributions and if so, label it accordingly.

obulat • Apr 21 '21 12:04