openverse-api
Investigate use of the BM25 algorithm to search image titles (original #288)
This issue has been migrated from the CC Search API repository
Author: kgodey
Date: Sat Apr 27 2019
Labels: ✨ goal: improvement, 🏷 status: label work required, 🙅 status: discontinued
The similarity algorithm used to search titles was switched from BM25 to boolean in https://github.com/creativecommons/cccatalog-api/pull/281 so that titles containing repeated words would not be ranked higher.
We should investigate switching back to BM25 with the k1 tuning parameter set to a low value for the title field only.
See https://github.com/creativecommons/cccatalog-api/pull/281#pullrequestreview-230833576 and BM25 algorithm docs for more info.
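To illustrate why a low k1 might address the repeated-word problem, here is a minimal sketch of the term-frequency part of the standard BM25 formula, followed by a possible Elasticsearch index configuration. The k1 value of 0.1, the similarity name `title_bm25`, and the helper function names are assumptions chosen for illustration; they are not taken from the linked pull request or from the existing index settings.

```python
def bm25_tf_component(tf, k1, b=0.75, doc_len=1.0, avg_doc_len=1.0):
    """Term-frequency part of the BM25 score for one term (IDF omitted)."""
    # Length normalisation; equals 1.0 when the field has average length.
    norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + k1 * norm)


def repeat_boost(k1, tf_repeated=3):
    """How much more a title scores when the term appears tf_repeated times
    instead of once, for a field of average length."""
    return bm25_tf_component(tf_repeated, k1) / bm25_tf_component(1, k1)


# Hypothetical index body: a custom BM25 similarity applied only to "title".
# The name "title_bm25" and the k1/b values are illustrative assumptions.
INDEX_BODY = {
    "settings": {
        "index": {
            "similarity": {
                "title_bm25": {"type": "BM25", "k1": 0.1, "b": 0.75}
            }
        }
    },
    "mappings": {
        "properties": {
            "title": {"type": "text", "similarity": "title_bm25"}
        }
    },
}

if __name__ == "__main__":
    for k1 in (1.2, 0.1, 0.0):
        print(f"k1={k1}: repeat boost = {repeat_boost(k1):.3f}")
```

As k1 approaches 0, the boost from repeating a word approaches 1, which approximates the boolean behaviour for term frequency while still keeping BM25's inverse document frequency weighting. Whether that trade-off improves results for image titles would need to be checked against real queries.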
Original Comments:
annatuma commented on Thu Jan 23 2020:
@aldenstpage I'm putting this in Q2 of the backlog, given that there are other search algorithm improvements scheduled for then. Please evaluate if this is a fit for community contributions and if so, label it accordingly.