
7429/feature/add trending score to solr

Open benbdeitch opened this issue 1 year ago • 0 comments

Closes #7429

This PR adds support for trending scores to Solr, allowing us to better track which works are seeing a statistically notable increase in popularity. It adds several new fields and comes with two scripts, one to be run daily and the other hourly, that keep this information up to date.
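To make "statistically notable" concrete, here is a minimal sketch of one common way to score a spike: a z-score of the latest count against recent history. This is an assumption for illustration only; the description above does not state which formula the scripts use, and `z_score` and its arguments are hypothetical names.

```python
from statistics import mean, pstdev


def z_score(latest: float, history: list[float]) -> float:
    """Return how many standard deviations `latest` sits above the mean of `history`.

    Hypothetical helper, not taken from the PR. `history` must be non-empty.
    """
    sigma = pstdev(history)  # population standard deviation of past counts
    if sigma == 0:
        # A flat history has no spread, so treat the work as not trending.
        return 0.0
    return (latest - mean(history)) / sigma
```

A work whose latest count is well above its own baseline gets a high score, even if its absolute numbers are small compared with perennially popular works.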

It's still in draft mode, as there is currently no code to run the scripts automatically.

Technical

This implementation uses Solr's ability to update documents in place, which requires the new trending fields to be neither stored nor indexed and instead to be single-valued, numeric docValues fields. Essentially, they are left out of Solr's inverted index and kept in a column-oriented document-to-value mapping.

This approach A) performs better than regular atomic updates and B) avoids the issues that atomic updates can have with copyField values.
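As a rough sketch of what such an update looks like on the wire: in-place updates use the same `set`/`inc` modifier syntax as atomic updates, and Solr applies them in place when the target fields meet the docValues requirements. The field name `trending_score_demo` is hypothetical, and using `key` as the unique key is an assumption based on the search example in the testing steps.

```python
import json


def build_inplace_update(work_key: str, field_values: dict[str, float]) -> str:
    """Build a JSON body for Solr's /update handler that sets numeric fields.

    Illustrative only: the field names are supplied by the caller and are
    not the fields this PR adds.
    """
    doc: dict[str, object] = {"key": work_key}  # assumed unique key field
    for field, value in field_values.items():
        # The "set" modifier replaces the current value; "inc" would add to it.
        doc[field] = {"set": value}
    return json.dumps([doc])


# Example payload for one work with one hypothetical trending field.
payload = build_inplace_update("/works/OL54120W", {"trending_score_demo": 3.5})
```

The resulting string would be sent as the body of a POST to the collection's `/update` endpoint with a JSON content type.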

Testing

  1. Delete your Solr container and all related volumes.
  2. Run docker compose up.
  3. In your local Solr instance, search for a work (e.g. key:"/works/OL54120W") and check that the new fields are present.
  4. Save a work to your 'want-to-read' list.
  5. In another terminal, run:
     docker compose exec web bash
     cd scripts
     python calculate_trending_scores_hourly.py
  6. After a minute or so, run the Solr search again and check that the appropriate trending field has updated.
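The before-and-after search in the steps above can also be scripted. The sketch below only builds the query URL. The base URL `http://localhost:8983/solr/openlibrary` and the field name `trending_score_demo` are assumptions, so substitute your local Solr address and the actual new field names.

```python
from urllib.parse import urlencode


def build_select_url(base_url: str, work_key: str, fields: list[str]) -> str:
    """Return a Solr /select URL that fetches the given fields for one work."""
    params = {
        "q": f'key:"{work_key}"',  # same query as in the testing steps
        "fl": ",".join(fields),    # limit the response to the fields of interest
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Assumed local core URL and a hypothetical field name.
url = build_select_url(
    "http://localhost:8983/solr/openlibrary",
    "/works/OL54120W",
    ["key", "trending_score_demo"],
)
```

Fetching this URL before and after running the hourly script makes it easy to compare the field values.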

Screenshot

Stakeholders

@cdrini

benbdeitch, Sep 13 '24 22:09