Adding Search / Filter capability to Reading Log

Open mekarpeles opened this issue 3 years ago • 3 comments

Describe the problem that you'd like solved

(What it would take to) add search capabilities to the Reading Log page.

Proposal & Constraints

Note that this proposal will not currently work: book titles, authors, and the other data we'd like to search are not kept in our ReadingLog db table, only the OL identifiers. We may be able to achieve this with solr in the future. But assuming we did have the desired info in our database (as a thought exercise):

NOTE: First read "Working with Reading Log" here: https://github.com/internetarchive/openlibrary/issues/4267#issuecomment-824151094

  1. First, we'd need to update the Reading Log html template (https://github.com/internetarchive/openlibrary/blob/master/openlibrary/templates/account/books.html) to include a search box (design task). For a first version, we'd probably use an html form which submits a GET search query, similar to what we have on the author's page: https://openlibrary.org/authors/OL7283091A. In the future, we might want to use javascript (similar to how the real-time Search box works at the top of the website).
  2. Next, we'd need to update the public_my_books controller method in https://github.com/internetarchive/openlibrary/blob/master/openlibrary/plugins/upstream/account.py#L733-L760 to accept a GET parameter. The function already expects a page variable to be sent as a GET parameter (https://github.com/internetarchive/openlibrary/blob/master/openlibrary/plugins/upstream/account.py#L738), so accomplishing this should be as straightforward as adding another parameter, e.g. i = web.input(page=1, search=None).
  3. Where we fetch the patron's books (https://github.com/internetarchive/openlibrary/blob/master/openlibrary/plugins/upstream/account.py#L754), we need to alter the logic to check whether an i.search query is present (e.g. if i.search). If it is, we'll need to change the readlog.get_works call so this optional search parameter is passed along with our request for matching books.
  4. readlog is an instance of plugins.upstream.account.ReadingLog (class defined here: https://github.com/internetarchive/openlibrary/blob/1f57759886b65430d805270830677120c1dc067d/openlibrary/plugins/upstream/account.py#L645). Its get_works method (https://github.com/internetarchive/openlibrary/blob/1f57759886b65430d805270830677120c1dc067d/openlibrary/plugins/upstream/account.py#L716) will need to be updated to accept an optional search parameter (e.g. (key, page=1, limit=RESULTS_PER_PAGE, search=None)). ReadingLog.get_works essentially uses a KEYS dictionary (defined here: https://github.com/internetarchive/openlibrary/blob/1f57759886b65430d805270830677120c1dc067d/openlibrary/plugins/upstream/account.py#L654-L660) to look up and then invoke the proper book-fetching function.
  5. Each of the corresponding ReadingLog methods referenced by the KEYS dictionary (namely: get_waitlisted_editions, get_loans, get_want_to_read, get_currently_reading, get_already_read) must thus also be updated to take an optional search parameter. Each of these functions ultimately makes an API call to the same function within our Bookshelves API model: Bookshelves.get_users_logged_books (https://github.com/internetarchive/openlibrary/blob/master/openlibrary/core/bookshelves.py#L118-L149)
  6. Once a search box form has been added to the template, the public_my_books view/controller has been edited to expect a search parameter, this search parameter is forwarded to our readlog.get_works call, and the readlog object (i.e. the ReadingLog class) has been updated to accept an optional search parameter, we'll then need to do the hard work of modifying the actual API, Bookshelves.get_users_logged_books (the thing which calls the database), to handle an optional search parameter when requesting data from the database: https://github.com/internetarchive/openlibrary/blob/master/openlibrary/core/bookshelves.py#L118-L149
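
The six steps above can be sketched end to end as follows. This is a minimal, self-contained illustration, not the actual Open Library code: the classes are hypothetical stand-ins that only borrow the names from the steps, the in-memory rows replace the database (and, unlike the real table, include titles), the shelf names and the RESULTS_PER_PAGE value are assumed, and a plain dict stands in for web.input.

```python
RESULTS_PER_PAGE = 25  # assumed value, for illustration only


class Bookshelves:
    """Hypothetical stand-in for the Bookshelves model (step 6)."""

    # Fake rows of (work_id, title). The real table stores only identifiers,
    # which is exactly why this approach does not work today.
    _ROWS = {
        "want-to-read": [("OL1W", "Dune"), ("OL2W", "Emma")],
        "already-read": [("OL3W", "Ulysses")],
    }

    @classmethod
    def get_users_logged_books(cls, shelf, page=1, limit=RESULTS_PER_PAGE, search=None):
        rows = cls._ROWS.get(shelf, [])
        if search:
            # Case-insensitive substring match on the title.
            rows = [row for row in rows if search.lower() in row[1].lower()]
        start = (page - 1) * limit
        return rows[start:start + limit]


class ReadingLog:
    """Hypothetical stand-in for the ReadingLog class (steps 4 and 5)."""

    def __init__(self):
        # Maps a shelf key to the method that fetches its books.
        self.KEYS = {
            "want-to-read": self.get_want_to_read,
            "already-read": self.get_already_read,
        }

    def get_want_to_read(self, page=1, limit=RESULTS_PER_PAGE, search=None):
        return Bookshelves.get_users_logged_books("want-to-read", page, limit, search)

    def get_already_read(self, page=1, limit=RESULTS_PER_PAGE, search=None):
        return Bookshelves.get_users_logged_books("already-read", page, limit, search)

    def get_works(self, key, page=1, limit=RESULTS_PER_PAGE, search=None):
        # Forward the optional search parameter to the shelf-specific method.
        return self.KEYS[key](page=page, limit=limit, search=search)


def public_my_books(key, params):
    """Controller sketch (steps 2 and 3); `params` stands in for web.input."""
    page = int(params.get("page", 1))
    search = params.get("search") or None  # treat an empty string as "no search"
    return ReadingLog().get_works(key, page=page, search=search)
```

Calling public_my_books("want-to-read", {"search": "dune"}) goes through every layer and returns only the matching row; omitting search returns the whole shelf page, so existing behaviour is unchanged.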

Related to

#4262, #4267

Stakeholders

@cdrini

mekarpeles, Apr 21 '21 15:04

Implementation

One possible way to do this is to:

  1. take the search query...
  2. get a list of all books on a patron's want to read list (careful, this could be 10k+ books!)
  3. Fetch the work ids from infobase in bulk for the search query (perhaps, if < 1k titles -- this would serve most people, and if there are more than e.g. 1k titles, don't show the search form for now :( -- I know... the people w/ most titles most need search)
  4. Do a simple check of whether the search query equals, or is contained in, the book title.
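
A rough sketch of steps 2 to 4, assuming the shelf's works have already been fetched in bulk and are available as dicts with a "title" field. The function name, the dict shape, and the max_titles default are hypothetical; the 1k cutoff simply mirrors the threshold floated above.

```python
def filter_shelf_by_title(works, query, max_titles=1000):
    """Return the works whose title contains `query` (case-insensitive).

    Returns None when the shelf is larger than `max_titles`, signalling
    that the caller should hide the search form instead of searching.
    """
    if len(works) > max_titles:
        return None
    needle = query.strip().lower()
    if not needle:
        # An empty query matches everything.
        return list(works)
    # Substring containment also covers the exact-match case.
    return [work for work in works if needle in work.get("title", "").lower()]
```

Because every search walks the whole shelf, the cost grows with shelf size, which is why this is unlikely to be suitable for real-time search.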

For now, because this is expensive, we probably can't do real-time search (like we do on the topnav)

Perhaps the actual solution is to (a) use solr, (b) have this information mirrored in their archive.org items (privately), or (c) include the book title in the bookshelves db (which may affect performance).

mekarpeles, Jan 07 '22 19:01

In short we'd probably want a solr query like:

{
    'fq': 'key:(/works/OL1W OR /works/OL234W)',  # Built dynamically from the shelf's work keys
    'q.op': 'AND',
    'q': q,  # User query
    'start': offset,
    'rows': limit,
    'fl': ','.join(DEFAULT_SEARCH_FIELDS),  # From worksearch/code.py
    'qt': 'standard',
    'sort': 'work_count desc',
    'wt': 'json',
    'defType': 'edismax',
    'qf': 'text title^20 author_name^20'
}

We should DRY this up more because it duplicates some of the search page logic, but this is fine for now. Pass through execute_solr_query to get the books!
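
One way the dynamic fq could be assembled, as a sketch only. The helper name build_shelf_search_params and the fields default are hypothetical placeholders (the real field list would come from DEFAULT_SEARCH_FIELDS in worksearch/code.py), and the key validation assumes work keys always look like /works/OL123W, as in the examples in this thread.

```python
import re

# Assumed shape of a work key, based on the examples in this thread.
WORK_KEY_RE = re.compile(r"/works/OL\d+W")


def build_shelf_search_params(work_keys, q, offset=0, limit=20,
                              fields=("key", "title", "author_name")):
    """Build solr params that restrict a user query to the given work keys."""
    if not work_keys:
        raise ValueError("an empty shelf has nothing to search")
    for key in work_keys:
        # Reject anything that is not a plain work key so that a stray
        # value cannot change the meaning of the filter query.
        if not WORK_KEY_RE.fullmatch(key):
            raise ValueError(f"unexpected work key: {key!r}")
    return {
        'fq': 'key:(%s)' % ' OR '.join(work_keys),
        'q.op': 'AND',
        'q': q,  # User query
        'start': offset,
        'rows': limit,
        'fl': ','.join(fields),
        'qt': 'standard',
        'sort': 'work_count desc',
        'wt': 'json',
        'defType': 'edismax',
        'qf': 'text title^20 author_name^20',
    }
```

Keeping the shelf restriction in fq rather than in q leaves the user's query untouched, so the filter does not influence relevance scoring.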

cdrini, Jun 14 '22 17:06

Yes, this would be a big win! E.g. getting the list of all work ids from a patron's reading log shelf and then limiting a solr search to these IDs!

mekarpeles, Aug 25 '22 18:08

I'd like to work on this, @mekarpeles.

scottbarnes, Sep 27 '22 16:09

This should do the trick:

do_search(
    {'q': 'rowling' + ' key:(/works/OL1W OR /works/OL2W)'},
    sort=None,
    page=1,
    rows=20,
)

cdrini, Sep 30 '22 22:09

Sample mockup, just copying the authors page (image).

cdrini, Sep 30 '22 22:09

I am looking for an issue for my first contribution. Can you help me here?

techrajdeep, Oct 09 '22 05:10