openlibrary
Add solr support for synonyms for numbers/abbreviations
I've been using the website for a long time now, and one of my biggest gripes is how searching works. When searching for books in OpenLibrary, you often need to type the title exactly. This means that if a book's title spells out numbers as words (One, Two, Three, etc.), searching for the same title with digits (1, 2, 3, etc.) returns no results.
Another example: if a book uses "Vol." in the title, searching for "volume" returns no results even though the two mean the same thing. This makes finding specific books a lot more difficult.
Describe the problem that you'd like solved
The search engine searches exact terms, but it should have tolerance when dealing with numbers or words of equivalent meaning. Here's an example:
I would like searching "The Walking Dead Compendium Four" and "The Walking Dead Compendium 4" to find the book.
Proposal & Constraints
The search engine should be tolerant of words with the same meaning. "Vol." should be the same as writing "Volume". "Two" should be the same as writing "2" or "II". "&" and "and" should also be interchangeable.
Additional context
Another example, but with "vol" and "volume"
I think the solution for this would be to make use of solr's synonyms feature, but some experimenting/investigation is needed. Anyone who has some time to experiment with adding synonyms to solr, please do!
@cdrini I would like to do it, how can I?
The search is strict not only with terms, but also with letters. Compare Безпека життєдіяльності and Безпека життєдіяльност. With just one letter missing (і) there are no results
So this is a solr research task; here are some of the places where it will need modifications:
The solr schema, which defines the various types of text fields, has synonyms enabled -- but only at query time:
https://github.com/internetarchive/openlibrary/blob/82bc2f61c8c41363567d398b7b027a16775dbc91/conf/solr/conf/managed-schema#L426-L467
This blog post has some info: https://library.brown.edu/create/digitaltechnologies/using-synonyms-in-solr/
In a nutshell we need synonyms inside https://github.com/internetarchive/openlibrary/blob/ccabd95be2a82c4f79d94b1f10e46ea1d3c5c730/conf/solr/conf/synonyms.txt
And then test locally with a full reindex (See https://github.com/internetarchive/openlibrary/wiki/Solr#making-changes-to-solr-config )
But for numbers, they probably need to be in English only for now? I'm not sure how we should handle non-English numbers. Ideally we'd want different synonyms files for different user locales, but I'm not sure if/how to do this in solr.
But we can definitely add something like vol,vol.,Volume in there and see if it helps with that!
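To make the idea concrete, here is a rough Python sketch of how comma-separated synonym groups (the format solr's synonyms.txt uses for equivalent terms) would let the queries from this issue match. The sample entries and the helper names parse_synonyms, expand_query and matches are all hypothetical, and this is only an approximation: real solr applies synonyms inside its analyzer chain, so depending on the tokenizer, terms like "vol." or "&" may be altered or dropped before the synonym filter ever sees them.

```python
# Hypothetical sample entries in solr's synonyms.txt format: each line is a
# comma-separated group of terms treated as equivalent. These are illustrative
# only, not the contents of Open Library's actual file.
SAMPLE_SYNONYMS = """\
# volume abbreviations
vol, vol., volume
# English-only number words, digits, and Roman numerals
2, two, ii
3, three, iii
4, four, iv
"""


def parse_synonyms(text):
    """Map each lowercased term to the full set of terms in its group."""
    groups = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        terms = {t.strip().lower() for t in line.split(",") if t.strip()}
        for term in terms:
            groups.setdefault(term, set()).update(terms)
    return groups


def expand_query(query, groups):
    """For each whitespace-separated query token, return the terms it may match."""
    return [groups.get(tok.lower(), {tok.lower()}) for tok in query.split()]


def matches(query, title, groups):
    """True if every query token, or one of its synonyms, appears in the title."""
    title_tokens = {tok.lower() for tok in title.split()}
    return all(alts & title_tokens for alts in expand_query(query, groups))


if __name__ == "__main__":
    groups = parse_synonyms(SAMPLE_SYNONYMS)
    print(matches("The Walking Dead Compendium 4",
                  "The Walking Dead Compendium Four", groups))
```

Note that the sketch leaves out "1, one, i" on purpose: a Roman numeral "I" collides with the English pronoun, which is the kind of side effect that would need checking against real search results.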
Actually it looks like the synonyms file is working! You can see the TV one in action here: https://openlibrary.org/search?q=television+kid&mode=everything .
So adding volume should be easy enough!
@bicolino34 Your issue would probably be handled by solr's spell checking features, i.e. showing something like "Did you mean?" when a user's query is close to, but not exactly, a known term. Would you mind creating a separate issue to add support for "Did you mean?"? That'll require a different approach on the solr side, but would help users a ton!
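As a rough illustration of what a "Did you mean?" suggestion does, here is a small Python sketch using the standard library's difflib. It is not how solr's spell checker is implemented; the vocabulary list, the did_you_mean helper and the 0.8 cutoff are assumptions made up for this example.

```python
from difflib import get_close_matches

# Hypothetical vocabulary of terms that exist in the index.
KNOWN_TERMS = ["безпека", "життєдіяльності", "volume", "compendium"]


def did_you_mean(term, vocabulary, cutoff=0.8):
    """Return the closest known term, or None if the term is already known
    or nothing in the vocabulary is similar enough."""
    term = term.lower()
    if term in vocabulary:
        return None  # exact hit, no suggestion needed
    candidates = get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
    return candidates[0] if candidates else None


if __name__ == "__main__":
    # The query from above with the final letter missing.
    print(did_you_mean("життєдіяльност", KNOWN_TERMS))
```

The point of the sketch is only that the suggestion comes from comparing the query against terms already in the index, which is why it needs a different mechanism than a static synonyms file.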
Is this addressed by #6922? Can this issue be consolidated into that one?
#6922 is the PR which is meant to address this issue, but it's stuck in review and has some issues.