Books by "Independent Publisher" are being imported
Problem
#7865 added a rule that prevents import of an about-to-be-imported record if the publisher is "independently published" (casefolded).
However, that won't match publishers of "Independent Publisher", as seen in this recent import.
We should update is_independently_published so that it checks the publishers against a collection (e.g. a set) and returns true if any publisher in the record is in the set of terms identified as independent publishers: https://github.com/internetarchive/openlibrary/blob/fc3ae1192f6d161335c6253e53e3d776d3982c0b/openlibrary/catalog/utils/init.py#L321-L327
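A minimal sketch of the set-based check, assuming the function takes a list of publisher strings and returns a bool. The constant name INDEPENDENT_PUBLISHER_TERMS and the exact terms in it are hypothetical and would need to be agreed on:

```python
# Hypothetical set of casefolded terms; the real list of terms is an assumption.
INDEPENDENT_PUBLISHER_TERMS = frozenset(
    {
        "independently published",
        "independent publisher",
    }
)


def is_independently_published(publishers: list[str]) -> bool:
    """Return True if any publisher casefolds to a known independent-publisher term."""
    return any(
        publisher.casefold() in INDEPENDENT_PUBLISHER_TERMS
        for publisher in publishers
    )
```

Keeping the terms in one set means a new variant can be covered by adding a single string, without touching the function body.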
We may also want to consider catching more permutations at once by matching against a pattern such as independent* followed by publish*.
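One way the pattern idea might look, as a rough sketch only. The function name matches_independent_pattern and the regular expression are assumptions, not existing Open Library code:

```python
import re

# Hypothetical pattern: "independent" plus any suffix, whitespace,
# then "publish" plus any suffix (publisher, published, publishing, ...).
_INDEPENDENT_PATTERN = re.compile(r"independent\w*\s+publish\w*", re.IGNORECASE)


def matches_independent_pattern(publishers: list[str]) -> bool:
    """Return True if any publisher name consists entirely of the pattern."""
    return any(
        _INDEPENDENT_PATTERN.fullmatch(publisher.strip()) is not None
        for publisher in publishers
    )
```

This sketch uses fullmatch instead of search so that longer names which merely begin with the phrase, such as "Independent Publishers Group", are not rejected by accident. Whether that trade-off is right is a decision for the maintainers.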
Additionally, the tests should be updated: https://github.com/internetarchive/openlibrary/blob/fc3ae1192f6d161335c6253e53e3d776d3982c0b/openlibrary/tests/catalog/test_utils.py#L288-L297
Evidence / Screenshot
Relevant URL(s)
Reproducing the bug
- Try to import a book with a publishers field of ["Independent Publisher"]. See https://github.com/internetarchive/openlibrary/wiki/Developer's-Guide-to-Data-Importing for help on importing. (Note, this can be tested entirely in unit tests).
- Expected behavior: The book is not imported and IndependentlyPublished is raised.
- Actual behavior: The book is imported.
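The expected behavior could be pinned down in a unit test along these lines. This is a self-contained sketch: the IndependentlyPublished class, the validate_record helper, and the term set here are stand-ins written for illustration, not the real Open Library code, which a real test would import instead:

```python
class IndependentlyPublished(Exception):
    """Stand-in for the exception named in this issue."""


# Hypothetical term set; the actual terms are an assumption.
INDEPENDENT_TERMS = {"independently published", "independent publisher"}


def is_independently_published(publishers: list[str]) -> bool:
    return any(p.casefold() in INDEPENDENT_TERMS for p in publishers)


def validate_record(rec: dict) -> None:
    """Hypothetical stand-in for the import validation step."""
    if is_independently_published(rec.get("publishers", [])):
        raise IndependentlyPublished


def test_independent_publisher_is_rejected() -> None:
    rec = {"title": "A Title", "publishers": ["Independent Publisher"]}
    try:
        validate_record(rec)
    except IndependentlyPublished:
        return  # expected: the record is refused
    raise AssertionError("record should not have been importable")
```

With the current exact-match rule the equivalent test would fail, which matches the actual behavior reported above.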
Context
No response
Notes from this Issue's Lead
Proposal & constraints
Related files
Stakeholders
Instructions for Contributors
- Please run these commands to ensure your repository is up to date before creating a new branch to work on this issue, and again each time after pushing code to GitHub, because the pre-commit bot may add commits to your PRs upstream.
Hi @scottbarnes, may I work on this issue please? TIA :)
I assigned this to you, @DebbieSan. Thanks! Please let me know if you have any questions.