Quality Check - Performance
I have a large library with c. 30,000 books in it. Quality Check can take quite a long time to run through these (e.g. check author sort).
I am wondering whether there is scope for some performance improvements. Here is the code for check_author_sort_valid:
def check_author_sort_valid(self):
    def evaluate_book(book_id, db):
        current_author_sort = db.author_sort(book_id, index_is_id=True)
        authors = db.authors(book_id, index_is_id=True)
        if not authors:
            return True
        authors = [a.strip().replace('|', ',') for a in authors.split(',')]
        if current_author_sort != db.author_sort_from_authors(authors):
            return True
        return False
    self.check_all_files(evaluate_book,
                         no_match_msg=_('All searched books have a valid Author Sort'),
                         marked_text='invalid_author_sort',
                         status_msg_type=_('books for invalid author sort'))
In essence this loops over each book in the library and makes 3 db calls (presumably against an in-memory cached copy of the database).
Obviously the code in a plugin is limited by the Calibre API, but I am wondering whether the following optimisations might be possible (using this example, but presumably possible much more generally in this plugin):
- Making fewer DB calls for each book (in evaluate_book). Here we make 3 calls to db: to get the book's authors, the book's author_sort, and the default author_sort for each of the authors. Would it be possible to do this in fewer calls, e.g. by getting the book's metadata object and then reading the authors and author_sort from that, and if so would it be any quicker (I suspect the gain would be marginal)?
- Starting the entire check with a single SQL query joining the books table and the authors table to get the book_ids, i.e. something like:

  SELECT book_id FROM books, book_authors, authors
  WHERE books.book_id = book_authors.book_id
    AND book_authors.author_id = authors.author_id
    AND book_authors.author_sort <> authors.author_sort

  and then just looping through the results to mark these books for the UI. I suspect that if this were possible, it would be much, much quicker.
If the Calibre db API doesn't have this sort of functionality, could you persuade Kovid to develop this sort of raw interface or to accept a PR for it?
That SQL won't do what you want, failing if a book has more than one author. Quality check compares the existing author_sort with an author_sort generated from the list of authors. I don't see how to write a reliable SQL statement that does that. I'm also quite sure that Kovid wouldn't be interested in adding such a specific API, but I might be wrong.
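To make the multi-author failure mode concrete, here is a minimal plain-Python sketch. The sort_name helper is a simplified stand-in for calibre's configurable author-sort rules, not the real API; the point is only that the stored author_sort is one combined string per book, while a SQL join produces one row per author, so a per-row inequality flags perfectly valid books:

```python
def sort_name(author):
    # Simplified "Last, First" sort -- a stand-in, calibre's real rules are configurable.
    parts = author.rsplit(' ', 1)
    return f'{parts[1]}, {parts[0]}' if len(parts) == 2 else author

def combined_sort(authors):
    # The book-level author_sort joins the per-author sorts with ' & '.
    return ' & '.join(sort_name(a) for a in authors)

authors = ['Terry Pratchett', 'Neil Gaiman']
stored = 'Pratchett, Terry & Gaiman, Neil'   # the book's stored author_sort

# The book-level comparison correctly finds this book valid:
assert stored == combined_sort(authors)

# But a per-row SQL comparison tests stored against each author's own sort,
# one join row at a time, and neither row matches:
per_row_matches = [stored == sort_name(a) for a in authors]
print(per_row_matches)  # [False, False] -- the query would flag a valid book
```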
Here is an example of how to do it with current APIs. It is a calibre python template that builds a cache then tests the selected books against the cache. The plugin would of course generate the cache once then loop through the books in the database.
python:
def evaluate(book, context):
    # Cache the data
    # Use the "new" calibre db api
    db = context.db.new_api
    # Get all the book ids in the database
    all_book_ids = db.all_book_ids()
    # Get the computed author sort string for all books
    computed_sort_string_for_books = db.author_sort_strings_for_books(all_book_ids)
    # Get the actual author sort string for all books
    author_sort_string_for_books = db.all_field_for('author_sort', all_book_ids)
    # Check a book
    # print(' & '.join(computed_sort_string_for_books[book.id]), '***', author_sort_string_for_books[book.id])
    return str(' & '.join(computed_sort_string_for_books[book.id]) != author_sort_string_for_books[book.id])
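For the plugin side, the idea would be: fill the two caches once, then make evaluate_book a pair of dictionary lookups. A sketch of that pattern, using hand-built dicts as stand-ins for the results of db.author_sort_strings_for_books() and db.all_field_for('author_sort', ...) so it runs outside calibre (make_evaluate_book is a hypothetical helper name):

```python
# book_id -> tuple of computed per-author sorts (the shape the new API returns)
computed_sorts = {
    1: ('Pratchett, Terry', 'Gaiman, Neil'),
    2: ('Austen, Jane',),
}
# book_id -> the author_sort actually stored on the book record
stored_sorts = {
    1: 'Pratchett, Terry & Gaiman, Neil',
    2: 'Austen, J.',   # stale / hand-edited value, should be flagged
}

def make_evaluate_book(computed, stored):
    # Close over the caches so the returned function keeps the plugin's
    # evaluate_book(book_id, db) signature; db is unused once cached.
    def evaluate_book(book_id, db=None):
        return ' & '.join(computed[book_id]) != stored[book_id]
    return evaluate_book

evaluate_book = make_evaluate_book(computed_sorts, stored_sorts)
print([bid for bid in computed_sorts if evaluate_book(bid)])  # [2]
```

Each per-book check is then O(1) against the caches instead of three API calls, which is where the speed-up in this approach should come from.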
It would be interesting to see if this implementation is significantly faster. Filling the cache might be as expensive as the existing code. I don't think so, but I don't know so.
@cbhaley thanks for the suggested code on this. I (finally) found some time to give this a try and it is significantly more performant. On a 50K book library, the old code takes 3 mins, the new version takes 15 seconds. So definitely worth the effort.
Will be fixed in 1.14.5