
Taking too long to load the epub books

Open Prabakaran1410 opened this issue 3 years ago • 2 comments

We are trying to use epubcheck to check whether an EPUB has audio, but it takes 8-10 seconds just to load the book, which is hurting our performance. Can you please suggest some options for loading the book quickly and getting the hasAudio meta info?

Prabakaran1410 avatar Nov 26 '21 07:11 Prabakaran1410

This tool is a simple wrapper around https://github.com/w3c/epubcheck, which performs an exhaustive analysis of the EPUB file. If the requirement is only to check whether the EPUB contains audio, that could indeed be done much faster with a custom script that does not load the entire EPUB. If you are interested in sponsoring such a feature, I am happy to look into it.
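A minimal sketch of what such a custom script might look like, assuming "has audio" means the package manifest declares at least one item with an `audio/*` media type. The function name `has_audio` is hypothetical and not part of this tool; the sketch reads only `META-INF/container.xml` and the package document from the ZIP archive, and performs no validation.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the OCF container file and the OPF package document.
CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"
OPF_NS = "{http://www.idpf.org/2007/opf}"


def has_audio(epub_path):
    """Return True if the manifest lists any item with an audio/* media type.

    Hypothetical helper: accepts a file path or a file-like object and reads
    only two small XML files from the archive.
    """
    with zipfile.ZipFile(epub_path) as zf:
        # container.xml points at the package document (the .opf file).
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find(f".//{CONTAINER_NS}rootfile")
        if rootfile is None:
            raise ValueError("container.xml does not declare a rootfile")

        # The manifest in the package document lists every publication resource.
        package = ET.fromstring(zf.read(rootfile.get("full-path")))
        for item in package.iter(f"{OPF_NS}item"):
            if (item.get("media-type") or "").startswith("audio/"):
                return True
    return False
```

Because nothing is decompressed apart from the two XML files, the cost should be largely independent of the size of the book. Audio that is referenced but not declared in the manifest would not be detected by this approach.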

titusz avatar Nov 27 '21 09:11 titusz

Similar performance issue. I have a Python script that processes EPUBs, and I use EpubCheck to verify the books both before and after changes. I had been using EpubCheck installed on my Mac with Homebrew, calling the Java program from within my Python script using subprocess.run().

To simplify things with respect to compatibility with Windows, I began using this EpubCheck integration. However, it is really slow compared to just running the Java EpubCheck with subprocess.

I measure EpubCheck's performance from within my Python script. On a particular book (one that generates no errors and no warnings), the Homebrew EpubCheck run with subprocess takes 2.2 seconds. The same book run with the epubcheck module takes 10.3 seconds.
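For anyone who wants to reproduce this kind of comparison, here is a rough sketch of a timing helper. The name `time_call` is hypothetical, and the command in the usage comment assumes an `epubcheck` executable is on the PATH (as with a Homebrew install) and that `book.epub` is a placeholder file name.

```python
import time


def time_call(func, *args, **kwargs):
    """Run func(*args, **kwargs) once and return (elapsed_seconds, result).

    Hypothetical helper: uses a monotonic high-resolution clock, so the
    measurement is not affected by system clock adjustments.
    """
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return elapsed, result


# Example usage (assumes an `epubcheck` command is installed):
#
#   import subprocess
#   elapsed, proc = time_call(
#       subprocess.run, ["epubcheck", "book.epub"], capture_output=True, text=True
#   )
#   print(f"subprocess: {elapsed:.1f} s, exit code {proc.returncode}")
```

The same helper can wrap the call into the epubcheck module so that both paths are timed the same way. A single run includes JVM startup, so averaging several runs gives a steadier number.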

That's nearly five times slower, which seems like a lot to me. Any ideas as to what can be done to get it closer to the performance of subprocess?

tthkbw avatar Jan 26 '22 00:01 tthkbw