hathitrustPDF Migrate to CLI-based tool, multithread download, better error handling

Open Midnight145 opened this issue 1 year ago • 2 comments

The tool is now command-line based, so you no longer have to edit the link variable in the script directly.

You can use --link [link] or --input-file [path] to download books.

Configurable output path

You can use --output-path in conjunction with --link, or pass --input-file a file whose entries are formatted as link,output_path.
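As a rough illustration of the link,output_path format, here is a minimal sketch of how such a file could be read. The function name parse_input_file, the fallback to a default output path, and the example.org links are assumptions made for illustration, not the code in this change.

```python
import csv
import io


def parse_input_file(text, default_output_path="."):
    """Return (link, output_path) pairs from lines shaped like `link,output_path`.

    Hypothetical helper: falling back to default_output_path when a line
    has no second field is an assumption, not documented behaviour.
    """
    jobs = []
    for row in csv.reader(io.StringIO(text)):
        if not row or not row[0].strip():
            continue  # skip blank lines
        link = row[0].strip()
        has_path = len(row) > 1 and row[1].strip()
        output_path = row[1].strip() if has_path else default_output_path
        jobs.append((link, output_path))
    return jobs


# Placeholder links, not real book URLs.
sample = "https://example.org/a,books/a\n\nhttps://example.org/b\n"
jobs = parse_input_file(sample)
```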

Temporary pages are removed by default

They can be kept by passing --keep

Book splicing can be done with flags instead of modifying variables directly

Configured with flags --begin, --end
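For readers who want to see how the flags listed so far fit together, here is a hedged argparse sketch. The flag names come from the description above; the types, defaults, help text, and the choice to make --link and --input-file mutually exclusive are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the flags named above (types and defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Sketch of the described flags")
    # Assumption: exactly one of --link / --input-file must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--link", help="link to a single book")
    source.add_argument("--input-file", help="file with link,output_path entries")
    parser.add_argument("--output-path", default=".", help="where to write the book")
    parser.add_argument("--keep", action="store_true", help="keep temporary pages")
    parser.add_argument("--begin", type=int, default=None, help="first page")
    parser.add_argument("--end", type=int, default=None, help="last page")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--link", "https://example.org/book", "--begin", "10", "--end", "20"]
)
```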

Uses os.path.join instead of hardcoded UNIX-based paths
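A small sketch of why this matters: os.path.join picks the separator for the current platform, so the same code works on Windows and UNIX. The helper name page_path and the page{n}.pdf naming are hypothetical.

```python
import os


def page_path(output_dir, page_number):
    """Build a per-page file path without hardcoding '/' (naming is hypothetical)."""
    return os.path.join(output_dir, f"page{page_number}.pdf")


path = page_path("book", 3)
```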

The main download is multithreaded to download multiple files at once, useful for larger books

Defaults to 5 threads, configurable with --thread-count
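One common way to get this behaviour is a thread pool from the standard library. The sketch below assumes that approach; the names download_all and fetch_page are hypothetical, and it is not necessarily how this change implements it.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(pages, fetch_page, thread_count=5):
    """Fetch every page with a pool of worker threads.

    Returns (results, missing): a dict of page -> content, and the list
    of pages whose fetch raised an exception.
    """
    results, missing = {}, []
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        futures = {page: pool.submit(fetch_page, page) for page in pages}
        for page, future in futures.items():
            try:
                results[page] = future.result()
            except Exception:
                missing.append(page)  # collect failures for later handling
    return results, missing


def fake_fetch(page):
    """Stand-in for a real download; page 3 always fails."""
    if page == 3:
        raise OSError("simulated failure")
    return f"content-{page}"


results, missing = download_all([1, 2, 3, 4], fake_fetch, thread_count=2)
```

Collecting failures rather than aborting leaves room for a follow-up step that deals with missing pages.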

Failed downloads are retried several times, and the tool sleeps for a few seconds when it receives a 429 Too Many Requests response

Defaults to 3 retries, configurable with --retries
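The retry behaviour can be sketched roughly as follows. Everything here is an assumption for illustration: the TooManyRequests exception stands in for an HTTP 429 response, "3 retries" is read as up to 3 attempts, and the 5-second wait is a placeholder for "a few seconds".

```python
import time


class TooManyRequests(Exception):
    """Hypothetical error standing in for an HTTP 429 response."""


def fetch_with_retries(fetch, retries=3, wait_seconds=5, sleep=time.sleep):
    """Call fetch() up to `retries` times, sleeping after each 429-style error."""
    last_error = None
    for _ in range(retries):
        try:
            return fetch()
        except TooManyRequests as error:
            last_error = error
            sleep(wait_seconds)  # back off before the next attempt
        except OSError as error:
            last_error = error  # other failures are retried immediately
    raise RuntimeError("download failed after retries") from last_error


# Simulate two rate-limited attempts followed by a success.
attempts = []
slept = []


def flaky_fetch():
    attempts.append(1)
    if len(attempts) < 3:
        raise TooManyRequests()
    return "ok"


result = fetch_with_retries(flaky_fetch, sleep=slept.append)
```

Passing sleep in as a parameter keeps the sketch testable without actually waiting.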

If pages are missing, you get a prompt at the end where you can force the tool to continue, download the pages manually, or attempt to redownload them automatically (single-threaded)

Jun 28 '23 19:06 Midnight145

Great work!

Sep 19 '23 21:09 SchmueI

Upstream is also currently broken because it cannot find the page count; this change fixes that as well.

Aug 13 '24 20:08 Midnight145
