hathitrustPDF
Migrate to CLI-based tool, multithread download, better error handling
The tool is now command-line based, so you no longer need to edit the link variable in the source.
- You can use --link [link] or --input-file [path] to download books
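As a rough illustration of how these two flags could be declared with `argparse`, here is a minimal sketch. Making the two flags mutually exclusive and required is an assumption, as are the help strings and the example URL; the actual parser in this PR may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two input flags named in this PR."""
    parser = argparse.ArgumentParser(description="Download a book as a PDF")
    # Assumption: exactly one source is required. The real tool may be more lenient.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--link", help="URL of a single book")
    source.add_argument("--input-file", help="path to a file listing several books")
    return parser


# Parse an explicit argument list so the sketch runs without touching sys.argv.
args = build_parser().parse_args(["--link", "https://example.org/book"])
print(args.link)
```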
Configurable output path
- You can use --output-path in conjunction with --link, or pass an --input-file whose lines are formatted as link,output_path
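To make the `link,output_path` format concrete, the sketch below reads such lines with the standard `csv` module. The function name `read_jobs`, the fallback to a default directory when a line has no output path, and the example URLs are all assumptions for illustration, not the parsing code in this PR.

```python
import csv
import io


def read_jobs(handle, default_output="."):
    """Return (link, output_path) pairs from lines shaped like link,output_path."""
    jobs = []
    for row in csv.reader(handle):
        # Skip blank lines and lines without a link.
        if not row or not row[0].strip():
            continue
        link = row[0].strip()
        # Assumption: a missing output path falls back to a default directory.
        has_output = len(row) > 1 and row[1].strip()
        output = row[1].strip() if has_output else default_output
        jobs.append((link, output))
    return jobs


sample = io.StringIO("https://example.org/a,books/a\n\nhttps://example.org/b\n")
print(read_jobs(sample))
```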
Removes temporary pages by default
- Can be kept by passing --keep
Book splicing can be done with flags instead of modifying variables directly
- Configured with flags --begin, --end
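One way to read `--begin` and `--end` is as a 1-based, inclusive page range that is clamped to the book's length. That interpretation is an assumption, as is the helper name `select_pages`; the sketch only shows how such a range could be turned into a list of page numbers.

```python
def select_pages(page_count, begin=None, end=None):
    """Return the 1-based page numbers to download, inclusive on both ends."""
    # Assumption: omitted bounds mean "from the first page" and "to the last page".
    first = 1 if begin is None else max(1, begin)
    last = page_count if end is None else min(page_count, end)
    return list(range(first, last + 1))


print(select_pages(10, begin=3, end=5))
```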
Uses os.path.join instead of hardcoded UNIX-based paths
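For readers unfamiliar with the difference, a small sketch of building a page path with `os.path.join` follows. The `pages` subdirectory and the `page_0007.pdf` naming scheme are hypothetical and not taken from this PR.

```python
import os


def page_path(output_dir, page_number):
    """Build a per-page file path using the separator of the current platform."""
    # Hypothetical layout: <output_dir>/pages/page_0007.pdf
    return os.path.join(output_dir, "pages", f"page_{page_number:04d}.pdf")


print(page_path("out", 7))
```

A hardcoded string such as `"out/pages/" + name` always uses `/`, whereas `os.path.join` inserts the separator of the operating system the tool runs on.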
The main download is multithreaded to download multiple files at once, useful for larger books
- Defaults to 5 threads, configurable with --thread-count
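A minimal sketch of a multithreaded download loop using `concurrent.futures.ThreadPoolExecutor` is shown below. The `fetch` callable stands in for whatever function downloads a single page, and `download_all` is a hypothetical name; this PR may structure its threading differently.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(pages, fetch, thread_count=5):
    """Run fetch(page) for every page on a pool of worker threads."""
    pages = list(pages)
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        # pool.map yields results in the same order as the input pages.
        results = list(pool.map(fetch, pages))
    return dict(zip(pages, results))


# A stand-in fetch function so the sketch runs without network access.
print(download_all(range(1, 4), lambda page: f"page {page}"))
```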
Retries failed downloads, sleeping for a few seconds before the next attempt when the server responds with 429 Too Many Requests
- Defaults to 3 retries, configurable with --retries
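The retry behaviour could be sketched roughly as follows. The `TooManyRequests` exception, the 5-second pause, and the reading of "3 retries" as one initial attempt plus up to three more are all assumptions; the real code may count attempts or detect a 429 differently.

```python
import time


class TooManyRequests(Exception):
    """Hypothetical error raised by fetch when the server answers 429."""


def fetch_with_retries(fetch, page, retries=3, backoff_seconds=5.0, sleep=time.sleep):
    """Call fetch(page), retrying on failure and pausing after a 429."""
    retries = max(retries, 0)
    last_error = None
    for attempt in range(retries + 1):
        try:
            return fetch(page)
        except TooManyRequests as error:
            last_error = error
            # Only pause if another attempt will follow.
            if attempt < retries:
                sleep(backoff_seconds)
        except OSError as error:
            # Assumption: other network failures are retried without a pause.
            last_error = error
    raise last_error
```

Passing `sleep` as a parameter keeps the function easy to test, since a test can substitute a function that records the pauses instead of waiting.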
In the event of missing pages, you get a prompt at the end where you can force continue, download manually, or attempt to redownload automatically (single-threaded)
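As an illustration of the final check, the sketch below finds which pages are missing and maps a typed answer to one of the three options. The helper names and the single-letter answers (`c`, `m`, `r`) are hypothetical; the actual prompt wording in this PR is not shown here.

```python
def find_missing(expected_count, downloaded):
    """Return the sorted 1-based page numbers that were not downloaded."""
    return sorted(set(range(1, expected_count + 1)) - set(downloaded))


def choose_action(answer):
    """Map a typed answer to an action name, or None if it is not recognised."""
    # Hypothetical keys: c = force continue, m = download manually, r = redownload.
    choices = {"c": "continue", "m": "manual", "r": "redownload"}
    return choices.get(answer.strip().lower()[:1])


missing = find_missing(5, [1, 2, 4])
print(missing, choose_action("r"))
```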
Great work!
Upstream is also currently broken because it cannot find the page count; this PR fixes that as well.