
Check file after downloading and redownload if corrupt?

Open jaintj95 opened this issue 4 years ago • 4 comments

I used the script to download 14 GB worth of files, and more than 60% of them turned out to be corrupt due to incomplete downloads.
It would be great if we could somehow check that a downloaded file is not corrupt.
If it is corrupt: restart the download.

jaintj95 avatar Apr 12 '20 09:04 jaintj95

I tried this out of curiosity and the same thing happened to me: most of the files were corrupted. I have created my own script anyway, but it's worth mentioning since a lot of people are using this one.

Thanks for this though @alexgand, much appreciated!

emmaKts avatar Apr 13 '20 16:04 emmaKts

@emmaKts How does your script differ from this one, such that it avoids the corruption issue?

VikashKothary avatar Apr 13 '20 18:04 VikashKothary

Here all downloads were OK, so perhaps the issue is related to the quality of the connection.

I'll leave the issue open. If anyone knows how to do this (check whether the file is corrupted and restart the download), feel free to open a pull request!

alexgand avatar Apr 15 '20 21:04 alexgand

@jaintj95 it might be that a lot of those were files saved as ePub which were actually PDF files (as no ePub existed). This has been fixed since PR #25, so maybe that already helps. (I'd check whether it is part of your local clone; see also PR #26.)
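A rough way to spot such mislabeled files is to look at the first bytes instead of the extension: PDF files start with `%PDF-` and zip archives (which is what an ePub is) start with `PK\x03\x04`. This is only a sketch; `sniff_format` and `extension_matches` are hypothetical helper names, not part of the script.

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"        # every PDF starts with this header
ZIP_MAGIC = b"PK\x03\x04"   # local file header of a non-empty zip (ePub is a zip)


def sniff_format(path):
    """Guess the real format from the leading bytes, ignoring the extension."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(PDF_MAGIC):
        return "pdf"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    return None  # unknown, empty, or an error page saved as a file


def extension_matches(path):
    """True if a .pdf/.epub extension agrees with the sniffed format."""
    expected = {".pdf": "pdf", ".epub": "zip"}.get(Path(path).suffix.lower())
    return expected is not None and sniff_format(path) == expected
```

A file named `book.epub` that `sniff_format` reports as `"pdf"` is most likely intact but mislabeled, not corrupt.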

PDF files could be checked with e.g. PyPDF2, and ePub files with zipfile (since an ePub is ultimately a zip archive). This would make the script slower, of course. There is also the question of how many times to retry and what to do once all retries are used up.
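A minimal sketch of such a check-and-retry loop, using only the standard library. Note the swap: instead of PyPDF2, the PDF check here is a cheap heuristic (header present and `%%EOF` marker near the end), which catches truncated downloads but not every kind of damage. The function names, the `fetch` callable, and `max_attempts=3` are all assumptions for illustration, not part of the script.

```python
import zipfile
import zlib


def pdf_looks_complete(path, tail_bytes=1024):
    """Heuristic: starts with %PDF- and has an %%EOF marker near the end."""
    with open(path, "rb") as fh:
        if fh.read(5) != b"%PDF-":
            return False
        fh.seek(0, 2)                      # jump to end to get the size
        size = fh.tell()
        fh.seek(max(0, size - tail_bytes))
        return b"%%EOF" in fh.read()


def epub_looks_complete(path):
    """An ePub is a zip: open it and CRC-check every member."""
    try:
        with zipfile.ZipFile(path) as zf:
            return zf.testzip() is None    # None means no bad member found
    except (zipfile.BadZipFile, zlib.error, EOFError, OSError):
        return False


def download_with_retries(fetch, path, max_attempts=3):
    """Call fetch(path) until the file passes its check.

    `fetch` is a hypothetical callable that downloads to `path`.
    Returns the attempt number that succeeded, or None if all failed.
    """
    is_epub = str(path).lower().endswith(".epub")
    check = epub_looks_complete if is_epub else pdf_looks_complete
    for attempt in range(1, max_attempts + 1):
        fetch(path)
        if check(path):
            return attempt
    return None
```

Returning `None` leaves the decision to the caller once all tries are used, for example logging the title and moving on to the next book.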

pjungermann avatar Apr 16 '20 00:04 pjungermann