
use md5sum to check download integrity with vdb-validate as fallback

Open suhrig opened this issue 4 months ago • 0 comments

PR checklist

As explained in https://github.com/ncbi/sra-tools/issues/896, vdb-validate does not detect file corruption if the prefetched files do not contain MD5 checksums. Many times, files I downloaded with the option force_sratools_download (now --download_method sratools) have turned out to be corrupt. Worse, extracting the files with fasterq-dump does not always produce an error, even if the file is corrupt. It is even conceivable that the extracted FastQ file looks perfectly intact, with only some bases or quality values changed. As a result, the error may go completely unnoticed.

This PR fixes this by (1) fetching the md5sum from the SRA Data Locator API and (2) performing a manual md5sum check. The current method, vdb-validate, is now used only as a fallback when the md5sum cannot be obtained from the API.
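
The check-then-fallback logic described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code from this PR: the function names md5_of_file, run_vdb_validate and check_download are hypothetical, the expected checksum is assumed to have been obtained from the API beforehand, and the fallback simply assumes that vdb-validate is on the PATH and exits with a non-zero status on failure.

```python
import hashlib
import subprocess
from typing import Callable, Optional


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_vdb_validate(path: str) -> bool:
    """Fallback check (assumption: vdb-validate is installed and on the PATH)."""
    return subprocess.run(["vdb-validate", path]).returncode == 0


def check_download(
    path: str,
    expected_md5: Optional[str] = None,
    fallback: Callable[[str], bool] = run_vdb_validate,
) -> bool:
    """Compare against the expected MD5 if one is known, else use the fallback."""
    if expected_md5:
        return md5_of_file(path) == expected_md5.strip().lower()
    # No checksum available: fall back to the weaker check.
    return fallback(path)
```

A caller would pass the checksum when the API returned one, for example check_download(path, expected_md5=checksum), and omit it otherwise so that the fallback runs. Reading the file in chunks keeps memory use flat for large SRA files.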

suhrig avatar Mar 01 '24 14:03 suhrig