read-ICESat-2
nsidc_icesat2_sync skips files it should download
Hi Tyler,
It looks like the test nsidc_icesat2_sync uses to check whether or not the remote file at NSIDC should be downloaded isn't sufficient. The current test in the http_pull_file function (ignoring the clobber flag) is just a comparison of the file's modification time: if the local file is newer, leave it. There is a case where the script breaks mid-download before the os.utime line can reset the local file's modification time to that of the remote file. In this case, the (corrupt) local file will have a modification time of when the script broke and so it will not be replaced upon re-running nsidc_icesat2_sync.
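To make the failure mode concrete, here is a minimal sketch of a modification-time-only test. It is not the actual http_pull_file code: the function names, the file name, and the exact comparison are assumptions chosen for illustration.

```python
import os
import tempfile
import time


def local_is_current(local_file, remote_mtime):
    """Hypothetical stand-in for an mtime-only test: True means 'skip download'."""
    if not os.path.exists(local_file):
        return False
    # Assumed rule: keep the local file if it is at least as new as the remote
    return os.stat(local_file).st_mtime >= remote_mtime


def demo_interrupted_download():
    """Simulate a download that dies before os.utime can reset the mtime."""
    remote_mtime = time.time() - 86400  # pretend the remote file is a day old
    with tempfile.TemporaryDirectory() as tmpdir:
        local_file = os.path.join(tmpdir, "granule.h5")  # made-up file name
        # Only part of the file is written, and os.utime is never reached,
        # so the mtime stays at "now" rather than the remote timestamp
        with open(local_file, "wb") as f:
            f.write(b"partial")
        return local_is_current(local_file, remote_mtime)


print(demo_interrupted_download())  # True: the truncated file would be skipped
```

Because the partial file's timestamp is the time of the crash, it always looks newer than the remote file, so a re-run leaves it in place.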
I didn't delve into the XML file that is being parsed in nsidc_list, so I don't know what other parameters are available for the test in http_pull_file, but replacing the file modification time test with a checksum (or even just file size) would catch this issue (and any other potential download issues). An easy fix potentially, but I didn't have a moment to check the XML tree and this might have impacts elsewhere in the repo, so opening it as an issue for a bit of discussion.
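As a rough sketch of the file-size idea, and only under the assumption that a remote size can be obtained at all (I haven't confirmed the XML listing provides one), the test could look something like this. The name needs_download and its arguments are hypothetical, not part of the repo.

```python
import os
import tempfile


def needs_download(local_file, remote_size, remote_mtime):
    """Hypothetical test: True if the remote file should be (re)downloaded.

    remote_size is an assumed input; where it would come from is an open question.
    """
    if not os.path.exists(local_file):
        return True
    info = os.stat(local_file)
    # A size mismatch catches truncated downloads regardless of timestamps
    if info.st_size != remote_size:
        return True
    # Sizes agree: fall back to the existing modification-time comparison
    return info.st_mtime < remote_mtime


def demo_size_check():
    """Check a 7-byte partial file against a 100-byte and a 7-byte remote."""
    with tempfile.TemporaryDirectory() as tmpdir:
        local_file = os.path.join(tmpdir, "granule.h5")  # made-up file name
        with open(local_file, "wb") as f:
            f.write(b"partial")
        truncated = needs_download(local_file, remote_size=100, remote_mtime=0)
        complete = needs_download(local_file, remote_size=7, remote_mtime=0)
        return truncated, complete


print(demo_size_check())  # (True, False)
```

A size test is cheap but would miss corruption that leaves the length unchanged, which is where a checksum would help.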
Matt
These are good points. I added the option --checksum to the sync program in PR #34. It doesn't compare against any hash in the XML file (I couldn't find one in the file I searched, but I might have missed it). Instead, it compares the hash of the file that already exists on the file system with the hash of the copy it downloads. The problem is that this method will be quite slow in comparison, since it has to download every file.
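A minimal sketch of that kind of comparison is below. It is not the code from the PR: the function names are made up, MD5 is an assumed choice of hash, and the remote content is represented by an in-memory buffer that is taken to have been downloaded already.

```python
import hashlib
import io
import os
import shutil
import tempfile


def file_md5(path, chunk_size=16 * 1024):
    """Hash a local file in chunks so large granules are not read in one go."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            md5.update(block)
    return md5.hexdigest()


def sync_with_checksum(remote_buffer, local_file):
    """Hypothetical helper: write remote_buffer to local_file if hashes differ.

    remote_buffer is an io.BytesIO holding the full remote content, which is
    why this approach costs one download per file. Returns True if written.
    """
    remote_hash = hashlib.md5(remote_buffer.getvalue()).hexdigest()
    if os.path.exists(local_file) and file_md5(local_file) == remote_hash:
        return False  # local copy is byte-identical, leave it alone
    remote_buffer.seek(0)
    with open(local_file, "wb") as f:
        shutil.copyfileobj(remote_buffer, f)
    return True


def demo_checksum_sync():
    """First sync writes, second is a no-op, a truncated file is replaced."""
    content = b"complete granule contents"
    with tempfile.TemporaryDirectory() as tmpdir:
        local_file = os.path.join(tmpdir, "granule.h5")  # made-up file name
        first = sync_with_checksum(io.BytesIO(content), local_file)
        second = sync_with_checksum(io.BytesIO(content), local_file)
        with open(local_file, "wb") as f:
            f.write(content[:8])  # simulate an interrupted download
        third = sync_with_checksum(io.BytesIO(content), local_file)
        return first, second, third


print(demo_checksum_sync())  # (True, False, True)
```

The truncated file is replaced regardless of its modification time, at the cost of fetching the remote content for every file.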