
nsidc_icesat2_sync skips files it should download

Open mrsiegfried opened this issue 3 years ago • 1 comment

Hi Tyler,

It looks like the test nsidc_icesat2_sync uses to decide whether a remote file at NSIDC should be downloaded isn't sufficient. The current test in the http_pull_file function (ignoring the clobber flag) is just a comparison of file modification times: if the local file is newer, leave it. This fails when the script breaks mid-download, before the os.utime line can reset the local file's modification time to that of the remote file. In that case, the (corrupt) local file keeps a modification time of when the script broke, which is newer than the remote file's, so it will not be replaced upon re-running nsidc_icesat2_sync.
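The failure mode can be reproduced with a minimal sketch. This is not the actual http_pull_file code: `should_download`, `simulate_interrupted_download`, and the file name are hypothetical stand-ins that only assume the mtime-only comparison described above.

```python
import os
import tempfile
import time


def should_download(local_path, remote_mtime):
    """Hypothetical mtime-only test: fetch only if the remote copy is newer."""
    if not os.path.exists(local_path):
        return True
    return remote_mtime > os.path.getmtime(local_path)


def simulate_interrupted_download(local_path, partial_bytes):
    """Write a truncated file and stop before the mtime is reset."""
    with open(local_path, "wb") as f:
        f.write(partial_bytes)
    # The script "breaks" here, so the call that would set the local mtime
    # to the remote one, os.utime(local_path, (atime, remote_mtime)), never runs.


# Assumed remote modification time: one day before now.
remote_mtime = time.time() - 86400

with tempfile.TemporaryDirectory() as tmpdir:
    local = os.path.join(tmpdir, "granule.h5")
    first_pass = should_download(local, remote_mtime)   # True: no local file yet
    simulate_interrupted_download(local, b"partial")
    second_pass = should_download(local, remote_mtime)  # False: corrupt file looks newer
```

On the second pass the truncated file is skipped because its modification time is the moment of the crash.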

I didn't delve into the XML file that is being parsed in nsidc_list, so I don't know what other parameters are available for the test in http_pull_file, but replacing the file modification time test with a checksum (or even just file size) would catch this issue (and any other potential download issues). An easy fix potentially, but I didn't have a moment to check the XML tree and this might have impacts elsewhere in the repo, so opening it as an issue for a bit of discussion.
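As a rough sketch of the file-size variant, assuming the remote size in bytes can be obtained somehow (whether the parsed listing exposes it is unverified here), the test could look something like this. `needs_download`, `remote_size`, and the example values are hypothetical.

```python
import os
import tempfile


def needs_download(local_path, remote_mtime, remote_size):
    """Hypothetical test: fetch if missing, wrong size, or older than remote."""
    if not os.path.exists(local_path):
        return True
    stat = os.stat(local_path)
    if stat.st_size != remote_size:
        # A size mismatch catches truncated files regardless of their mtime.
        return True
    return remote_mtime > stat.st_mtime


remote_mtime = 1625443200  # assumed remote timestamp, whole seconds
remote_size = 1024         # assumed remote size in bytes

with tempfile.TemporaryDirectory() as tmpdir:
    local = os.path.join(tmpdir, "granule.h5")

    # Interrupted download: short file with a fresh (newer) mtime.
    with open(local, "wb") as f:
        f.write(b"\0" * 100)
    truncated_result = needs_download(local, remote_mtime, remote_size)  # True

    # Completed download: full size, mtime reset to the remote value.
    with open(local, "wb") as f:
        f.write(b"\0" * remote_size)
    os.utime(local, (remote_mtime, remote_mtime))
    complete_result = needs_download(local, remote_mtime, remote_size)   # False
```

A size check is cheap but would miss corruption that leaves the length unchanged; a checksum would catch that as well.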

Matt

mrsiegfried, Jul 05 '21 02:07

These are good points. I added a --checksum option to the sync program in PR #34. It doesn't compare against any hash in the XML file (I couldn't find one in the file I searched, but I might have missed it). Instead, it compares the hash of the file that exists on the file system with the hash of the one it downloads. The drawback is that this method is much slower, since it has to download every file.
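The comparison described here can be sketched roughly as follows. This is not the code from PR #34: `hash_stream` and `differs_from_local` are hypothetical names, SHA-256 is an arbitrary choice of hash, and the remote content is represented by an in-memory byte string rather than a real HTTP response.

```python
import hashlib
import io
import os
import tempfile


def hash_stream(stream, chunk_size=16384):
    """Return the SHA-256 hex digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def differs_from_local(local_path, remote_bytes):
    """True if the local file is missing or its hash differs from the remote content."""
    if not os.path.exists(local_path):
        return True
    with open(local_path, "rb") as f:
        local_hash = hash_stream(f)
    remote_hash = hash_stream(io.BytesIO(remote_bytes))
    return local_hash != remote_hash


# Stand-in for the fully downloaded remote content.
remote_bytes = b"complete granule contents"

with tempfile.TemporaryDirectory() as tmpdir:
    local = os.path.join(tmpdir, "granule.h5")

    with open(local, "wb") as f:
        f.write(remote_bytes[:8])  # truncated local copy
    corrupt_differs = differs_from_local(local, remote_bytes)  # True

    with open(local, "wb") as f:
        f.write(remote_bytes)      # intact local copy
    intact_differs = differs_from_local(local, remote_bytes)   # False
```

The remote content must already be in hand before the hashes can be compared, which is why this approach costs a full download per file.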

tsutterley, Jul 12 '21 23:07