acd_cli
Hash mismatch on resumed files
I get these errors constantly on 2 different installs. I'm trying to download my data off of ACD; all of my files are 10235 MB split 7zip archives. Machine 1 is running Ubuntu 14.04 with 250 Mbps symmetrical bandwidth, machine 2 is running the latest Debian stable with gigabit symmetrical. A file will be downloading, then the speed drops to 0.0 KBps, and it starts downloading again after 15-30 seconds. Then, once it reaches 60-90% downloaded, it fails with [ERROR] [acd_cli] - Hash mismatch between local and remote file for "File_Name".
If I set max connections to anything other than 1, every file is guaranteed to fail; setting max retries doesn't seem to affect the hash error rate. If I queue 25 files to download I'll get 3-4 without errors; then, if I delete the errored files and redownload, I can get another 3-4 files, with the rest being hash errors. So it's wasting a large amount of bandwidth and time, because once a hash mismatch happens it stops downloading that file.
I'll see if I can get some verbose logs of the errors. Anybody have any ideas why I'm getting constant errors?
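For reference, the downloads are started with something along these lines; the exact syntax here is only an assumption for illustration, the remote and local paths are placeholders, and -x/-r are the max-connections and max-retries options mentioned above:
# Assumed invocation (paths are placeholders for the real archive locations).
# -x sets the maximum number of connections, -r the maximum number of retries.
acd_cli download -x 1 -r 4 /test/BD-000-097.7z.001 /data/USER/test1/test/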
Please append '.__incomplete' to a failed file, retry and see if the remaining part gets downloaded and the hash is correct.
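A minimal sketch of that manual step, reusing the file name and paths from the transcript below as examples and assuming acd_cli picks the partial file back up on the next download:
# Mark the failed file as a partial download so the next run resumes it
# instead of starting over (file name and paths are examples).
mv /data/USER/test1/test/BD-000-097.7z.001 /data/USER/test1/test/BD-000-097.7z.001.__incomplete
acd_cli download /test/BD-000-097.7z.001 /data/USER/test1/test/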
I tried that on 5 files; it completed them, but it still claimed hash errors for all 5. I checked file sizes and they are complete/correct, so I'll try extracting a file from inside to make sure they are intact.
It would be nice if you could
- md5sum the completed files and ascertain that the hashing is correct and that it is indeed a download error. That means that acd_cli find-md5 should not find the independently computed hash.
- Do a binary diff on the files and see where the errors occur, provided that you still have or can recreate the original files (see the sketch below).
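For the binary diff, something like the following would bound the corrupted region; cmp -l prints the 1-based byte offset and the two differing byte values (in octal) for every mismatching position, and the file names here are placeholders:
# Compare the original and the downloaded copy byte by byte (placeholder names).
cmp -l original.7z.001 downloaded.7z.001 | head -n 1   # first differing byte offset
cmp -l original.7z.001 downloaded.7z.001 | tail -n 1   # last differing byte offset
cmp -l original.7z.001 downloaded.7z.001 | wc -l       # number of differing bytes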
- I used md5sum on 10 different files, a few of which were downloaded without error; all were correctly matched via find-md5, even the resumed ones from above.
USER@HO$T:~$ md5sum /data/USER/test1/test/BD-000-097.7z.001
b36de352cb66b7641f736adb847ffc11 /data/USER/test1/test/BD-000-097.7z.001
USER@HO$T:~$ sudo acd_cli find-md5 b36de352cb66b7641f736adb847ffc11
[OK9rN6nzRBCYj4ZC1mXyRw] [A] /test/BD-000-097.7z.001
USER@HO$T:~$ md5sum /data/USER/test1/test/BD-000-097.7z.005
4fd6810060d41c153e719b4236ac4ba9 /data/USER/test1/test/BD-000-097.7z.005
USER@HO$T:~$ sudo acd_cli find-md5 4fd6810060d41c153e719b4236ac4ba9
[4ls8JVeZRAyfxVEji_9Cfw] [A] /test/BD-000-097.7z.005
- I do not have the original files, but I did extract one archive set completely and had no errors/issues.
edit: and now I'm getting Code: 1000, msg: ('Connection aborted.', ResponseNotReady('Request-sent',)). I wonder if something is down.
I will add a check that suppresses the hash error messages when a file download is incomplete for some reason. Why a hashing error occurs for resumed files, I don't know.
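Done by hand, that check amounts to something like this; the expected size would come from the node metadata, and the size value and file name below are only placeholders:
# Only treat a hash mismatch as meaningful once the local file has reached
# the expected remote size (placeholder size and file name).
expected_size=10732175360
local_size=$(stat -c %s BD-000-097.7z.001)
if [ "$local_size" -ne "$expected_size" ]; then
    echo "download incomplete ($local_size of $expected_size bytes), skipping hash check"
else
    md5sum BD-000-097.7z.001
fi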
Regarding the connection error, Amazon has disabled downloads of large files again.
I also have this issue, and it happens in more or less the same way as described above. However, I may have more info, since I have the originals of the files inside my archive files.
Anyway, I have a 4 GiB archive file; I download it via acd_cli and it has a wrong hash when completed. The download usually drops to 0 B/s at some random point, like above, stays there for about 30 seconds, and then resumes itself, saying it dropped the connection. After a while it says it failed, so I start the download again to resume the file to completion; it still reports a failure. I check its size in bytes and it matches the source file size, but the hash is wrong compared both to my own hash and to the one Amazon's metadata reports (which is the same as mine). I then proceeded to extract the files inside the archive, which I also have the original hashes for, and when I md5sum-check them I get this: md5sum: WARNING: 468 computed checksums did NOT match
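That warning is what md5sum -c prints when verifying files against a stored checksum list, so the check looks roughly like this; the directory and list file names are placeholders:
# Verify the extracted files against a saved list of their original MD5 sums.
# md5sum -c prints one OK/FAILED line per file plus a summary warning like
# the one quoted above when checksums do not match (placeholder names).
cd extracted_archive/
md5sum -c original_hashes.md5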
More than half of the files in the archive do not match my originals; the archive is the same size in bytes as my original source file, but its hash is a mismatch. Just to check further, I downloaded the same archive file from Amazon's Cloud Drive website, and I do get the correct file and hash that way.
I can only assume there may be some bug in the API or in acd_cli, but I don't really know enough to say. Hopefully some of this is helpful. If you want, I can test things for you, since I've already got everything set up for that.
I'm also running into this problem.
I have a file that's about 8.38 GiB (8998338695 B) in size. I made 5 attempts to download this file:
1. Success (no special flags).
2. Bytes 4204527168 to 8479162056 are corrupted (no special flags).
3. Bytes 5772410432 to 8479162056 are corrupted (no special flags).
4. Bytes 7937719872 to 8467119236 are corrupted (with -r 2 -x 8).
5. Success (with -r 4 -x 8).
Some of these attempts failed midway (or I manually interrupted them) so they had to be resumed, though I don't quite remember which ones did. I know for certain that attempt 5 did not fail at all, and that attempt 4 did fail midway. So I suspect it's a result of resuming a download. FYI, I am downloading onto an SSD.
(Here, "corruption" means the majority of the bytes in that range do not match the original file at all.)
(Here, "corruption" means the majority of the bytes in that range do not match the original file at all.)
Since you were able to identify the offending byte ranges, could you provide some further information on the corruption?
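One way to provide that would be to dump a small hex sample from inside one of the offending ranges in both copies and diff them, e.g. with the offset from attempt 2 above; the file names are placeholders:
# Dump 256 bytes starting inside a corrupted range from both the original and
# the downloaded copy, then diff the hex to see whether the bad region is
# zero-filled, shifted data, or random garbage (placeholder file names).
dd if=original.7z.001   bs=1 skip=4204527168 count=256 2>/dev/null | xxd > sample_orig.hex
dd if=downloaded.7z.001 bs=1 skip=4204527168 count=256 2>/dev/null | xxd > sample_dl.hex
diff sample_orig.hex sample_dl.hex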
While trying to reproduce #336, I was only able to reproduce this issue.
It turns out that the faulty byte ranges already appear in the incompletely downloaded files. In one file, a chunk of approximately 500 MB is missing at a 1500 MB offset.
The resuming itself seems to work fine.
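To confirm that on an incomplete file of your own, you could compare the partial download against the same-length prefix of the original before resuming; the file names are placeholders, and stat/head are the GNU variants:
# Check whether a partial download already contains a faulty range: compare it
# against the first N bytes of the original, where N is the partial file's size.
# Any mismatch reported here was already present before resuming (placeholder names).
partial=downloaded.7z.001.__incomplete
original=original.7z.001
n=$(stat -c %s "$partial")
cmp -l <(head -c "$n" "$original") "$partial" | head -n 1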
I had an inkling about this. Sorry I took so long. Please try the latest commit and see whether it fixes the issues.