wget2
not downloading all files, -R not working, many errors
Every month I download the new set of Mame roms from bda.retroroms.info using the following command:
wget -r --level=1 -np -nH -nc --cut-dirs=2 --http-user=myusername --http-password=mypassword -R "index.html*" -R "desktop.ini" https://bda.retroroms.info:82/downloads/mame/mame-0266/
However, since Fedora upgraded to wget2, I have been observing the following issues:
- Prompt says "Errors: 100+"
- Only a small subset of the files in the directory is downloaded. If I delete the folder, and run again, a different subset is downloaded each time.
- The -R option doesn't seem to be working, because the index.html files are kept.
Here is an example prompt output:
9 files 100% [========================================>] 2.19M 1013KB/s
11 files 100% [========================================>] 20.83M 1.38MB/s
index.html?C=M&O=D 100% [========================================>] 2.16K --.-KB/s
index.html 100% [========================================>] 607 --.-KB/s
zerotimeb.zip 100% [========================================>] 494 --.-KB/s
[Files: 23 Bytes: 22.98M [1.23MB/s] Redirects: 0 Todo: 0 Errors: 113
Note that only 23 files were downloaded, but the directory I requested had 67 files and 1 empty folder. The complete file list can be seen here, by browsing into /downloads/mame/mame-0266/
I also tried adding --max-threads=1 but that did not solve the issue: 90+ errors.
30 files 100% [========================================>] 24.86M 1.88MB/s
[Files: 30 Bytes: 24.79M [1.32MB/s] Redirects: 0 Todo: 0 Errors: 91
That is possibly a rate limiter on the server. Please check if --wait, --random-wait, --limit-rate helps here.
I tried --wait=2 and it downloaded all files successfully, but it still says "Errors: 69".
Is it possible to display the error messages? I tried --verbose, but it didn't help.
Also, the issue with -R "index.html*" not working remains.
Use --progress=none to see the textual output including response codes.
I'll take a look into why -R keeps the files in a few days.
I just ran the command again, this time with --wait=2 and --progress=none. All files were downloaded and there were 69 errors: one 404 (not found) and 68 401 (unauthorized).
The 404 error and one of the 401 errors refer to https://bda.retroroms.info:82/robots.txt. I don't understand why it is trying to download that file, as it is not in the requested folder: https://bda.retroroms.info:82/downloads/mame/mame-0266/
This was the command:
wget -r --level=1 -np -nH -nc --cut-dirs=2 --http-user=myuser --http-password=mypassw --max-threads=1 --wait=2 -R "index.html*" -R "desktop.ini" --progress=none https://bda.retroroms.info:82/downloads/mame/mame-0266/
And these were the first lines of the output:
[0] Downloading 'https://bda.retroroms.info:82/robots.txt' ...
HTTP ERROR response 401 [https://bda.retroroms.info:82/robots.txt]
[0] Downloading 'https://bda.retroroms.info:82/robots.txt' ...
HTTP ERROR response 404 [https://bda.retroroms.info:82/robots.txt]
The remaining 67 errors with status 401 refer to each of the 67 zip files in the requested folder. Note that all these zip files were successfully downloaded. For example:
[0] Downloading 'https://bda.retroroms.info:82/downloads/mame/mame-0266/aim65.zip' ...
HTTP ERROR response 401 [https://bda.retroroms.info:82/downloads/mame/mame-0266/aim65.zip]
[0] Downloading 'https://bda.retroroms.info:82/downloads/mame/mame-0266/aim65.zip' ...
Saving 'mame-0266/aim65.zip'
HTTP response 200 [https://bda.retroroms.info:82/downloads/mame/mame-0266/aim65.zip]
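To see how the errors split by status code without counting them by hand, the --progress=none output can be tallied. This is only a minimal sketch: the function name count_http_errors is made up for illustration, and it assumes the error lines look like the "HTTP ERROR response NNN" lines quoted in this thread. The sample input is taken from the output above.

```shell
#!/bin/sh
# Tally "HTTP ERROR response NNN" messages by status code.
# Reads wget2 --progress=none output on stdin and prints "CODE COUNT" per line.
# (count_http_errors is a hypothetical helper, not part of wget2.)
count_http_errors() {
    # Extract every error message, then count by the 4th field (the code).
    grep -o 'HTTP ERROR response [0-9][0-9]*' |
        awk '{ count[$4]++ } END { for (code in count) print code, count[code] }' |
        sort
}

# Sample lines copied from the output in this thread.
count_http_errors <<'EOF'
[0] Downloading 'https://bda.retroroms.info:82/robots.txt' ...
HTTP ERROR response 401 [https://bda.retroroms.info:82/robots.txt]
[0] Downloading 'https://bda.retroroms.info:82/robots.txt' ...
HTTP ERROR response 404 [https://bda.retroroms.info:82/robots.txt]
[0] Downloading 'https://bda.retroroms.info:82/downloads/mame/mame-0266/aim65.zip' ...
HTTP ERROR response 401 [https://bda.retroroms.info:82/downloads/mame/mame-0266/aim65.zip]
[0] Downloading 'https://bda.retroroms.info:82/downloads/mame/mame-0266/aim65.zip' ...
Saving 'mame-0266/aim65.zip'
HTTP response 200 [https://bda.retroroms.info:82/downloads/mame/mame-0266/aim65.zip]
EOF
# prints:
# 401 2
# 404 1
```

For a real run, the same function could be fed from a saved log, assuming the output was redirected to a file first.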
I also tried running with --progress=none, but without --wait=2. A bunch of 503 errors occurred, and only 16 of the 67 zip files were downloaded. This was successfully fixed using --wait=2, though (as commented above).
wget -r --level=1 -np -nH -nc --cut-dirs=2 --http-user=myusername --http-password=mypassword -R "index.html*" -R "desktop.ini" https://bda.retroroms.info:82/downloads/mame/mame-0266/
- The -R option doesn't seem to be working, because the index.html files are kept.
There seems to be a difference in how -R is interpreted between wget and wget2. In wget you could get away with -R "unwantedfilename*", but in wget2 you also need a wildcard at the beginning of the pattern, such as -R "*unwantedfilename*", presumably so that it also matches the start of the URL.
That actually worked! Thank you.
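The effect of the leading wildcard can be illustrated with ordinary shell glob matching. This is a sketch under an assumption, not a description of wget2's internals: it supposes, as suggested above, that the -R pattern has to match the whole URL rather than just the file name. The helper glob_match is made up for this illustration.

```shell
#!/bin/sh
# glob_match PATTERN STRING
# Prints "match" if the whole STRING matches the shell glob PATTERN.
# (Hypothetical helper; it only mimics whole-string glob matching.)
glob_match() {
    case "$2" in
        $1) echo "match" ;;      # $1 is left unquoted so it acts as a pattern
        *)  echo "no match" ;;
    esac
}

url='https://bda.retroroms.info:82/downloads/mame/mame-0266/index.html?C=M&O=D'
name='index.html?C=M&O=D'

glob_match 'index.html*'  "$name"   # match: pattern tested against the file name only
glob_match 'index.html*'  "$url"    # no match: the URL does not start with "index.html"
glob_match '*index.html*' "$url"    # match: leading wildcard absorbs the start of the URL
```

Under that assumption, the original command would use -R "*index.html*" in place of -R "index.html*".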