wget2

not downloading all files, -R not working, many errors

Open joaoluizcarvalho opened this issue 1 year ago • 8 comments

Every month I download the new set of Mame roms from bda.retroroms.info using the following command:

wget -r --level=1 -np -nH -nc --cut-dirs=2 --http-user=myusername --http-password=mypassword -R "index.html*" -R "desktop.ini" https://bda.retroroms.info:82/downloads/mame/mame-0266/

However, since Fedora upgraded to wget2, I have been observing the following issues:

  1. Prompt says "Errors: 100+"
  2. Only a small subset of the files in the directory is downloaded. If I delete the folder, and run again, a different subset is downloaded each time.
  3. The -R option doesn't seem to be working, because the index.html files are kept.

Here is an example prompt output:

9 files 100% [========================================>] 2.19M 1013KB/s
11 files 100% [========================================>] 20.83M 1.38MB/s
index.html?C=M&O=D 100% [========================================>] 2.16K --.-KB/s
index.html 100% [========================================>] 607 --.-KB/s
zerotimeb.zip 100% [========================================>] 494 --.-KB/s
[Files: 23 Bytes: 22.98M [1.23MB/s] Redirects: 0 Todo: 0 Errors: 113

Note that only 23 files were downloaded, but the directory I requested had 67 files and 1 empty folder. The complete file list can be seen here, by browsing into /downloads/mame/mame-0266/

joaoluizcarvalho avatar Jun 02 '24 23:06 joaoluizcarvalho

I also tried adding --max-threads=1 but that did not solve the issue: 90+ errors.

30 files 100% [========================================>] 24.86M 1.88MB/s
[Files: 30 Bytes: 24.79M [1.32MB/s] Redirects: 0 Todo: 0 Errors: 91

joaoluizcarvalho avatar Jun 04 '24 11:06 joaoluizcarvalho

That is possibly a rate limiter on the server. Please check if --wait, --random-wait, --limit-rate helps here.

rockdaboot avatar Jun 04 '24 15:06 rockdaboot

I tried --wait=2 and it downloaded all files successfully, but it still says "Errors: 69".

Is it possible to display the error messages? I tried --verbose, but it didn't help.

Also, the issue with -R "index.html*" not working remains.

joaoluizcarvalho avatar Jun 04 '24 20:06 joaoluizcarvalho

Use --progress=none to see the textual output including response codes.

I'll take a look into why -R keeps the files in a few days.

rockdaboot avatar Jun 05 '24 05:06 rockdaboot

I just ran the command again, this time with --wait=2 and --progress=none. All files were downloaded and there were 68 errors: 1 error 404 (not found) and 68 errors 401 (unauthorized).

The 404 error and one of the 401 errors refer to https://bda.retroroms.info:82/robots.txt. I don't understand why it is trying to download that file, as it is not in the requested folder: https://bda.retroroms.info:82/downloads/mame/mame-0266/

This was the command:

wget -r --level=1 -np -nH -nc --cut-dirs=2 --http-user=myuser --http-password=mypassw --max-threads=1 --wait=2 -R "index.html*" -R "desktop.ini" --progress=none https://bda.retroroms.info:82/downloads/mame/mame-0266/

And these were the first lines of the output:

[0] Downloading 'https://bda.retroroms.info:82/robots.txt' ...
HTTP ERROR response 401 [https://bda.retroroms.info:82/robots.txt]
[0] Downloading 'https://bda.retroroms.info:82/robots.txt' ...
HTTP ERROR response 404 [https://bda.retroroms.info:82/robots.txt]

The remaining 67 401 errors refer to each of the 67 zip files in the requested folder. Note that all of these zip files were nevertheless downloaded successfully. For example:

[0] Downloading 'https://bda.retroroms.info:82/downloads/mame/mame-0266/aim65.zip' ...
HTTP ERROR response 401 [https://bda.retroroms.info:82/downloads/mame/mame-0266/aim65.zip]
[0] Downloading 'https://bda.retroroms.info:82/downloads/mame/mame-0266/aim65.zip' ...
Saving 'mame-0266/aim65.zip'
HTTP response 200 [https://bda.retroroms.info:82/downloads/mame/mame-0266/aim65.zip]

joaoluizcarvalho avatar Jun 07 '24 16:06 joaoluizcarvalho
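
A per-status tally like the one reported above can be reproduced from a saved log. The following is a minimal sketch, assuming the --progress=none output was captured to a file (for example with shell redirection). The function name count_statuses and the temporary sample log are illustrative only; the sample lines are copied from the output quoted in this thread.

```shell
# Tally HTTP status codes found in a saved wget2 text log.
# Usage: count_statuses LOGFILE
# Prints one "CODE: COUNT" line per status code, sorted by code.
count_statuses() {
  grep -Eo 'HTTP (ERROR )?response [0-9]{3}' "$1" |
    awk '{ print $NF }' |
    sort |
    uniq -c |
    awk '{ print $2 ": " $1 }'
}

# Illustrative sample log built from lines quoted in this thread.
sample_log="$(mktemp)"
cat > "$sample_log" <<'EOF'
[0] Downloading 'https://bda.retroroms.info:82/robots.txt' ...
HTTP ERROR response 401 [https://bda.retroroms.info:82/robots.txt]
[0] Downloading 'https://bda.retroroms.info:82/robots.txt' ...
HTTP ERROR response 404 [https://bda.retroroms.info:82/robots.txt]
[0] Downloading 'https://bda.retroroms.info:82/downloads/mame/mame-0266/aim65.zip' ...
HTTP ERROR response 401 [https://bda.retroroms.info:82/downloads/mame/mame-0266/aim65.zip]
[0] Downloading 'https://bda.retroroms.info:82/downloads/mame/mame-0266/aim65.zip' ...
Saving 'mame-0266/aim65.zip'
HTTP response 200 [https://bda.retroroms.info:82/downloads/mame/mame-0266/aim65.zip]
EOF

count_statuses "$sample_log"
rm -f "$sample_log"
```

For this sample the tally is one 200, two 401s and one 404. Because grep -o extracts each match separately, the function also works when several log entries have been joined onto one line.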

I also tried running with --progress=none, but without --wait=2. A bunch of 503 errors occurred, and only 16 of the 67 zip files were downloaded. This was successfully fixed using --wait=2, though (as commented above).

joaoluizcarvalho avatar Jun 07 '24 16:06 joaoluizcarvalho

> wget -r --level=1 -np -nH -nc --cut-dirs=2 --http-user=myusername --http-password=mypassword -R "index.html*" -R "desktop.ini" https://bda.retroroms.info:82/downloads/mame/mame-0266/
>
> 3. The -R option doesn't seem to be working, because the index.html files are kept.

There seems to be a difference in how -R is interpreted by wget and wget2. In wget you could get away with -R "unwantedfilename*", but in wget2 you also need a wildcard at the beginning of the pattern, such as -R "*unwantedfilename*", presumably so that the pattern also matches the start of the URL.

Jertzukka avatar Feb 25 '25 12:02 Jertzukka

That actually worked! Thank you.

joaoluizcarvalho avatar Mar 07 '25 21:03 joaoluizcarvalho
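
The effect of the leading wildcard can be illustrated with ordinary shell glob matching. This is only a sketch of Jertzukka's presumption that the pattern is compared against more than the bare file name. It is not wget2's actual matching code, and the helper name glob_match is made up for the example.

```shell
# Return success if string $2 matches shell glob pattern $1 in full.
glob_match() {
  case "$2" in
    $1) return 0 ;;   # unquoted $1 is interpreted as a glob pattern
    *)  return 1 ;;
  esac
}

name='index.html?C=M&O=D'
url='https://bda.retroroms.info:82/downloads/mame/mame-0266/index.html?C=M&O=D'

for pattern in 'index.html*' '*index.html*'; do
  for target in "$name" "$url"; do
    if glob_match "$pattern" "$target"; then
      result='match'
    else
      result='no match'
    fi
    printf '%s against %s: %s\n' "$pattern" "$target" "$result"
  done
done
```

A glob has to match the whole string, so "index.html*" matches the bare file name but not the full URL, while "*index.html*" matches both. That is consistent with the workaround that resolved the -R problem in this thread.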