wget2
Unhelpful "Failed to read 102400 bytes" and "Failed to transcode" errors
I'm trying to run a mirror download and getting a lot of vague errors, with no indication of which URLs trigger them (so they can be investigated further) and no indication of what the errors actually mean
The number of "Failed to read 102400 bytes" errors varies from run to run, as do the numbers in parentheses, whatever they mean
When running a re-download (files from the previous download still on disk), the "Failed to read" messages disappear but the transcode errors remain; in fact, sometimes there are more of them
Despite all the errors printed, it usually says "Errors: 0" at the end; sometimes it says "Errors: 1" but no additional errors are printed in that case
The ordering of the errors also varies from run to run
Sample run using GnuTLS:
XXXXXXXXX:/mnt/s/wget-temp/temp$ rm -rf *
XXXXXXXXX:/mnt/s/wget-temp/temp$ wget2_gnutls --version
GNU Wget2 2.0.1 - multithreaded metalink/file/website downloader
+digest +https +ssl/gnutls +ipv6 +iri +large-file +nls -ntlm -opie +psl -hsts
+iconv +idn2 +zlib +lzma +brotlidec +zstd +bzip2 +lzip +http2 +gpgme
Copyright (C) 2012-2015 Tim Ruehsen
Copyright (C) 2015-2021 Free Software Foundation, Inc.
Please send bug reports and questions to <[email protected]>.
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$ time wget2_gnutls -v -o log.txt -m -np https://skyqueen.cc/archive/71master/cracky/kareha.pl/
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (32)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (32)
Failed to read 102400 bytes (2)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
4477 files 129% [==========================================================================================================================================================================================================>] 24.98M --.-KB/s
4424 files 114% [==========================================================================================================================================================================================================>] 21.61M --.-KB/s
3778 files 108% [==========================================================================================================================================================================================================>] 19.03M --.-KB/s
4743 files 100% [==========================================================================================================================================================================================================>] 20.13M --.-KB/s
3996 files 100% [==========================================================================================================================================================================================================>] 18.13M --.-KB/s
[Files: 21418 Bytes: 103.91M [601.42KB/s] Redirects: 41 Todo: 0 Errors: 0 ]
real 2m56.952s
user 0m3.625s
sys 0m12.922s
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$ find -type f | wc -l
21418
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$ time wget2_gnutls -v -o log.txt -m -np https://skyqueen.cc/archive/71master/cracky/kareha.pl/
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
35 files 106916% [==========================================================================================================================================================================================================>] 8.34M 6.68MB/s files 56830% [==========================================================================================================================================================================================================>] 1.45M --.-KB/18 files 2133% [==========================================================================================================================================================================================================>] 46.35K --.-KB/s9 files 1075% [==========================================================================================================================================================================================================>] 23.37K --.-KB/s18 files 1617% [==========================================================================================================================================================================================================>] 49.19K --.-KB/s [Files: 113 Bytes: 9.91M [133.76KB/s] Redirects: 41 Todo: 0 Errors: 0 ]
real 1m15.954s
user 0m1.406s
sys 0m8.047s
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$ find -type f | wc -l
21418
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$ time wget2_gnutls -v -o log.txt -m -np https://skyqueen.cc/archive/71master/cracky/kareha.pl/
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
21 files 63287% [==========================================================================================================================================================================================================>] 2.80M --.-KB/20 files 55991% [==========================================================================================================================================================================================================>] 1.42M 219.68KB/s8 files 31539% [==========================================================================================================================================================================================================>] 1.47M --.-KB/20 files 245995% [==========================================================================================================================================================================================================>] 4.17M 4.81MB/s files 949% [==========================================================================================================================================================================================================>] 41.26K --.-KB/ss [Files: 113 Bytes: 9.91M [131.80KB/s] Redirects: 41 Todo: 0 Errors: 0 ]
real 1m17.085s
user 0m1.359s
sys 0m8.625s
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$ find -type f | wc -l
21418
XXXXXXXXX:/mnt/s/wget-temp/temp$
Another run using WolfSSL:
XXXXXXXXX:/mnt/s/wget-temp/temp$ rm -rf *
XXXXXXXXX:/mnt/s/wget-temp/temp$ wget2_wolfssl --version
GNU Wget2 2.0.1 - multithreaded metalink/file/website downloader
+digest +https +ssl/wolfssl +ipv6 +iri +large-file +nls -ntlm -opie +psl -hsts
+iconv +idn2 +zlib +lzma +brotlidec +zstd +bzip2 +lzip +http2 +gpgme
Copyright (C) 2012-2015 Tim Ruehsen
Copyright (C) 2015-2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://www.gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
XXXXXXXXX:/mnt/s/wget-temp/temp$ time wget2_wolfssl -v -o log.txt -m -np https://skyqueen.cc/archive/71master/cracky/kareha.pl/
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (32)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
Failed to read 102400 bytes (11)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
4146 files 122% [==========================================================================================================================================================================================================>] 22.68M --.-KB/s
4199 files 107% [==========================================================================================================================================================================================================>] 20.10M --.-KB/s
4119 files 107% [==========================================================================================================================================================================================================>] 19.43M --.-KB/s
4484 files 100% [==========================================================================================================================================================================================================>] 19.60M --.-KB/s
4456 files 114% [==========================================================================================================================================================================================================>] 22.00M --.-KB/s
[Files: 21403 Bytes: 103.77M [586.54KB/s] Redirects: 41 Todo: 0 Errors: 0 ]
real 3m1.207s
user 0m4.625s
sys 0m13.547s
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$ find -type f | wc -l
21404
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$ time wget2_wolfssl -v -o log.txt -m -np https://skyqueen.cc/archive/71master/cracky/kareha.pl/
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
21 files 180854% [==========================================================================================================================================================================================================>] 4.17M --.-KB26 files 47471% [==========================================================================================================================================================================================================>] 2.82M 1.79MB/s5 files 1842% [==========================================================================================================================================================================================================>] 64.04K 796.22KB/s27 files 56809% [==========================================================================================================================================================================================================>] 1.44M 1.66MB/s4 files 41622% [==========================================================================================================================================================================================================>] 1.41M 1.63MB/s [Files: 113 Bytes: 9.91M [130.32KB/s] Redirects: 41 Todo: 0 Errors: 0 ]
real 1m17.960s
user 0m1.859s
sys 0m8.828s
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$ find -type f | wc -l
21404
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$ time wget2_wolfssl -v -o log.txt -m -np https://skyqueen.cc/archive/71master/cracky/kareha.pl/
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
34 files 97901% [==========================================================================================================================================================================================================>] 5.58M --.-KB/21 files 110073% [==========================================================================================================================================================================================================>] 2.80M 1.74MB/s files 28198% [==========================================================================================================================================================================================================>] 1.43M 1.16MB/19 files 2135% [==========================================================================================================================================================================================================>] 46.40K 955.46KB/s17 files 2353% [==========================================================================================================================================================================================================>] 51.14K 1.72MB/s [Files: 113 Bytes: 9.91M [133.64KB/s] Redirects: 41 Todo: 0 Errors: 0 ]
real 1m16.025s
user 0m1.484s
sys 0m9.031s
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$ find -type f | wc -l
21404
and finally OpenSSL:
XXXXXXXXX:/mnt/s/wget-temp/temp$ rm -rf *
XXXXXXXXX:/mnt/s/wget-temp/temp$ wget2_openssl --version
GNU Wget2 2.0.1 - multithreaded metalink/file/website downloader
+digest +https +ssl/openssl +ipv6 +iri +large-file +nls -ntlm -opie +psl -hsts
+iconv +idn2 +zlib +lzma +brotlidec +zstd +bzip2 +lzip +http2 +gpgme
Copyright (C) 2012-2015 Tim Ruehsen
Copyright (C) 2015-2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://www.gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Please send bug reports and questions to <[email protected]>.
XXXXXXXXX:/mnt/s/wget-temp/temp$
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
Failed to read 102400 bytes (2)
4431 files 107% [==========================================================================================================================================================================================================>] 20.83M --.-KB/s
3944 files 116% [==========================================================================================================================================================================================================>] 20.86M --.-KB/s
4191 files 115% [==========================================================================================================================================================================================================>] 21.02M --.-KB/s
4550 files 107% [==========================================================================================================================================================================================================>] 20.98M --.-KB/s
4302 files 107% [==========================================================================================================================================================================================================>] 20.21M --.-KB/s
[Files: 21418 Bytes: 103.91M [564.22KB/s] Redirects: 41 Todo: 0 Errors: 0 ]
real 3m8.619s
user 0m4.453s
sys 0m14.938s
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$ find -type f | wc -l
21418
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$ time wget2_openssl -v -o log.txt -m -np https://skyqueen.cc/archive/71master/cracky/kareha.pl/
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
21 files 58071% [==========================================================================================================================================================================================================>] 2.80M 1.16MB/17 files 41770% [==========================================================================================================================================================================================================>] 1.41M --.-KB/51 files 94444% [==========================================================================================================================================================================================================>] 5.63M 4.79MB/s1 files 0% [ <=> ] 26.55K --.-KB/s
13 files 1026% [==========================================================================================================================================================================================================>] 35.67K --.-KB/s [Files: 113 Bytes: 9.91M [128.19KB/s] Redirects: 41 Todo: 0 Errors: 0 ]
real 1m19.251s
user 0m1.422s
sys 0m7.688s
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$ find -type f | wc -l
21418
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$ time wget2_openssl -v -o log.txt -m -np https://skyqueen.cc/archive/71master/cracky/kareha.pl/
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
30 files 139911% [==========================================================================================================================================================================================================>] 5.57M 1.68MB/s files 54858% [==========================================================================================================================================================================================================>] 1.40M 119.43KB/21 files 56225% [==========================================================================================================================================================================================================>] 1.43M 1.47MB/s8 files 1420% [==========================================================================================================================================================================================================>] 49.39K 2.33MB/s31 files 28599% [==========================================================================================================================================================================================================>] 1.45M 1.53MB/s [Files: 113 Bytes: 9.91M [131.03KB/s] Redirects: 41 Todo: 0 Errors: 0 ]
real 1m17.538s
user 0m1.453s
sys 0m8.203s
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$
XXXXXXXXX:/mnt/s/wget-temp/temp$ find -type f | wc -l
21418
XXXXXXXXX:/mnt/s/wget-temp/temp$
Another thing of note is that, on these particular runs, GnuTLS and OpenSSL reported 21418 files downloaded & actually downloaded that same number of files. WolfSSL reported 21403 files downloaded but actually downloaded 21404 files (14 files missing). However, this seems to vary randomly from run to run and is not based on the TLS library used. I've not yet deep-dived the random missing file issue to see which files are getting skipped; I might open another bug after I do.
Correction, all of the above tests were actually using GnuTLS. Using OpenSSL, I get some of the same errors but there are also differences & new errors:
$ time wget2-openssl --max-threads=1 -o log.txt -m -np https://skyqueen.cc/archive/71master/cracky/kareha.pl/
TLS write error: (null)
TLS write error: (null)
TLS write error: (null)
TLS read error: unexpected eof while reading
Failed to read 102400 bytes (0)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
TLS write error: (null)
TLS read error: unexpected eof while reading
Failed to read 102400 bytes (0)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
TLS read error: unexpected eof while reading
Failed to read 102400 bytes (0)
TLS read error: unexpected eof while reading
Failed to read 102400 bytes (0)
TLS write error: (null)
TLS write error: (null)
TLS write error: (null)
TLS write error: (null)
TLS write error: (null)
TLS write error: (null)
TLS read error: unexpected eof while reading
Failed to read 102400 bytes (0)
TLS read error: unexpected eof while reading
Failed to read 102400 bytes (0)
TLS write error: (null)
TLS write error: (null)
TLS write error: (null)
TLS write error: (null)
TLS read error: unexpected eof while reading
Failed to read 102400 bytes (0)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
TLS write error: (null)
TLS write error: (null)
TLS write error: (null)
TLS write error: (null)
TLS write error: (null)
TLS write error: (null)
TLS write error: (null)
TLS write error: (null)
TLS write error: (null)
TLS write error: (null)
TLS write error: (null)
TLS read error: unexpected eof while reading
Failed to read 102400 bytes (0)
21402 files 110% [==========================================================================================================================================================================================================>] 103.80M --.-KB/s
[Files: 21401 Bytes: 103.79M [563.36KB/s] Redirects: 41 Todo: 0 Errors: 0 ]
real 3m8.684s
user 0m4.438s
sys 0m16.109s
At first glance, the write/read errors seem to indicate that the server randomly closes the connection. So "Failed to read 102400 bytes" is a very low-level and generic message. The errno value in parentheses is a bit random and misleading here, likely because it sometimes gets changed while travelling through several layers of the TLS library.
The current master of wget2 changes these messages to
Failed to read 102400 bytes (hostname='skyqueen.cc', ip=185.141.27.108, errno=104)
Failed to read 102400 bytes (hostname='skyqueen.cc', ip=185.141.27.108, errno=17)
Failed to read 102400 bytes (hostname='skyqueen.cc', ip=185.141.27.108, errno=104)
Failed to read 102400 bytes (hostname='skyqueen.cc', ip=185.141.27.108, errno=2)
...
There is a CLI command, errno, that translates the numbers into strings, e.g.
$ LC_ALL=C errno 104
ECONNRESET 104 Connection reset by peer
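Where the errno command is not installed, the same lookup can be sketched with Python's standard library. This is only an illustration, not part of wget2: describe_errno is a hypothetical helper name, and the numeric codes follow the numbering of the platform it runs on (the values seen in this thread are Linux ones).

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME code message' for a numeric errno value.

    The output format imitates the errno CLI shown above. The mapping
    from number to name is platform-specific.
    """
    # errno.errorcode maps numbers to symbolic names on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the platform's message text for the code.
    return f"{name} {code} {os.strerror(code)}"


if __name__ == "__main__":
    # errno values that appear in the messages quoted in this thread.
    for code in (2, 11, 17, 32, 84, 104):
        print(describe_errno(code))
```

Running it with LC_ALL=C keeps the message text in English, as with the errno command.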
These errors are temporary. Wget2 tries to download the affected files again. The number of errors at the end is the number of downloads that finally failed, so the above errors are not counted.
We could improve the error messages further if we could rely on the errno from the underlying layers (mostly TLS API).
I still have to test with OpenSSL / WolfSSL.
The multibyte translation errors, e.g. errno 84 (Invalid or incomplete multibyte or wide character), often happen when HTML pages have a different encoding than what the server or document says. Treat this as a spurious error that you can't do much about (unless you have access to the web pages / web server yourself and want to fix the issues).
I do have access to the web server & I've verified in multiple ways that everything is UTF-8. I routinely check via iconv -f utf-8 -t utf-8, LC_ALL=C.UTF-8 egrep -laxv '.*' and other methods. But if I knew exactly which page was triggering the error, I could take a closer look at it specifically.
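A strict UTF-8 check of the kind described above can also be sketched in Python. This is an illustrative alternative to the iconv/egrep checks, not something wget2 does; first_invalid_utf8_offset is a hypothetical helper name.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None.

    A strict decode raises UnicodeDecodeError at the first offending byte,
    and the exception's 'start' attribute carries that offset.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


if __name__ == "__main__":
    # Valid UTF-8 (contains U+2019) versus a stray 0xFF byte.
    print(first_invalid_utf8_offset("fan\u2019s".encode("utf-8")))
    print(first_invalid_utf8_offset(b"abc\xffdef"))
```

Reading each mirrored file in binary mode and passing its contents to this function would point at the exact byte to inspect, if any file were not valid UTF-8.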
Interestingly, I didn't see this error when testing with your site. So I can only give you some hints from remote.
Basically, run wget2 -d -o log.txt --max-threads=1 --no-http2 ... until you think the error has occurred (e.g. watch tail -f log.txt | grep 'Failed to transcode' from a second console).
HTTP/2 is disabled and only one thread is used because that makes log.txt easier to read.
You then should see in log.txt which file caused this issue. Come back with that file and the relevant part of log.txt if something is unclear, and I'll try to help.
Tried doing this:
wget2-master -d -o log.txt --max-threads=1 --no-http2 -m -np https://skyqueen.cc/archive/71master/cracky/kareha.pl/
However, when debug is turned on, it downloads the first page and then takes forever to write the analysis of that page into the log... in the past I let it go for hours and it was still working on logging the first file. However, with the new master build I think it's going faster than I remember it doing before (maybe not), so I'll leave it running and see if it ever actually moves past the first file.
Hm, the first file takes 800ms to analyse/parse here, then you'll see the GET for the second file ... etc.
My machine is at least 3 years old (AMD Ryzen).
I'm downloading on Ubuntu WSL so that could be complicating things; I'll try it from pure Ubuntu later because WSL sometimes makes weird stuff happen
is it even supposed to be logging stuff like this?
31.125119.729 tr/@class=odd
31.125119.798 td/@class=indexcolicon
31.125119.870 a/@href=1301806368/
31.125119.938 img/@src=/icons/folder.gif
31.125120.010 img/@alt=[DIR]
31.125120.082 td/@class=indexcolname
31.125120.153 a/@href=1301806368/
31.125120.225 ='1301806368/'
31.125120.297 td/@class=indexcollastmod
31.125120.365 ='2022-08-24 17:16 '
31.125120.437 td/@class=indexcolsize
31.125120.507 =' - '
31.125120.578 ='
it's been about half an hour and the log is only up to 1MB. I don't know why it's going so slow... it's SSD and disk utilization is extremely low so I don't think that's the bottleneck, also it's only using about 0.4% CPU
Uh weird. I am pretty sure this is a WSL I/O issue then. And yes, the output looks a bit weird, it's basically the tokens from HTML parsing.
You could try to output to the console and redirect into a file. Maybe it is faster.
if I run with -v instead of -d it runs & logs at normal speed... if it's a file I/O issue I'm not sure why it would only happen with -d and not -v
will try the console thing
Just from the verbose output, I think I might see what's going on with the "Failed to transcode" errors
I think it's seeing external links containing non-ASCII characters and then outputting errors for those links
Adding URL: http://karlomongaya.wordpress.com/2009/10/22/for-the-love-of-zizek-a-fan’s-confession/
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
URL 'http://karlomongaya.wordpress.com/2009/10/22/for-the-love-of-zizek-a-fan’s-confession/' not followed (no host-spanning requested)
Adding URL: http://tanasinn.info/wiki/⊂二二二( ^ω^)二二二⊃
Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)
URL 'http://tanasinn.info/wiki/⊂二二二( ^ω^)二二二⊃' not followed (no host-spanning requested)
as I understand it ANSI_X3.4-1968 is basically ASCII... so it's trying to convert from UTF8 to ASCII for some reason, and generating an error because the URLs contain characters that can't be represented in ASCII?
but these are external links and as you can see, host-spanning is turned off.... so why is it even outputting errors about external links when host-spanning is turned off?
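The failure described above can be reproduced outside wget2 with a few lines of Python. This is only a sketch of the encoding problem itself, not of wget2's internals; can_encode is a hypothetical helper, and the URL is one of the two quoted from the verbose log.

```python
import codecs


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Python treats the glibc charset name as an alias of its ASCII codec.
print(codecs.lookup("ANSI_X3.4-1968").name)

# External link from the verbose log; U+2019 is the curly apostrophe.
url = ("http://karlomongaya.wordpress.com/2009/10/22/"
       "for-the-love-of-zizek-a-fan\u2019s-confession/")

print(can_encode(url, "ascii"))   # the curly apostrophe has no ASCII form
print(can_encode(url, "utf-8"))
```

Any character outside the 7-bit range makes the conversion to the ASCII locale fail, which matches the errno 84 (EILSEQ on Linux) reported in the message.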
Good finding and good question :-)
We should likely optimize the order of the checks (e.g. the host-spanning check). You are right that there is no need to convert the URL here. Btw, the parsed URL is converted into a local filename, and wget2 detected 'ANSI_X3.4-1968' (the official name of ASCII) as your local encoding.
Pushed a commit where we do some checks earlier to avoid the above situation (and error). This should also reduce memory consumption for recursive downloads (depends on the number of external links, though).
"if it's a file I/O issue I'm not sure why it would only happen with -d and not -v"
-d outputs 1000x more lines (take that number with a grain of salt ;-)).
Maybe there is an fsync() with every line written - I currently have no other idea.
I started a -d run 98 minutes ago and it has only logged 70676 lines (721 lines per minute, still processing the first file). I also started an otherwise identical -v run 73 minutes ago, which has logged 304395 lines (4170 lines per minute). The -v figure would have been much higher with HTTP/2; that run probably would have finished in about 10 minutes. So something strange is happening specifically with -d
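For reference, the two logging rates quoted above follow directly from the line counts and elapsed times. The sketch below only redoes that arithmetic; lines_per_minute is a hypothetical helper name.

```python
def lines_per_minute(lines: int, minutes: float) -> int:
    """Average logging rate, rounded to the nearest whole line."""
    return round(lines / minutes)


# Figures reported in this thread for the -d and -v runs.
debug_rate = lines_per_minute(70676, 98)
verbose_rate = lines_per_minute(304395, 73)

print(debug_rate, verbose_rate)
# How many times faster the -v run logged than the -d run.
print(round(verbose_rate / debug_rate, 1))
```

The -v run therefore logged lines several times faster than the -d run, even though -d is the mode that has far more lines to write.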
I'll try it from some other systems later to try to narrow down if it's a WSL quirk or not
Setting aside the debug thing for now and going back to the error messages specifically... I now have a much better understanding of why the error messages are happening and what they mean although I still think they could be a lot friendlier
for the "failed to transcode" errors, is there any possibility that the URL could be displayed as part of the error? otherwise it's very difficult to know what's going on without employing verbose or debug
for the "failed to read" messages, again, the URL would be helpful, but also, the error numbers don't mean much on their own -- I had no idea about using the "errno" command to lookup the meaning of the error and I doubt most people do either. Is there a reason that the expanded text like "EPIPE 32 Broken pipe" couldn't be included as part of the error rather than just "errno=32"?
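Until the URL is part of the message itself, the verbose log can be post-processed to pair each "Failed to transcode" line with a URL. The sketch below assumes the -v log format quoted earlier in this thread, where the error directly follows an "Adding URL:" line; with several threads the lines may interleave and the pairing may be wrong. The function name is hypothetical.

```python
import re

# Line formats as they appear in the verbose log excerpt in this thread.
ADDING = re.compile(r"^Adding URL: (?P<url>.+)$")
FAILED = "Failed to transcode"


def urls_with_transcode_errors(lines):
    """Yield the most recent 'Adding URL:' value before each transcode error."""
    last_url = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = ADDING.match(line)
        if match:
            last_url = match.group("url")
        elif line.startswith(FAILED) and last_url is not None:
            yield last_url


# Sample lines copied from the verbose output quoted above.
sample = [
    "Adding URL: http://tanasinn.info/wiki/⊂二二二( ^ω^)二二二⊃",
    "Failed to transcode 'utf-8' string into 'ANSI_X3.4-1968' (84)",
    "URL 'http://tanasinn.info/wiki/⊂二二二( ^ω^)二二二⊃' not followed (no host-spanning requested)",
]
for url in urls_with_transcode_errors(sample):
    print(url)
```

For a real run, the same function can be fed an open log file (for example one opened with encoding="utf-8" and errors="replace") instead of the sample list.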