wget subprocess crashes on Ubuntu 22.04

Open iAnatoly opened this issue 2 years ago • 3 comments

The proxy works, but the wget subprocess crashes on Ubuntu 22.04.

Program terminated with signal SIGXFSZ, File size limit exceeded.

Backtrace on wget looks like this:

Core was generated by `wget https://meduza.io:443 -o /dev/null -O /var/tmp/nn1c56b72b --header=Range:'.
Program terminated with signal SIGXFSZ, File size limit exceeded.
#0  0x00007fb971ac4a37 in __GI___libc_write (fd=4, buf=0x56210a33f8f0, nbytes=2048) at ../sysdeps/unix/sysv/linux/write.c:26
26	../sysdeps/unix/sysv/linux/write.c: No such file or directory.
(gdb) bt
#0  0x00007fb971ac4a37 in __GI___libc_write (fd=4, buf=0x56210a33f8f0, nbytes=2048) at ../sysdeps/unix/sysv/linux/write.c:26
#1  0x00007fb971a3af6d in _IO_new_file_write (f=0x56210a27d8b0, data=0x56210a33f8f0, n=4096) at ./libio/fileops.c:1180
#2  0x00007fb971a3ca61 in new_do_write (to_do=4096, 
    data=0x56210a33f0f0 "1\272ܼ\261\360\234\217\002\263Ҽ^\252\213\266>\354\244w'\025\217\356l\373\213n\261\335\355\215\345t\321\366\360\335Zh+\240\364\356 {\206\363\325\334b*\231Wc\017\177Ϟs\331s\276\331/\033\310\363\016v2\343\352\320ݫ\313v7P\374P\255f̆\234/v\331\036_A\ay\311\326r-\347\315\062\373\233ŀ\277\345Zi\353\370.\177\335,\355o\273\254۵3\254", <incomplete sequence \327>, 
    fp=0x56210a27d8b0) at ./libio/libioP.h:947
#3  _IO_new_do_write (to_do=4096, 
    data=0x56210a33f0f0 "1\272ܼ\261\360\234\217\002\263Ҽ^\252\213\266>\354\244w'\025\217\356l\373\213n\261\335\355\215\345t\321\366\360\335Zh+\240\364\356 {\206\363\325\334b*\231Wc\017\177Ϟs\331s\276\331/\033\310\363\016v2\343\352\320ݫ\313v7P\374P\255f̆\234/v\331\036_A\ay\311\326r-\347\315\062\373\233ŀ\277\345Zi\353\370.\177\335,\355o\273\254۵3\254", <incomplete sequence \327>, 
    fp=0x56210a27d8b0) at ./libio/fileops.c:425
#4  _IO_new_do_write (fp=0x56210a27d8b0, 
    data=0x56210a33f0f0 "1\272ܼ\261\360\234\217\002\263Ҽ^\252\213\266>\354\244w'\025\217\356l\373\213n\261\335\355\215\345t\321\366\360\335Zh+\240\364\356 {\206\363\325\334b*\231Wc\017\177Ϟs\331s\276\331/\033\310\363\016v2\343\352\320ݫ\313v7P\374P\255f̆\234/v\331\036_A\ay\311\326r-\347\315\062\373\233ŀ\277\345Zi\353\370.\177\335,\355o\273\254۵3\254", <incomplete sequence \327>, to_do=4096)
    at ./libio/fileops.c:422
#5  0x00007fb971a3b755 in _IO_new_file_xsputn (n=7829, data=<optimized out>, f=<optimized out>) at ./libio/libioP.h:947
#6  _IO_new_file_xsputn (f=0x56210a27d8b0, data=<optimized out>, n=7829) at ./libio/fileops.c:1196
#7  0x00007fb971a30057 in __GI__IO_fwrite (buf=0x56210a34c9c0, size=1, count=7829, fp=0x56210a27d8b0) at ./libio/libioP.h:947
#8  0x0000562109ce41bf in ?? ()
#9  0x0000562109ce4c51 in ?? ()
#10 0x0000562109cce949 in ?? ()
#11 0x0000562109cd9971 in ?? ()
#12 0x0000562109cdf4c8 in ?? ()
#13 0x0000562109ceac8b in ?? ()
#14 0x0000562109cbf54f in ?? ()
#15 0x00007fb9719d9d90 in __libc_start_call_main (main=main@entry=0x562109cbe260, argc=argc@entry=11, argv=argv@entry=0x7fff48a0cf08) at ../sysdeps/nptl/libc_start_call_main.h:58
#16 0x00007fb9719d9e40 in __libc_start_main_impl (main=0x562109cbe260, argc=11, argv=0x7fff48a0cf08, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff48a0cef8)
    at ../csu/libc-start.c:392
#17 0x0000562109cc12d5 in ?? ()

iAnatoly · May 05 '22 05:05

newnode deliberately runs wget with a low maximum file size limit as a way to cap the size of downloads, because I couldn't find a wget option that did this reliably. But wget shouldn't leave a core file behind when that limit is hit, so I'll try to fix that.
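For context, here is a minimal sketch of that general technique, not NewNode's actual code (the helper name, URL, path, and limit value are placeholders): set RLIMIT_FSIZE in the child before exec'ing wget so the kernel enforces the cap, and zero RLIMIT_CORE so the SIGXFSZ kill doesn't leave a core file behind.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Spawn wget with a hard cap on how much it may write to any file.
   When the cap is exceeded the kernel sends SIGXFSZ, whose default action
   is terminate + core dump, so RLIMIT_CORE is zeroed to suppress the core. */
static int spawn_wget_capped(const char *url, const char *out_path, rlim_t max_bytes)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct rlimit fsz = { .rlim_cur = max_bytes, .rlim_max = max_bytes };
        setrlimit(RLIMIT_FSIZE, &fsz);          /* cap output file size */
        struct rlimit core = { .rlim_cur = 0, .rlim_max = 0 };
        setrlimit(RLIMIT_CORE, &core);          /* no core file on SIGXFSZ */
        execlp("wget", "wget", url, "-o", "/dev/null", "-O", out_path, (char *)NULL);
        _exit(127);                             /* exec failed */
    }
    int status = 0;
    waitpid(pid, &status, 0);
    return status;
}

int main(void)
{
    /* placeholder URL, output path, and 1 MiB cap */
    int status = spawn_wget_capped("https://example.com/", "/var/tmp/nn-example", 1 << 20);
    if (WIFSIGNALED(status))
        fprintf(stderr, "wget killed by signal %d\n", WTERMSIG(status));
    return 0;
}
```

Zeroing RLIMIT_CORE in the child is one way to keep the size cap while avoiding the core file seen in the backtrace above.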

The real fix is to implement this in some other way than spawning wget.

flyinhome · May 05 '22 09:05

Yes, after reviewing the https_wget.c code, it does appear that you guys went on quite a journey down a rabbit hole to make it work. The curl multi API seems like an easier alternative at the moment.
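For reference, a minimal sketch of driving a single transfer through libcurl's multi interface, assuming a placeholder URL and output path (curl_multi_poll needs libcurl 7.66 or newer; curl_multi_wait works on older versions):

```c
#include <stdio.h>
#include <curl/curl.h>

/* Write incoming body bytes to the FILE * passed via CURLOPT_WRITEDATA. */
static size_t write_cb(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    return fwrite(ptr, 1, size * nmemb, (FILE *)userdata);
}

int main(void)
{
    curl_global_init(CURL_GLOBAL_DEFAULT);

    FILE *out = fopen("/var/tmp/nn-example", "wb");              /* placeholder path */
    CURL *easy = curl_easy_init();
    curl_easy_setopt(easy, CURLOPT_URL, "https://example.com/"); /* placeholder URL */
    curl_easy_setopt(easy, CURLOPT_WRITEFUNCTION, write_cb);
    curl_easy_setopt(easy, CURLOPT_WRITEDATA, out);

    CURLM *multi = curl_multi_init();
    curl_multi_add_handle(multi, easy);

    int still_running = 1;
    while (still_running) {
        curl_multi_perform(multi, &still_running);
        /* wait up to 1 s for socket activity */
        curl_multi_poll(multi, NULL, 0, 1000, NULL);
    }

    curl_multi_remove_handle(multi, easy);
    curl_easy_cleanup(easy);
    curl_multi_cleanup(multi);
    fclose(out);
    curl_global_cleanup();
    return 0;
}
```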

Interestingly, limiting the file size indeed does not work: the --quota option does not limit the download size for a single file by design (since the file size is not always known at the beginning of a download), and Range requests can also be ignored by the HTTP server. libcurl appears to have the same limitation (for the same reasons).

Have you considered using a timeout rather than file size as a limiting factor? Wget does support --read-timeout.
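If the download were moved to libcurl, one possible way to combine both ideas is sketched below, with a placeholder URL, path, and limits: a write callback that returns a short count once a byte budget is exceeded, which makes libcurl fail the transfer with CURLE_WRITE_ERROR instead of the process being killed by SIGXFSZ, plus the low-speed options as a rough analog of wget's --read-timeout.

```c
#include <stdio.h>
#include <curl/curl.h>

#define MAX_BODY_BYTES (1024 * 1024)   /* hypothetical 1 MiB cap */

struct sink {
    FILE  *fp;
    size_t written;
};

/* Return a short count once the byte budget is spent; libcurl then stops the
   transfer and curl_easy_perform() returns CURLE_WRITE_ERROR. */
static size_t capped_write(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    struct sink *s = userdata;
    size_t n = size * nmemb;
    if (s->written + n > MAX_BODY_BYTES)
        return 0;
    s->written += n;
    return fwrite(ptr, 1, n, s->fp);
}

int main(void)
{
    curl_global_init(CURL_GLOBAL_DEFAULT);

    struct sink s = { fopen("/var/tmp/nn-example", "wb"), 0 };   /* placeholder path */
    CURL *easy = curl_easy_init();
    curl_easy_setopt(easy, CURLOPT_URL, "https://example.com/"); /* placeholder URL */
    curl_easy_setopt(easy, CURLOPT_WRITEFUNCTION, capped_write);
    curl_easy_setopt(easy, CURLOPT_WRITEDATA, &s);
    /* rough analog of --read-timeout: abort if slower than 1 byte/s for 30 s */
    curl_easy_setopt(easy, CURLOPT_LOW_SPEED_LIMIT, 1L);
    curl_easy_setopt(easy, CURLOPT_LOW_SPEED_TIME, 30L);

    CURLcode rc = curl_easy_perform(easy);
    if (rc != CURLE_OK)
        fprintf(stderr, "transfer stopped: %s\n", curl_easy_strerror(rc));

    curl_easy_cleanup(easy);
    fclose(s.fp);
    curl_global_cleanup();
    return 0;
}
```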

iAnatoly · May 06 '22 01:05

On 5/5/22 21:14, Anatoly Ivanov wrote:

> Yes, after reviewing the https_wget.c code, it does appear that you guys went on quite a journey down a rabbit hole to make it work.

Right. At the time I started on that approach, wget seemed like the easiest path. I have considered two or three alternatives, but there have always been more urgent things to implement.

> Have you considered using a timeout rather than file size as a limiting factor? Wget does support --read-timeout.

We do implement a timeout, but for different reasons.

Keith

flyinhome · May 06 '22 01:05