wget2
Cannot mirror some sites that can be mirrored using wget
I tried to mirror https://html.spec.whatwg.org using the command wget2 -mkpnp https://html.spec.whatwg.org. Only index.html and robots.txt get downloaded. There is no problem when mirroring with wget.
Hi,
You might be using a very old version of wget2. The current one (2.0.0 from git master) creates the following with your command line:
html.spec.whatwg.org/
├── demos
│ ├── canvas
│ │ └── blue-robot
│ │ └── index-idle.html
│ └── workers
│ ├── crypto
│ │ └── page.html
│ ├── modules
│ │ └── page.html
│ ├── multicore
│ │ └── page.html
│ ├── multiviewer
│ │ └── page.html
│ ├── primes
│ │ └── page.html
│ └── shared
│ ├── 001
│ │ └── test.html
│ ├── 002
│ │ └── test.html
│ └── 003
│ ├── inner.html
│ └── test.html
├── dev
│ ├── acknowledgements.html
│ ├── browsers.html
│ ├── browsing-the-web.html
│ ├── canvas.html
│ ├── common-dom-interfaces.html
│ ├── common-microsyntaxes.html
│ ├── comms.html
│ ├── custom-elements.html
│ ├── dnd.html
│ ├── dom.html
│ ├── droidserif-bolditalic.woff2
│ ├── droidserif-bold.woff2
│ ├── droidserif-italic.woff2
│ ├── droidserif.woff2
│ ├── dynamic-markup-insertion.html
│ ├── edits.html
│ ├── embedded-content.html
│ ├── embedded-content-other.html
│ ├── form-control-infrastructure.html
│ ├── form-elements.html
│ ├── forms.html
│ ├── grouping-content.html
│ ├── history.html
│ ├── iframe-embed-object.html
│ ├── imagebitmap-and-animations.html
│ ├── image-maps.html
│ ├── images.html
│ ├── index.html
│ ├── indices.html
│ ├── infrastructure.html
│ ├── input.html
│ ├── interaction.html
│ ├── interactive-elements.html
│ ├── introduction.html
│ ├── links.html
│ ├── media.html
│ ├── microdata.html
│ ├── mplus-2p-heavy.woff
│ ├── named-characters.html
│ ├── obsolete.html
│ ├── origin.html
│ ├── references.html
│ ├── scripting.html
│ ├── search.js
│ ├── sections.html
│ ├── semantics.html
│ ├── semantics-other.html
│ ├── server-sent-events.html
│ ├── structured-data.html
│ ├── styles.css
│ ├── syntax.html
│ ├── system-state.html
│ ├── tables.html
│ ├── text-level-semantics.html
│ ├── timers-and-user-prompts.html
│ ├── urls-and-fetching.html
│ ├── webappapis.html
│ ├── web-messaging.html
│ ├── webstorage.html
│ ├── window-object.html
│ ├── workers.html
│ ├── worklets.html
│ └── xhtml.html
├── entities.json
├── fonts
│ ├── Essays1743-BoldItalic.ttf
│ ├── Essays1743-Bold.ttf
│ ├── Essays1743-Italic.ttf
│ └── Essays1743.ttf
├── html-dfn.js
├── images
│ ├── arc1.png
│ ├── arcTo1.png
│ ├── arcTo2.png
│ ├── arcTo3.png
│ ├── asyncdefer.svg
│ ├── baselines.png
│ ├── bidiselect.png
│ ├── content-venn.svg
│ ├── custom-element-reactions.svg
│ ├── drawImage.png
│ ├── focus-tree.png
│ ├── im.png
│ ├── ircfog-modules.svg
│ ├── outline.svg
│ ├── parsing-model-overview.svg
│ ├── premultiplied-example-1.png
│ ├── premultiplied-example-2.png
│ ├── premultiplied-example-3.png
│ ├── premultiplied-example-4.png
│ ├── premultiplied-example-5.png
│ ├── sample-bdi.png
│ ├── sample-datalist.svg
│ ├── sample-details-1.png
│ ├── sample-details-2.png
│ ├── sample-email-1.svg
│ ├── sample-email-2.svg
│ ├── sample-meter.png
│ ├── sample-not-bdi.png
│ ├── sample-progress.png
│ ├── sample-range-2a.png
│ ├── sample-range-2b.png
│ ├── sample-range-labels.png
│ ├── sample-range.png
│ ├── sample-ruby-bopomofo.png
│ ├── sample-ruby-ja.png
│ ├── sample-ruby-pinyin.png
│ ├── sample-url.svg
│ ├── sample-usemap.png
│ ├── select-country-1.png
│ ├── select-country-2.png
│ └── table-scope-diagram.png
├── index.html
├── link-fixup.js
├── multipage
│ ├── acknowledgements.html
│ ├── browsers.html
│ ├── browsing-the-web.html
│ ├── canvas.html
│ ├── common-dom-interfaces.html
│ ├── common-microsyntaxes.html
│ ├── comms.html
│ ├── custom-elements.html
│ ├── dnd.html
│ ├── dom.html
│ ├── dynamic-markup-insertion.html
│ ├── edits.html
│ ├── embedded-content.html
│ ├── embedded-content-other.html
│ ├── form-control-infrastructure.html
│ ├── form-elements.html
│ ├── forms.html
│ ├── grouping-content.html
│ ├── history.html
│ ├── iana.html
│ ├── iframe-embed-object.html
│ ├── imagebitmap-and-animations.html
│ ├── image-maps.html
│ ├── images.html
│ ├── index.html
│ ├── indices.html
│ ├── infrastructure.html
│ ├── input.html
│ ├── interaction.html
│ ├── interactive-elements.html
│ ├── introduction.html
│ ├── links.html
│ ├── media.html
│ ├── microdata.html
│ ├── named-characters.html
│ ├── obsolete.html
│ ├── origin.html
│ ├── parsing.html
│ ├── references.html
│ ├── rendering.html
│ ├── scripting.html
│ ├── sections.html
│ ├── semantics.html
│ ├── semantics-other.html
│ ├── server-sent-events.html
│ ├── structured-data.html
│ ├── syntax.html
│ ├── system-state.html
│ ├── tables.html
│ ├── text-level-semantics.html
│ ├── timers-and-user-prompts.html
│ ├── urls-and-fetching.html
│ ├── webappapis.html
│ ├── web-messaging.html
│ ├── webstorage.html
│ ├── window-object.html
│ ├── workers.html
│ ├── worklets.html
│ └── xhtml.html
├── print.pdf
├── robots.txt
└── styles.css
@rockdaboot OK, it's working after updating. When I press Ctrl-C, the file being downloaded gets removed. How can I quit wget2 without removing the file being downloaded, so that I can resume the download later?
How can I quit wget2 without removing the file being downloaded so that I can resume download later?
Currently, wget2 uses a 10 MB buffer (per file/thread) whose contents are lost on Ctrl-C. I'm not sure if this is a real "problem", but it surely depends on your network speed. To flush the buffers on Ctrl-C, they need to become accessible from outside the thread. This needs some architectural changes that I'd try to avoid. Another option is to reduce the buffer size, either with a CLI option or automatically depending on the network (or better: data flow) speed.
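To illustrate the trade-off, here is a rough Python sketch, not wget2 code. It models a download that is killed without a final flush and shows how the buffer size bounds what is lost. The names `pick_buffer_size`, `BufferedSink`, `simulate_interrupted_download`, the 4 KB lower bound, and the "at most about one second of data" heuristic are all assumptions made up for this example; only the 10 MB figure comes from the comment above.

```python
import io

MAX_BUFFER = 10 * 1024 * 1024  # the 10 MB per-file buffer mentioned above
MIN_BUFFER = 4 * 1024          # hypothetical lower bound, chosen for this sketch


def pick_buffer_size(bytes_per_second, max_loss_seconds=1.0):
    """Hypothetical heuristic: keep at most ~max_loss_seconds of data unflushed."""
    wanted = int(bytes_per_second * max_loss_seconds)
    return max(MIN_BUFFER, min(MAX_BUFFER, wanted))


class BufferedSink:
    """Collects chunks in memory and writes them out once the buffer is full."""

    def __init__(self, out, buffer_size):
        self.out = out
        self.buffer_size = buffer_size
        self.pending = bytearray()

    def write(self, chunk):
        self.pending += chunk
        if len(self.pending) >= self.buffer_size:
            self.flush()

    def flush(self):
        self.out.write(bytes(self.pending))
        self.pending.clear()


def simulate_interrupted_download(total_bytes, chunk_size, buffer_size):
    """Return how many bytes reached 'disk' when the transfer is killed abruptly."""
    disk = io.BytesIO()
    sink = BufferedSink(disk, buffer_size)
    for _ in range(total_bytes // chunk_size):
        sink.write(b"x" * chunk_size)
    # No final flush here: this models Ctrl-C with an inaccessible buffer.
    return len(disk.getvalue())
```

In this model, a 1 MB transfer interrupted with a 10 MB buffer leaves nothing on disk, while a 64 KB buffer loses only the last partial buffer. On a slow connection the heuristic would pick a small buffer, so little is lost on interruption; on a fast one it would go back up to the 10 MB cap.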
I'll leave this issue open as a reminder.