wget2
Multithreaded download: Change server probing behavior to better match browsers
Chromium-based browsers probe with a GET request carrying Range: bytes=0- and discard the body, rather than sending a HEAD request. I've seen servers that take advantage of this as a form of anti-scraping: they reject HEAD requests and return 404 for requests that lack a Range header.
I propose that we add Range: bytes=0- to every initial GET request, and then check for a 206 status or the Accept-Ranges: bytes header to determine whether segmented downloading is supported.
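To make the proposed check concrete, here is a minimal sketch of the decision logic in Python. It is not wget2 code: the names supports_segmented_download and PROBE_HEADERS are hypothetical, and treating any status other than 200 or 206 as "not supported" is my own assumption rather than part of the proposal.

```python
from typing import Mapping

# Header added to the initial GET request, as proposed above.
# (PROBE_HEADERS is a hypothetical name used only for this sketch.)
PROBE_HEADERS = {"Range": "bytes=0-"}


def supports_segmented_download(status: int, headers: Mapping[str, str]) -> bool:
    """Decide from the probe response whether ranged requests can be used.

    Hypothetical helper: it mirrors the proposal (206, or Accept-Ranges: bytes),
    not any existing wget2 function.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    normalized = {name.lower(): value for name, value in headers.items()}

    # 206 Partial Content means the server honoured the Range header.
    if status == 206:
        return True

    # On a plain 200, fall back to what the server advertises.
    if status == 200:
        advertised = normalized.get("accept-ranges", "")
        units = [unit.strip().lower() for unit in advertised.split(",")]
        return "bytes" in units

    # Assumption: any other status (404, 405, ...) is treated as unsupported.
    return False
```

With this shape, a server that answers the probe with 206 is handled even if it omits Accept-Ranges, and a server that answers 200 with Accept-Ranges: none falls back to a single-stream download.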
aria2c has a similar problem, and there's a PR addressing it, which can be found here