
Enable HTTP compression

Open JustAnotherArchivist opened this issue 4 years ago • 0 comments

ArchiveBot (AB) currently doesn't make use of wpull's --http-compression option, so it doesn't send an Accept-Encoding header. Occasionally, there are websites which hate that. For example, https://www.cresta-awards.com/ sends an empty response body when compression isn't enabled, and https://www-ssrl.slac.stanford.edu/~swebb/ simply kills the connection.
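For illustration, here is a minimal Python sketch of what enabling compression means on the client side: advertise the encodings in the request headers, then decode the body according to the Content-Encoding the server actually returned. This is not wpull's implementation; `build_headers`, `decode_body`, and the User-Agent string are hypothetical names made up for this example.

```python
import gzip
import zlib
from typing import Dict, Optional

# Encodings advertised to the server (the browser-like value mentioned in this issue).
ACCEPT_ENCODING = "gzip, deflate"


def build_headers(compression: bool) -> Dict[str, str]:
    """Build request headers; add Accept-Encoding only when compression is enabled."""
    headers = {"User-Agent": "example-crawler/0.1"}  # hypothetical client name
    if compression:
        headers["Accept-Encoding"] = ACCEPT_ENCODING
    return headers


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Decode a response body according to its Content-Encoding header value."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding in ("", "identity"):
        return body
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            # The HTTP "deflate" coding is zlib-wrapped deflate ...
            return zlib.decompress(body)
        except zlib.error:
            # ... but some servers send a raw deflate stream instead.
            return zlib.decompress(body, -zlib.MAX_WBITS)
    raise ValueError(f"unsupported Content-Encoding: {encoding!r}")
```

A crawler would typically keep the compressed bytes exactly as received for the WARC record and use something like `decode_body` only when it needs to parse the content, for example to extract links.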

Since browsers seem to send Accept-Encoding: gzip, deflate (or possibly brotli too these days) on all requests, it should probably be safe to enable this globally. It might cause a very small increase in WARC size because web servers are unlikely to always compress data at the highest compression level (as wpull does for writing WARCs), and working with compressed data inside compressed WARCs is slightly annoying, but those are just minor downsides.
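The size concern could be checked empirically with something like the rough sketch below. It is a simplification: it compresses only a response body rather than a full WARC record, and the server's and the WARC writer's compression levels (1 and 9 here) are assumptions. `warc_payload_sizes` is a hypothetical helper, not part of wpull or ArchiveBot.

```python
import gzip
from typing import Dict


def warc_payload_sizes(body: bytes, server_level: int = 1, warc_level: int = 9) -> Dict[str, int]:
    """Compare stored sizes with and without HTTP compression.

    plain_then_warc: body sent uncompressed, then gzipped by the WARC writer.
    compressed_then_warc: body gzipped by the server at server_level, then
    gzipped again by the WARC writer at warc_level.
    """
    plain_then_warc = gzip.compress(body, compresslevel=warc_level)
    server_compressed = gzip.compress(body, compresslevel=server_level)
    compressed_then_warc = gzip.compress(server_compressed, compresslevel=warc_level)
    return {
        "original": len(body),
        "plain_then_warc": len(plain_then_warc),
        "compressed_then_warc": len(compressed_then_warc),
    }


if __name__ == "__main__":
    # Synthetic sample; real pages will behave differently.
    sample = b"<p>hello world</p>\n" * 200
    print(warc_payload_sizes(sample))
```

The outcome depends heavily on the data, so running this over a sample of real response bodies would be more informative than any synthetic input.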

JustAnotherArchivist, Sep 28 '21 05:09