Alex Osborne
I don't think pywb has this mode built in anymore. There's some [old documentation on the wiki](https://github.com/webrecorder/pywb/wiki/Pywb-Proxy-Mode-Usage#proxy-auth-selection) but as far as I can tell the code for it has been removed....
Heritrix, which is probably where the 'classic' value originally came from, currently has these policies:

* `classic` (alias `obey`)
* `ignore`
* `robotsTxtOnly` (obeys robots.txt but ignores the robots meta...
> the same dictionary should be included at the beginning whenever it is used (so it can be parsed by standard zstd which expect a dictionary at the beginning)

I...
I tested again and can still reproduce even with a fresh browser profile. First, installing Pywb fresh, and turning client_side_replay mode on with just an empty collection: ```sh python3.11...
Your suggestion seems sensible. I've added it as a [community recommendation](http://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1-annotated/#:~:text=When%20recording%20IPv6%20addresses).
[wget calls `inet_ntop`](https://github.com/gitGNU/gnu_wget/blob/580067d1e69abda1e5403d3c897271a9b3526cdf/src/host.c#L429) which [POSIX](https://pubs.opengroup.org/onlinepubs/9799919799/functions/inet_ntop.html) seems to only require to produce "a text string suitable for presentation". It looks like [glibc](https://codebrowser.dev/glibc/glibc/resolv/inet_ntop.c.html#inet_ntop6) and [musl](https://git.musl-libc.org/cgit/musl/tree/src/network/inet_ntop.c?h=v1.1.19&id=0b44a0315b47dd8eced9f3b7f31580cf14bbfc01)'s implementations would produce the canonical form. The current...
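For tools that cannot rely on the C library's formatting, a minimal sketch of normalising an address before writing it to a record, assuming Python's standard `ipaddress` module is an acceptable stand-in for `inet_ntop`. The helper name `canonical_ip` is hypothetical and not taken from wget or any WARC library.

```python
import ipaddress


def canonical_ip(text: str) -> str:
    """Return the compressed, lowercase text form of an IPv4 or IPv6 address.

    Hypothetical helper: it parses whatever presentation form it is given
    and re-serialises it, so leading zeros and uppercase hex digits are dropped
    and the longest run of zero groups is collapsed to "::".
    """
    return ipaddress.ip_address(text).compressed


# A fully expanded, uppercase IPv6 address and its normalised form.
print(canonical_ip("2001:0DB8:0000:0000:0000:0000:0000:0001"))  # 2001:db8::1
```

IPv4 addresses pass through unchanged, so the same helper could be applied to any address regardless of family.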
> Does the dedup standard allow for deduping across multiple independent WARC files

Yes, revisit records can refer to records in other WARC files. The common way they're used is...
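To illustrate how a revisit record can point at a capture stored in a different file, here is a rough sketch that keeps an in-memory index of payload digests and emits the WARC 1.1 revisit header fields when a digest repeats. The `Capture` and `DedupIndex` names are hypothetical, the index is an assumption (real crawlers typically use a persistent store), and this is not necessarily the usage the truncated sentence above goes on to describe.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Revisit profile URI defined by WARC 1.1 for identical payload digests.
REVISIT_PROFILE = "http://netpreserve.org/warc/1.1/revisit/identical-payload-digest"


@dataclass(frozen=True)
class Capture:
    """Hypothetical summary of one response record, wherever it is stored."""
    record_id: str        # value of WARC-Record-ID
    target_uri: str       # value of WARC-Target-URI
    date: str             # value of WARC-Date
    payload_digest: str   # value of WARC-Payload-Digest


class DedupIndex:
    """Maps payload digests to the first capture seen with that digest."""

    def __init__(self) -> None:
        self._by_digest: Dict[str, Capture] = {}

    def revisit_headers(self, capture: Capture) -> Optional[Dict[str, str]]:
        """Return revisit headers if the payload was seen before, else None.

        The original capture is identified by record ID, URI, and date only,
        so it does not matter which WARC file holds it.
        """
        original = self._by_digest.get(capture.payload_digest)
        if original is None:
            self._by_digest[capture.payload_digest] = capture
            return None
        return {
            "WARC-Type": "revisit",
            "WARC-Profile": REVISIT_PROFILE,
            "WARC-Refers-To": original.record_id,
            "WARC-Refers-To-Target-URI": original.target_uri,
            "WARC-Refers-To-Date": original.date,
            "WARC-Payload-Digest": capture.payload_digest,
        }
```

Because the lookup key is the digest alone, the index could be shared between crawls that write to entirely separate WARC files.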
My goal when web archiving is to preserve web resources, not network messages. Therefore I consider translating or changing transport-level message headers for implementation practicality acceptable provided the semantics needed...
Good argument. I concede if one interprets 'should' as a strict requirement (which is most likely correct given the other usages in the spec) then indeed WARC 1.1 prohibits storing...
> Storing HTTP/2 literally in WARC records has a major challenge: with the HPACK header compression to decompress the headers of one request/response pair you need the headers of preceding...