WARC revision 1.1 (modification): support of HTTP 2.X protocol in WARC format.
Definition: nothing is said about the HTTP/2 protocol, which could give the impression that WARC files cannot store documents harvested over HTTP/2.
Decision: a few sentences on the handling of the HTTP 2.X protocol should be written.
Action: Kristinn Sigurðsson to propose a formulation and give an example.
Following IIPC recommendations and discussions during the ISO working group meeting on November 16-17, 2015: the topic is not mature enough, so the issue is left out of the 1.1 revision.
To clarify, is the following encoding acceptable for HTTP/2 requests?
$ curl --http2 -i https://www.google.com
HTTP/2 302
cache-control: private
content-type: text/html; charset=UTF-8
referrer-policy: no-referrer
location: https://www.google.com.au/?gfe_rd=cr&ei=wZmFWefNBuTDXsSYgegK
content-length: 261
date: Sat, 05 Aug 2017 10:11:13 GMT
alt-svc: quic=":443"; ma=2592000; v="39,38,37,36,35"
There is currently no standard approach for writing HTTP/2 data into WARCs. Since HTTP/2 headers would be binary, there should be a way to specify this (perhaps a WARC header?). But since this is a text conversion of HTTP/2 data to HTTP/1.1 syntax, it should probably be written as HTTP/1.1 to ensure compatibility.
Put another way, many tools that check for HTTP/1.0 or HTTP/1.1 might not be able to read this record because it starts with HTTP/2, even though the data is otherwise HTTP/1.1-compatible and they should be able to read it.
My recommendation would be to store it as HTTP/1.1 for compatibility, and perhaps add a WARC header, like WARC-Original-Protocol: HTTP/2.0 or something like that, to indicate that it was originally HTTP/2.
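To make the suggestion concrete, here is a rough Python sketch of the rewrite step. It is only an illustration under several assumptions: `to_http11_block` is a made-up helper name, `WARC-Original-Protocol` is the hypothetical header floated above (it is not part of any WARC specification), and the reason phrase is filled in from Python's `http.HTTPStatus` because HTTP/2 does not carry one.

```python
import re
from http import HTTPStatus

# Matches a text-rendered HTTP/2 status line such as "HTTP/2 302" or "HTTP/2.0 302".
HTTP2_STATUS = re.compile(r"^HTTP/2(?:\.0)?\s+(\d{3})\s*$")


def to_http11_block(header_block):
    """Rewrite an HTTP/2-style status line as HTTP/1.1.

    Returns (header_block, extra_warc_headers). The extra headers dict uses
    the hypothetical WARC-Original-Protocol field; it is empty when the
    block did not start with an HTTP/2 status line.
    """
    first, sep, rest = header_block.partition("\n")
    match = HTTP2_STATUS.match(first.strip())
    if match is None:
        return header_block, {}

    code = int(match.group(1))
    try:
        # HTTP/2 has no reason phrase, so borrow the standard one.
        reason = HTTPStatus(code).phrase
    except ValueError:
        reason = ""  # unknown status code: leave the phrase empty

    status_line = ("HTTP/1.1 %d %s" % (code, reason)).rstrip()
    # Preserve a CRLF line ending if the input used one.
    eol = "\r" if first.endswith("\r") else ""
    return status_line + eol + sep + rest, {"WARC-Original-Protocol": "HTTP/2.0"}


if __name__ == "__main__":
    block, extra = to_http11_block("HTTP/2 302\ncontent-length: 261\n")
    print(block)
    print(extra)
```

The header fields themselves are passed through unchanged; only the status line is touched, so a reader that expects HTTP/1.0 or HTTP/1.1 can parse the record while the extra WARC header preserves the original protocol.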
Relevant RFCs for reference:
- HTTP/2 https://tools.ietf.org/html/rfc7540
- HPACK https://tools.ietf.org/html/rfc7541
Semantically, HTTP/2 requests are still human-readable (with HPACK compression), but the syntax has changed. Server push seems rather nuanced.
Avoiding HTTP/2 altogether seems like the best option for now; the performance gain isn't worth a project-specific WARC variation.
This point has not been resolved in the WARC 1.1 revision.