warcreate
WARC Request Record payloads are missing the 'host' header
This is likely critical, but the header might not be available via Chrome's webRequest API.
Heritrix 3.2.0

WARC/1.0
WARC-Type: request
WARC-Target-URI: http://matkelly.com/
WARC-Date: 2015-12-11T13:25:07Z
WARC-Concurrent-To: <urn:uuid:29dfecaf-9cb8-4c13-b8cb-0f2e18de4310>
WARC-Record-ID: <urn:uuid:e5bfbf0b-37e8-4cfb-a32f-dd333bd474f3>
Content-Type: application/http; msgtype=request
Content-Length: 207

GET / HTTP/1.0
User-Agent: Mozilla/5.0 (compatible; heritrix/3.2.0 +http://yourdomain.com)
Connection: close
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Host: matkelly.com

WARCreate 0.2015.8.25

WARC/1.0
WARC-Type: request
WARC-Target-URI: http://matkelly.com/
WARC-Date: 2015-12-11T13:21:35Z
WARC-Concurrent-To: <urn:uuid:e9480009-ba0c-392a-3f3b-5d1487fdb651>
WARC-Record-ID: <urn:uuid:a237efca-716c-8660-1c74-16d0b5341a9e>
Content-Type: application/http; msgtype=request
Content-Length: 349

GET / HTTP/1.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8,de-DE;q=0.6

This issue remains, @N0taN3rd, despite #93. The Host header is still not present in Request record payloads.
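
One possible workaround, if the browser API never exposes the header, is to derive it from the record's `WARC-Target-URI` before the request payload is written. The sketch below is only an illustration of that idea in Python, not WARCreate's actual code; the function names `host_header_value` and `ensure_host_header` are hypothetical, and it assumes the payload is available as a string with CRLF line endings.

```python
from urllib.parse import urlsplit


def host_header_value(target_uri: str) -> str:
    """Build a Host header value (host[:port]) from a target URI."""
    parts = urlsplit(target_uri)
    host = parts.hostname or ""
    if ":" in host:
        # IPv6 literals must be bracketed in a Host header.
        host = f"[{host}]"
    default_port = {"http": 80, "https": 443}.get(parts.scheme)
    # Only include the port when it differs from the scheme default.
    if parts.port is not None and parts.port != default_port:
        return f"{host}:{parts.port}"
    return host


def ensure_host_header(payload: str, target_uri: str) -> str:
    """Return the HTTP request payload, adding a Host header if it has none."""
    head, sep, body = payload.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # lines[0] is the request line; the remaining lines are header fields.
    if any(line.lower().startswith("host:") for line in lines[1:]):
        return payload
    lines.insert(1, f"Host: {host_header_value(target_uri)}")
    return "\r\n".join(lines) + sep + body
```

Any change to the payload also changes its byte length, so the record's `Content-Length` would need to be recomputed after a header is added.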
@machawk1 per the Twitter discussion, and following Ed Summers' reply and discovery, the Chrome API is adding the status to the headers.
I did not see any Host headers in the Request record, either when adding debugging output or when searching via grep :arrow_down:

I believe RFC 7230 §3.2 would help with this, and/or blame Google, so :closed_book: and :shipit: