httrack2warc
Non-200 status code handling
In HTTrack 3.49-2 we have:
hts-cache/new.txt:11:21:41 185/185 ---M-- 301 error ('Moved%20Permanently') text/html date:Tue,%2009%20Jan%202018%2002:21:41%20GMT http://test.example.org/redirect test.example.org/redirect (from http://test.example.org/)
Binary file hts-cache/new.zip matches
hts-ioinfo.txt:[1] request for test.example.org/redirect:
hts-ioinfo.txt:<<< GET /redirect HTTP/1.1
hts-ioinfo.txt:[1] response for test.example.org/redirect:
The new.zip comment entry has:
HTTP/1.1 301 Moved Permanently
X-In-Cache: 1
X-StatusCode: 301
X-StatusMessage: Moved Permanently
X-Size: 185
Content-Type: text/html
Last-Modified: Tue, 09 Jan 2018 02:21:41 GMT
Location: http://test.example.org/another
X-Addr: test.example.org
X-Fil: /redirect
X-Save: test.example.org/redirect
These are converted correctly if hts-ioinfo.txt is present. Without hts-ioinfo.txt, however, a resource record is currently created instead.
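One possible fallback when hts-ioinfo.txt is missing is to rebuild a response header from the new.zip header block quoted above. The following is only a rough Python sketch, not how httrack2warc actually does it. The function name `rebuild_response_header` and the constant `HTTRACK_FIELDS` are hypothetical, and the list of fields to drop is simply the set of `X-` fields visible in the sample, on the assumption that HTTrack added them rather than the server.

```python
# Fields seen in the sample cache entry that look like HTTrack bookkeeping.
# Assumption: these were added by HTTrack and were not sent by the server.
HTTRACK_FIELDS = {
    "x-in-cache", "x-statuscode", "x-statusmessage",
    "x-size", "x-addr", "x-fil", "x-save",
}

SAMPLE_COMMENT = """HTTP/1.1 301 Moved Permanently
X-In-Cache: 1
X-StatusCode: 301
X-StatusMessage: Moved Permanently
X-Size: 185
Content-Type: text/html
Last-Modified: Tue, 09 Jan 2018 02:21:41 GMT
Location: http://test.example.org/another
X-Addr: test.example.org
X-Fil: /redirect
X-Save: test.example.org/redirect
"""


def rebuild_response_header(comment: str) -> bytes:
    """Return an HTTP header block built from a cache entry comment."""
    lines = [ln.strip() for ln in comment.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("HTTP/"):
        raise ValueError("comment does not start with a status line")
    kept = [lines[0]]  # keep the status line as-is
    for field in lines[1:]:
        name = field.split(":", 1)[0].strip().lower()
        if name in HTTRACK_FIELDS:
            continue  # skip HTTrack's own bookkeeping fields
        kept.append(field)
    # HTTP header blocks use CRLF line endings and end with a blank line.
    return ("\r\n".join(kept) + "\r\n\r\n").encode("iso-8859-1")


if __name__ == "__main__":
    print(rebuild_response_header(SAMPLE_COMMENT).decode("iso-8859-1"))
```

With the sample entry this keeps the status line plus Content-Type, Last-Modified and Location, which would be enough to write the redirect as a response record rather than a resource record.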
I don't think a cache entry is present at all in early versions of HTTrack. It might be possible to recreate redirects from the log messages though.
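To get a feel for what the log alone gives us, here is a small Python sketch that pulls the fields out of a new.txt line. It assumes the whitespace-separated layout of the single sample line above (without the `hts-cache/new.txt:` prefix that grep added); other HTTrack versions, or lines with an empty field, may not parse this way. `parse_new_txt_line` is a hypothetical helper name. Note that the sample line carries the status, date and URL but no Location value, so the redirect target would still have to come from somewhere else.

```python
from urllib.parse import unquote

SAMPLE_LINE = (
    "11:21:41 185/185 ---M-- 301 error ('Moved%20Permanently') text/html "
    "date:Tue,%2009%20Jan%202018%2002:21:41%20GMT "
    "http://test.example.org/redirect test.example.org/redirect "
    "(from http://test.example.org/)"
)


def parse_new_txt_line(line: str) -> dict:
    """Split one new.txt line into named fields (layout assumed from the sample)."""
    tokens = line.split()
    if len(tokens) < 10:
        raise ValueError("unexpected number of fields in new.txt line")
    _time, _size, _flags, status, _word, message, mime, date, url, save_path = tokens[:10]
    if date.startswith("date:"):
        date = date[len("date:"):]
    return {
        "status": int(status),
        # The message is wrapped as ('...') and percent-encoded in the sample.
        "message": unquote(message.strip("()'")),
        "mime": mime,
        "date": unquote(date),
        "url": url,
        "save_path": save_path,
    }


if __name__ == "__main__":
    print(parse_new_txt_line(SAMPLE_LINE))
```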