
Consider adding status code and digest to Memento TimeMaps

Open anjackson opened this issue 8 years ago • 2 comments

Following this conversation, we should consider putting the HTTP status code, and perhaps also the payload digest (if known), in the Memento TimeMap. For example, entries that currently look like this:

...
<https://www.webarchive.org.uk/wayback/archive/20151106002344/http://www.bl.uk/>; rel="memento"; datetime="Fri, 06 Nov 2015 00:23:44 GMT",
<https://www.webarchive.org.uk/wayback/archive/20151106004051/http://www.bl.uk/>; rel="memento"; datetime="Fri, 06 Nov 2015 00:40:51 GMT",
...

could be something like...

...
<https://www.webarchive.org.uk/wayback/archive/20151106002344/http://www.bl.uk/>; rel="memento"; datetime="Fri, 06 Nov 2015 00:23:44 GMT"; status="404",
<https://www.webarchive.org.uk/wayback/archive/20151106004051/http://www.bl.uk/>; rel="memento"; datetime="Fri, 06 Nov 2015 00:40:51 GMT"; status="200",
...
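To illustrate how a client might consume the proposed format, here is a minimal Python sketch that parses one link-format entry and reads the optional `status` attribute. The function name `parse_timemap_entry` and the returned dictionary layout are assumptions made for illustration; `status` is the attribute proposed in this issue and is not part of the Memento specification.

```python
import re

# Hypothetical helper for illustration only. The "status" attribute is the
# extension proposed in this issue, not a standard Memento attribute.
ENTRY_RE = re.compile(r'^\s*<([^>]*)>\s*((?:;\s*[\w-]+="[^"]*"\s*)*),?\s*$')
ATTR_RE = re.compile(r';\s*([\w-]+)="([^"]*)"')


def parse_timemap_entry(line):
    """Parse one link-format TimeMap entry into a dict.

    Returns the URI plus the rel, datetime and status attributes.
    A missing status attribute is reported as None, so entries written
    without the extension still parse.
    """
    match = ENTRY_RE.match(line)
    if match is None:
        raise ValueError(f"not a link-format entry: {line!r}")
    uri, attr_text = match.groups()
    attrs = dict(ATTR_RE.findall(attr_text))
    status = attrs.get("status")
    return {
        "uri": uri,
        "rel": attrs.get("rel"),
        "datetime": attrs.get("datetime"),
        "status": int(status) if status is not None else None,
    }
```

Treating a missing `status` as `None` means a client written this way keeps working against TimeMaps that never include the attribute.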

This information is generally in the CDX index/service so it should be easy enough to add.

Are there any downsides?

EDIT: I've just realised one possible source of issues. The TimeMap would return the status code from the CDX, i.e. from the original server, but our service can override that to return a 451 status. In our case, this doesn't really matter because we block at the level of URI resolution, so the whole TimeMap returns a 451, but if anyone blocks individual instances of a resource this will lead to problems. Not sure anyone does that though?

anjackson, Apr 28 '17 12:04

Revisit records do not include a status code in the CDX. Usually they represent a 200, but there are cases of deduplicated 301s, 302s and 404s.

The actual status code is in the WARC file of course, but it is more expensive to fetch. Maybe this is really a case where the CDX should be fixed.

kris-sigur, Apr 28 '17 13:04

For the original use case, it might be sufficient to just omit the status code unless we're sure of it.

anjackson, Apr 28 '17 16:04
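
The suggestion to omit the status code unless it is known could be sketched roughly as below. This is not OpenWayback code: the function `format_memento_entry`, the record keys (`timestamp`, `url`, `status`, `mimetype`) and the assumption that revisit records are marked with a `warc/revisit` MIME type and a `-` status are all illustrative and would need checking against the actual CDX data.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def format_memento_entry(prefix, record):
    """Build one link-format TimeMap entry from a CDX-like record.

    `record` is a plain dict with hypothetical keys: 'timestamp'
    (14-digit capture time), 'url', 'status' and 'mimetype'.
    The status attribute is only emitted when the record carries a
    numeric status and is not a revisit record.
    """
    ts = record["timestamp"]
    when = datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    http_date = format_datetime(when, usegmt=True)
    entry = f'<{prefix}{ts}/{record["url"]}>; rel="memento"; datetime="{http_date}"'

    status = str(record.get("status") or "")
    # Assumption: revisit records are flagged by this MIME type in the index.
    is_revisit = record.get("mimetype") == "warc/revisit"
    if status.isdigit() and not is_revisit:
        entry += f'; status="{status}"'
    return entry
```

A record with a numeric status yields an entry ending in `; status="..."`, while a revisit record, or one whose status is `-`, yields the entry in its current form with no status attribute.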