Nicholas Clarke

9 comments by Nicholas Clarke

JWAT-1.1.3 is being released to Maven Central as I write this. It includes the pull request from maeb.

Well, those 110 errors in jwat-tools occur because the -l option (relaxed URI validation) is not used by default. Presumably, relaxed URI validation is the default in the JHOVE module.

As for the digest: it is not computed correctly, since one of the digest values is the digest of an empty string/byte array. http://craiccomputing.blogspot.com/2009/09/sha1-digest-of-empty-string.html
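A quick way to spot this kind of bug is to compare a recorded digest against the SHA-1 of zero bytes. The sketch below is only an illustration: the helper name `is_empty_input_digest` is hypothetical, and it assumes the digest is written as `sha1:<value>` in either hex or Base32, the two encodings commonly seen in WARC digest headers.

```python
import base64
import hashlib

# SHA-1 of zero bytes, computed once in both encodings.
EMPTY_SHA1_HEX = hashlib.sha1(b"").hexdigest()
EMPTY_SHA1_BASE32 = base64.b32encode(hashlib.sha1(b"").digest()).decode("ascii")


def is_empty_input_digest(header_value: str) -> bool:
    """Return True if a 'sha1:<value>' digest equals the SHA-1 of empty input.

    Hypothetical helper; assumes the 'algorithm:value' layout.
    """
    algorithm, _, value = header_value.strip().partition(":")
    if algorithm.lower() != "sha1":
        return False
    # Hex is compared case-insensitively; Base32 is conventionally upper case.
    return value.lower() == EMPTY_SHA1_HEX or value.upper() == EMPTY_SHA1_BASE32
```

If this returns True for a record whose block or payload is not empty, the writer most likely finalized the digest before feeding it any bytes.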

https://github.com/nclarkekb/antiaction-common-datastructures I rewrote my caching flat-file lookup some weeks ago. I extracted it from the original project and placed it in this repository. (Will probably merge it with some other...

JWAT and https://github.com/nlnwa/warchaeology also mark these digests as incorrect. I thought the purpose of these digests was to validate the content as it is in the payload. It makes no...

One or more of the RFCs referred to by the WARC standard encourage the use of headers that are not too long. Also, LWS should be supported by WARC readers/writers...
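For readers unfamiliar with LWS folding, here is a minimal sketch of how a reader might unfold continuation lines before parsing header fields. It assumes the RFC 822/2616 convention that a line break followed by a space or tab continues the previous header. The function name `unfold_headers` is hypothetical and this is not JWAT's implementation.

```python
import re
from typing import List


def unfold_headers(raw: str) -> List[str]:
    """Join folded (LWS-continued) header lines into single logical lines.

    Hypothetical helper; assumes a continuation line starts with space or tab.
    """
    # A line break followed by one or more spaces/tabs marks a continuation;
    # replace the whole sequence with a single space.
    unfolded = re.sub(r"\r?\n[ \t]+", " ", raw)
    # Split into logical header lines and drop empty ones.
    return [line for line in unfolded.splitlines() if line]
```

For example, a header block containing `X-Long: part one`, followed by a line that starts with a tab and contains `part two`, unfolds to the single line `X-Long: part one part two`.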

I do not like the idea of open-ended headers that can potentially be large, especially when there is no length information available at all. (We use the metadata record...

To avoid locking anyone into using a format they do not like, the diff record should probably include the same kind of content-type information used to record HTTP request/response payloads....

Actually, the IA WARC readers used in NAS use the JWAT HttpPayload parser/wrapper for exactly that reason.