warcreate

Generate WARC-Payload-Digest and WARC-Block-Digest for WARC records

Open · machawk1 opened this issue 11 years ago · 2 comments

I have not yet found a way to do this consistently via JavaScript. Running UNIX shasum over the same data found in Heritrix WARCs returns hexadecimal values, but the digests Heritrix writes contain characters outside the hex range (e.g., "M"). The WARC spec says to use a base32-encoded hash, but I don't know how to do this.
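The Heritrix values look like SHA-1 digests written as "sha1:" followed by the RFC 4648 base32 encoding of the 20 raw digest bytes, which would explain letters such as "M" that never appear in hex output. A minimal sketch under stated assumptions: it uses Node.js's built-in crypto module (inside a browser extension, crypto.subtle.digest("SHA-1", data) would supply the raw bytes asynchronously instead), and computeRecordDigests is a hypothetical helper. It follows the WARC distinction that WARC-Block-Digest covers the whole record block (status line, headers, and body for an HTTP response), while WARC-Payload-Digest covers only the entity body. It assumes the body is not chunked or otherwise transfer-encoded.

```javascript
const crypto = require("crypto");

// RFC 4648 base32 alphabet (uppercase letters, then digits 2-7).
const BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

// Encode a Buffer as RFC 4648 base32, padding with "=" to a multiple of 8 chars.
// A 20-byte SHA-1 digest encodes to exactly 32 characters, so it needs no padding.
function base32Encode(buf) {
  let bits = 0;  // number of not-yet-emitted bits held in `value`
  let value = 0; // bit buffer; only the low `bits` bits matter
  let out = "";
  for (const byte of buf) {
    value = ((value << 8) | byte) & 0xffff;
    bits += 8;
    while (bits >= 5) {
      out += BASE32_ALPHABET[(value >>> (bits - 5)) & 31];
      bits -= 5;
    }
  }
  if (bits > 0) {
    // Left-align the leftover bits into a final 5-bit group.
    out += BASE32_ALPHABET[(value << (5 - bits)) & 31];
  }
  while (out.length % 8 !== 0) out += "=";
  return out;
}

// SHA-1 the given bytes and format the result as a WARC digest header value.
function warcDigest(data) {
  const raw = crypto.createHash("sha1").update(data).digest();
  return "sha1:" + base32Encode(raw);
}

// Hypothetical helper: digests for a WARC "response" record whose block is a raw
// HTTP response (status line + headers + CRLF CRLF + entity body).
// ASSUMPTION: the body is not chunked or otherwise transfer-encoded.
function computeRecordDigests(httpResponseBlock) {
  const sep = httpResponseBlock.indexOf("\r\n\r\n");
  // If no header/body separator is found, treat the payload as empty.
  const payload =
    sep === -1 ? Buffer.alloc(0) : httpResponseBlock.subarray(sep + 4);
  return {
    blockDigest: warcDigest(httpResponseBlock), // WARC-Block-Digest: whole block
    payloadDigest: warcDigest(payload),         // WARC-Payload-Digest: entity body
  };
}
```

For example, warcDigest(Buffer.alloc(0)) gives "sha1:3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ", the base32 form of the well-known empty-input SHA-1, which is a quick sanity check against records with empty payloads.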

machawk1 · Aug 13 '14 22:08

https://github.com/agnoster/base32-js ?

(IMHO the base32 choice is highly regrettable. It saves 8 bytes per digest in each WARC record at the expense of interoperability with everybody else in the world. But I guess we're stuck with it now.)
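The 8-byte figure follows from the digest size: SHA-1 produces 160 bits, which is 160 / 4 = 40 hex characters but 160 / 5 = 32 base32 characters. To compare a WARC digest against shasum output, the base32 value can be decoded back to hex. A minimal sketch, assuming Node.js Buffers and RFC 4648 base32; base32ToHex and warcDigestToHex are hypothetical helper names, not part of any library mentioned in this thread.

```javascript
// RFC 4648 base32 alphabet (uppercase letters, then digits 2-7).
const BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

// Decode an RFC 4648 base32 string to lowercase hex (the format shasum prints).
function base32ToHex(b32) {
  const clean = b32.replace(/=+$/, "").toUpperCase();
  let bits = 0;  // number of buffered bits not yet emitted as a byte
  let value = 0; // bit buffer; only the low `bits` bits matter
  const bytes = [];
  for (const ch of clean) {
    const idx = BASE32_ALPHABET.indexOf(ch);
    if (idx === -1) throw new Error(`Invalid base32 character: ${ch}`);
    value = ((value << 5) | idx) & 0xffff;
    bits += 5;
    if (bits >= 8) {
      bytes.push((value >>> (bits - 8)) & 0xff);
      bits -= 8;
    }
  }
  // Any leftover (< 8) bits are padding and are discarded.
  return Buffer.from(bytes).toString("hex");
}

// Hypothetical helper: turn a header value such as "sha1:3I42..." into hex
// so it can be compared directly with `shasum` output for the same bytes.
function warcDigestToHex(headerValue) {
  const colon = headerValue.indexOf(":");
  if (colon === -1) throw new Error("Expected '<algorithm>:<value>'");
  const algo = headerValue.slice(0, colon).toLowerCase();
  if (algo !== "sha1") throw new Error(`Unsupported digest algorithm: ${algo}`);
  return base32ToHex(headerValue.slice(colon + 1));
}
```

For example, warcDigestToHex("sha1:3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ") returns "da39a3ee5e6b4b0d3255bfef95601890afd80709", the same value shasum prints for an empty file.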

nlevitt · Aug 13 '14 23:08

Thanks, @nlevitt. Would you happen to have a reference WARC with uncompressed HTML (e.g., explicitly viewable in the WARC) to verify correctness between this library and what Heritrix produces?

Step 0 for WARCreate is interoperability. What is the alternative/ideal hash algorithm to use, iyho?

machawk1 · Aug 14 '14 13:08