
WARC revision 1.1 (augmentation): specification of the WAT format

Open cleymour opened this issue 10 years ago • 2 comments

Definition: WAT (Web Archive Transformation) is a “profile” of the WARC format intended to store web archive metadata, notably for data mining processes (see https://webarchive.jira.com/wiki/display/Iresearch/Web+Archive+Transformation+%28WAT%29+Specification,+Utilities,+and+Usage+Overview). Proposing the WAT format as an official (though not prescriptive) specification of the WARC format would give it more authority and inspire more confidence in its maintenance.

Decision: Propose a specification of the WAT format as an (informative) appendix of the standard. Ensure that the specification is freely available online.

Action: Vinay Goel from the Internet Archive to propose a specification, with comments from WAT users such as Andy Jackson from the BL or Sara Aubry from the BnF. Find a place to host the WAT specification (e.g. on the BnF website, as the traditional host of the WARC standard draft?).

cleymour, Jun 16 '15 14:06

Following discussions during the ISO working group meeting on November 16-17, 2015: the topic does not seem mature enough, so the issue is excluded from the 1.1 revision.

saraaubry, Nov 17 '15 10:11

This point was not resolved in the WARC 1.1 revision.

saraaubry, Dec 07 '17 10:12