openwayback
Document Wayback's supported CDX format variations
The base CDX format allows a number of variations, and over the years Wayback has supported a range of them. The idea here is to document the main variants so that users know what issues each one raises and whether they may need to re-index.
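To make "variations" concrete: a CDX file normally starts with a header line whose first character is the field delimiter, followed by the literal `CDX` and one letter per column. The rough Python sketch below reads that header to show which layout a file declares. The function names are hypothetical and this is not Wayback's own code. The field descriptions are paraphrased from the Internet Archive legend, and the two sample headers are the commonly seen 11-field and 9-field layouts.

```python
# Illustrative sketch only: helper names are hypothetical, not part of Wayback.
# Field descriptions paraphrase the Internet Archive CDX legend (subset).
FIELD_NAMES = {
    "N": "massaged url",
    "b": "date",
    "a": "original url",
    "m": "mime type",
    "s": "response code",
    "k": "checksum",
    "r": "redirect",
    "M": "meta tags",
    "S": "compressed record size",
    "V": "compressed file offset",
    "g": "file name",
}


def parse_cdx_header(line):
    """Return (delimiter, field_letters) for a CDX header line."""
    # The first character is the delimiter, then the literal "CDX".
    if len(line) < 5 or line[1:4] != "CDX":
        raise ValueError("not a CDX header line")
    delimiter = line[0]
    parts = line.rstrip("\r\n").split(delimiter)
    # parts[0] is empty (line starts with the delimiter), parts[1] is "CDX".
    letters = [p for p in parts[2:] if p]
    return delimiter, letters


def describe_fields(letters):
    """Map field letters to readable names, flagging unknown letters."""
    return [FIELD_NAMES.get(c, "unknown field " + c) for c in letters]


if __name__ == "__main__":
    for header in (" CDX N b a m s k r M S V g", " CDX N b a m s k r V g"):
        delim, letters = parse_cdx_header(header)
        print(len(letters), "fields:", ", ".join(describe_fields(letters)))
```

A check like this only tells you which columns a file claims to have. It says nothing about how the URL key was canonicalised or how the file was sorted, which are the kinds of differences that may force a re-index.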
I attempted to do this here: http://iipc.github.io/warc-specifications/specifications/cdx-format/cdx-2015/
However, it would probably be better to see if we can work with IA to update https://archive.org/web/researcher/cdx_file_format.php