CDX-Writer

Field names should be case-insensitive

Open chfoo opened this issue 11 years ago • 1 comment

Also reported here: https://bitbucket.org/rajbot/warc-tools/issue/1 . I'm creating this issue here on GitHub so others may know about this issue as well.

According to Section 4 of the WARC ISO 28500 Version 1 Latest Draft, field names should be case-insensitive, e.g., Warc-Type should be treated the same as WARC-Type. Without case-insensitive matching, record.type returns None whenever the field name in the file does not match WARC-Type exactly.
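To illustrate the behavior being asked for, here is a minimal sketch of a case-insensitive header lookup. The names `CaseInsensitiveHeaders` and `parse_header_block` are hypothetical and are not part of warc-tools; the sketch assumes simple `Name: value` lines and ignores continuation lines.

```python
class CaseInsensitiveHeaders:
    """Hypothetical header store (not the warc-tools API).

    Field names are normalized to lower case for lookup, while the
    original spelling is kept alongside the value.
    """

    def __init__(self, pairs=()):
        self._fields = {}
        for name, value in pairs:
            self[name] = value

    def __setitem__(self, name, value):
        # Key on the lower-cased name so Warc-Type and WARC-Type collide.
        self._fields[name.lower()] = (name, value)

    def __getitem__(self, name):
        return self._fields[name.lower()][1]

    def get(self, name, default=None):
        entry = self._fields.get(name.lower())
        return default if entry is None else entry[1]


def parse_header_block(text):
    """Parse 'Name: value' lines; lines without a colon are skipped."""
    headers = CaseInsensitiveHeaders()
    for line in text.splitlines():
        if not line.strip() or ":" not in line:
            continue  # e.g. the leading "WARC/1.0" version line
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return headers


headers = parse_header_block(
    "WARC/1.0\r\nWarc-Type: response\r\nContent-Length: 0\r\n"
)
# Lookup succeeds even though the file spelled the field "Warc-Type".
record_type = headers.get("WARC-Type")
```

With a lookup like this, a property such as record.type could read the field through `get("WARC-Type")` and would no longer depend on the exact capitalization used in the file.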

chfoo avatar Feb 02 '14 01:02 chfoo

Oh, I'd like to add that this matters because proper CDX files could not be derived from my WARC file on the Internet Archive: https://archive.org/details/delcampe_20140126 .

chfoo avatar Feb 02 '14 02:02 chfoo