Align digest-value grammar with base16/32/64 alphabets
WARC 1.0 and 1.1 specify
labelled-digest = algorithm ":" digest-value
and define digest-value as a token. "/" and "=" are not valid characters for a token, yet "/" is part of the usual base64 alphabet and "=" is commonly used for padding.
Good catch. While the examples and most implementations use base32 (which doesn't include "/"), the padding character for base32 is also "=", so it's indeed a problem there too.
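To make the mismatch concrete, here is a minimal Python sketch. It assumes the token rule is the RFC 2616 style one (printable ASCII minus separators); `SEPARATORS` and `is_token` are illustrative names, not taken from the spec.

```python
import base64
import hashlib

# Separator characters excluded from "token" in the RFC 2616 style grammar
# (assumed here to match the WARC definition of token).
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')

def is_token(value: str) -> bool:
    """True if value is non-empty, printable ASCII, and has no separators."""
    return bool(value) and all(
        0x21 <= ord(ch) <= 0x7E and ch not in SEPARATORS for ch in value
    )

sha1 = hashlib.sha1(b"example payload").digest()   # 20 bytes
md5 = hashlib.md5(b"example payload").digest()     # 16 bytes

sha1_b32 = base64.b32encode(sha1).decode("ascii")  # 32 chars, no padding needed
sha1_b64 = base64.b64encode(sha1).decode("ascii")  # 28 chars, ends in "="
md5_b32 = base64.b32encode(md5).decode("ascii")    # padded with "="

for label, value in [("sha1/base32", sha1_b32),
                     ("sha1/base64", sha1_b64),
                     ("md5/base32", md5_b32)]:
    print(label, value, "token" if is_token(value) else "NOT a token")
```

SHA-1 in base32 happens to fit because 20 bytes is a multiple of 5, so no padding is emitted. Other digest lengths, or base64, produce values that the token rule rejects.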
@wumpus, so that we can turn this issue into a change proposal for WARC 1.2, is there a better definition for digest-value you'd like to propose?
https://tools.ietf.org/html/rfc4648 is kind of hand-waving, but the union of all of the recommended schemes is
A-Za-z0-9/+-_=
Percent encoding is mentioned once, and "~" and "." are mentioned but argued against, so it's not clear whether they are allowed. It's as if the RFC was written to be non-normative.
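As a rough sketch of what a validator for that union alphabet could look like (the names `DIGEST_VALUE` and `is_digest_value` are made up for illustration, and "~", "." and percent encoding are left out since their status is unclear):

```python
import re

# Union of the RFC 4648 base16/base32/base64/base64url alphabets plus "=".
# The "-" is placed last so it is a literal hyphen; written as "+-_" inside
# a character class it would instead be the range "+" through "_", which
# also admits ":" and many other characters.
DIGEST_VALUE = re.compile(r"[A-Za-z0-9/+_=-]+")

def is_digest_value(value: str) -> bool:
    """True if value is non-empty and uses only the union alphabet."""
    return DIGEST_VALUE.fullmatch(value) is not None
```

Note that ":" stays excluded, so the value can still be split from the algorithm label unambiguously.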
This is also a 1.0/1.1 erratum, not just a proposal for the future.
This issue should be labeled with the "WARC/1.1-possible-errata" label @ato
Ah yes, good point
Given the problem noted in issue #80 of determining how the digest is encoded, shouldn't the specification be changed to something like labelled-digest = algorithm ":" encoding ":" digest-value, with suitable definitions for algorithm and encoding?
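A minimal sketch of how a reader might handle such a three-part form, purely as an illustration. The encoding labels "base16", "base32" and "base64" and the function name `parse_labelled_digest` are assumptions for the example; nothing here is defined by WARC 1.0 or 1.1.

```python
import base64
import hashlib

# Hypothetical encoding labels mapped to standard-library decoders.
DECODERS = {
    "base16": lambda s: base64.b16decode(s, casefold=True),
    "base32": lambda s: base64.b32decode(s, casefold=True),
    "base64": lambda s: base64.b64decode(s, validate=True),
}

def parse_labelled_digest(field: str):
    """Split algorithm ":" encoding ":" digest-value and decode the value."""
    # Raises ValueError if fewer than three ":"-separated parts are present.
    algorithm, encoding, value = field.split(":", 2)
    try:
        decoder = DECODERS[encoding.lower()]
    except KeyError:
        raise ValueError(f"unknown encoding: {encoding!r}") from None
    return algorithm.lower(), encoding.lower(), decoder(value)

# Round trip: encode a SHA-1 digest as base32, then parse it back.
raw = hashlib.sha1(b"example payload").digest()
field = "sha1:base32:" + base64.b32encode(raw).decode("ascii")
algorithm, encoding, decoded = parse_labelled_digest(field)
print(algorithm, encoding, decoded == raw)
```

With the encoding stated explicitly, a reader no longer has to guess it from the length or alphabet of the value.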