idutils

Valid ARKs are not properly recognized by idutils

Status: Open. huberrob opened this issue 3 years ago • 0 comments

Package version (if known): idutils 1.1.11

Describe the bug

ARKs are not always recognized, in particular when https is used as the URL scheme, or when the new form of the ARK label part, "ark:", is used instead of the old form, "ark:/".

Steps to Reproduce

According to https://datatracker.ietf.org/doc/html/draft-kunze-ark-28#section-2.3, valid ARKs follow the syntax [https://NMA/]ark:[/]NAAN/Name[Qualifiers], where the parts in square brackets are optional.
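The syntax above can be expressed as a single regular expression. The following is a minimal Python sketch, not idutils code: ARK_SYNTAX and split_ark are hypothetical names, the NAAN character class is borrowed from idutils' ark_suffix_regexp, http is accepted alongside https because idutils already handles http resolver URLs, and the Name is not separated from any Qualifiers.

```python
import re

# Hypothetical pattern (not part of idutils) for the quoted syntax
# [https://NMA/]ark:[/]NAAN/Name[Qualifiers]. Name and Qualifiers are captured
# together; splitting them would need rules from the draft not modeled here.
ARK_SYNTAX = re.compile(
    r"^(?:https?://(?P<nma>[^/]+)/)?"          # optional resolver prefix (NMA)
    r"ark:/?"                                  # label: old "ark:/" or new "ark:"
    r"(?P<naan>[0-9bcdfghjkmnpqrstvwxz]+)/"    # NAAN, same characters as idutils
    r"(?P<name_and_qualifiers>.+)$"            # Name plus optional Qualifiers
)


def split_ark(value):
    """Return the ARK's components as a dict, or None if value does not match."""
    match = ARK_SYNTAX.match(value)
    return match.groupdict() if match else None


print(split_ark("https://test.com/ark:12345/x54xz321"))
# {'nma': 'test.com', 'naan': '12345', 'name_and_qualifiers': 'x54xz321'}
```

For a bare ARK such as ark:12345/x54xz321 the optional prefix group does not participate, so nma comes back as None.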

Therefore, the following ARKs are valid but are not recognized by idutils:

  • ark:12345/x54xz321
  • https://test.com/ark:/12345/x54xz321
  • https://test.com/ark:12345/x54xz321
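To reproduce the problem without pinning a specific idutils version, the following Python sketch re-creates the 1.1.11 check. It is a reconstruction under an assumption: that the shipped code differs from the proposal under "Additional context" only by requiring the slash after "ark:" and by accepting only the http scheme. old_ark_suffix_regexp and old_is_ark are hypothetical names. With idutils 1.1.11 installed, calling idutils.is_ark on the same strings should give the same falsy results.

```python
import re
from urllib.parse import urlparse

# Assumed 1.1.11 behaviour: the slash after "ark:" is mandatory.
old_ark_suffix_regexp = re.compile(r"ark:/[0-9bcdfghjkmnpqrstvwxz]+/.+$")


def old_is_ark(val):
    """Hypothetical reconstruction of the 1.1.11 check, for reproducing the bug."""
    res = urlparse(val)
    return bool(old_ark_suffix_regexp.match(val) or (
        res.scheme == "http"  # assumed: https resolver URLs are not accepted
        and res.netloc != ""
        and old_ark_suffix_regexp.match(res.path[1:])
        and res.params == ""
    ))


for candidate in [
    "ark:12345/x54xz321",
    "https://test.com/ark:/12345/x54xz321",
    "https://test.com/ark:12345/x54xz321",
]:
    print(candidate, old_is_ark(candidate))  # prints False for each candidate
```

The old forms "ark:/12345/x54xz321" and "http://test.com/ark:/12345/x54xz321" still return True under this reconstruction, which isolates the failures to the optional slash and the https scheme.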

Expected behavior

The three ARKs listed above should be recognized as valid ARKs.


Additional context

A quick solution would be to make the slash after "ark:" optional in the regex:

```python
import re
ark_suffix_regexp = re.compile(r"ark:/?[0-9bcdfghjkmnpqrstvwxz]+/.+$")
```

and to change the is_ark function so that it also accepts the https scheme:

```python
from urllib.parse import urlparse


def is_ark(val):
    """Test if argument is an ARK."""
    res = urlparse(val)
    return bool(ark_suffix_regexp.match(val) or (
        res.scheme in ("http", "https")
        and res.netloc != ""
        # Note res.path includes leading slash, hence [1:] to use same regexp
        and ark_suffix_regexp.match(res.path[1:])
        and res.params == ""
    ))
```
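A few regression checks can confirm that the proposed change accepts the three ARKs from this issue, keeps accepting the old "ark:/" and http forms, and still rejects strings outside the quoted syntax. This is a pytest-style sketch with hypothetical test names; the regex and is_ark are repeated so the file runs on its own, and the negative cases are illustrative choices, not taken from the idutils test suite.

```python
import re
from urllib.parse import urlparse

# Proposed regex and is_ark from this issue, repeated so this file is standalone.
ark_suffix_regexp = re.compile(r"ark:/?[0-9bcdfghjkmnpqrstvwxz]+/.+$")


def is_ark(val):
    """Test if argument is an ARK (proposed version)."""
    res = urlparse(val)
    return bool(ark_suffix_regexp.match(val) or (
        res.scheme in ("http", "https")
        and res.netloc != ""
        and ark_suffix_regexp.match(res.path[1:])
        and res.params == ""
    ))


def test_valid_arks_from_issue():
    for value in (
        "ark:12345/x54xz321",
        "https://test.com/ark:/12345/x54xz321",
        "https://test.com/ark:12345/x54xz321",
    ):
        assert is_ark(value), value


def test_previously_accepted_forms_still_pass():
    assert is_ark("ark:/12345/x54xz321")
    assert is_ark("http://test.com/ark:/12345/x54xz321")


def test_non_arks_rejected():
    for value in (
        "ark:12345",                                # no Name after the NAAN
        "ftp://test.com/ark:12345/x54xz321",        # scheme other than http(s)
        "https://test.com/foo/ark:12345/x54xz321",  # "ark:" not directly after host
    ):
        assert not is_ark(value), value


if __name__ == "__main__":
    test_valid_arks_from_issue()
    test_previously_accepted_forms_still_pass()
    test_non_arks_rejected()
    print("all ARK checks passed")
```

Running the file directly executes the checks without pytest; under pytest the three test_ functions are collected automatically.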

huberrob, Feb 16 '22 12:02