idutils
Valid ARKs are not properly recognized by idutils
Package version (if known): idutils 1.1.11
Describe the bug
ARKs are not always recognized, in particular when https is used as the scheme and when the new form of the ARK label part, "ark:", is used instead of the old form, "ark:/".
Steps to Reproduce
According to https://datatracker.ietf.org/doc/html/draft-kunze-ark-28#section-2.3
valid ARKs follow the syntax [https://NMA/]ark:[/]NAAN/Name[Qualifiers]
Therefore, the following ARKs are valid but are not recognized by idutils:
- ark:12345/x54xz321
- https://test.com/ark:/12345/x54xz321
- https://test.com/ark:12345/x54xz321
Expected behavior
The ARKs listed above should be recognized
Additional context
A quick solution would be to change the regex:

    ark_suffix_regexp = re.compile(r"ark:/?[0-9bcdfghjkmnpqrstvwxz]+/.+$")

and the is_ark method like this:

    def is_ark(val):
        """Test if argument is an ARK."""
        res = urlparse(val)
        return ark_suffix_regexp.match(val) or (
            res.scheme in ['http', 'https'] and
            res.netloc != '' and
            # Note: res.path includes the leading slash, hence [1:] to reuse the same regexp
            ark_suffix_regexp.match(res.path[1:]) and
            res.params == ''
        )
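
To check the proposal against the three ARKs listed above, here is a minimal standalone sketch. It copies the regex and the logic from this issue into a self-contained script and does not import idutils, so it only illustrates the suggested behavior, not the current library code. Wrapping the result in bool() is an addition made here so the output is easier to read.

```python
import re
from urllib.parse import urlparse

# Regex proposed in this issue: "ark:" followed by an optional "/",
# a NAAN, a "/", and a non-empty name.
ark_suffix_regexp = re.compile(r"ark:/?[0-9bcdfghjkmnpqrstvwxz]+/.+$")


def is_ark(val):
    """Test if argument is an ARK (standalone sketch, not the idutils source)."""
    # Bare form, e.g. "ark:12345/x54xz321" or "ark:/12345/x54xz321".
    if ark_suffix_regexp.match(val):
        return True
    # URL form, e.g. "https://NMA/ark:12345/x54xz321".
    res = urlparse(val)
    return bool(
        res.scheme in ("http", "https")
        and res.netloc != ""
        # res.path includes the leading slash, hence [1:].
        and ark_suffix_regexp.match(res.path[1:])
        and res.params == ""
    )


examples = [
    "ark:12345/x54xz321",
    "https://test.com/ark:/12345/x54xz321",
    "https://test.com/ark:12345/x54xz321",
]
for ark in examples:
    print(ark, is_ark(ark))
```

With this change, all three examples are reported as ARKs, while a URL with another scheme, such as ftp, is still rejected.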