cwltool icon indicating copy to clipboard operation
cwltool copied to clipboard

Update namespace in urn

Open MBueschelberger opened this issue 2 years ago • 7 comments

Related to #1806, the URN-double colon is updated to a single colon in order to be fully compliant to the urn-scheme

MBueschelberger avatar Feb 14 '23 13:02 MBueschelberger

Codecov Report

Merging #1808 (3d1797f) into main (c69221b) will increase coverage by 0.06%. The diff coverage is n/a.

@@            Coverage Diff             @@
##             main    #1808      +/-   ##
==========================================
+ Coverage   83.55%   83.62%   +0.06%     
==========================================
  Files          44       44              
  Lines        8102     8102              
  Branches     2218     2218              
==========================================
+ Hits         6770     6775       +5     
+ Misses        851      847       -4     
+ Partials      481      480       -1     
Impacted Files Coverage Δ
cwltool/provenance.py 81.58% <ø> (ø)
cwltool/job.py 81.88% <0.00%> (+0.39%) :arrow_up:
cwltool/workflow_job.py 87.42% <0.00%> (+0.56%) :arrow_up:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

codecov[bot] avatar Feb 14 '23 14:02 codecov[bot]

If we're using the urn:hash scheme then we should follow its recommendations, which in this case has a :: as the media type is unknown. Unfortunately his is stuck in 'draft' stage (now expired).

It is as far as I can tell however compatible with the older RFC2141 you linked to.

<URN> ::= "urn:" <NID> ":" <NSS>
<NID>         ::= <let-num> [ 1,31<let-num-hyp> ]
<NSS>         ::= 1*<URN chars>

Substituting the example urn:hash::sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL we get:

NID = "hash"
NSS = ":sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL"

Now the question is if ":" is permitted in <URN chars>:

<URN chars>   ::= <trans> | "%" <hex> <hex>
<trans>       ::= <upper> | <lower> | <number> | <other> | <reserved>
<other>       ::= "(" | ")" | "+" | "," | "-" | "." |
                     ":" | "=" | "@" | ";" | "$" |
                     "_" | "!" | "*" | "'"

And there it is in second line of <other>.

Now in newer RFC8141 the rules are different:

assigned-name = "urn" ":" NID ":" NSS
NID           = (alphanum) 0*30(ldh) (alphanum)
ldh           = alphanum / "-"
NSS           = pchar *(pchar / "/")

We assign NID and NSS the same, and hash is still valid NID. Now the question is if the NSS can start with : from pchar?

      pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

So ":" is declared directly, so it should also be valid.

Therefore I don't see the need for this pull request as it is breaking the namespace which was already valid before, if a bit odd-looking (but then so are all the URNs! :-) )

The real fix should be to get rid of insecure SHA1 all together and use sha256 or better, which then can use the newer Naming things with hashes scheme (RFC 6920), which deliberately do not support SHA1 and has a registry of modern checksum algorithms.

stain avatar Jun 14 '23 08:06 stain

Reading the document you linked in the issue linked here, I would expect :sha1, so no idea why we have the ::sha1, nor why this document above also uses the same two colons. Strange.

You are thinking of the separate urn:sha1 scheme which is not well defined.

stain avatar Jun 14 '23 08:06 stain

@MBueschelberger Can you help us by sharing your motivation for this change?

mr-c avatar Jun 14 '23 10:06 mr-c

Hi @mr-c,

I actually requested this change since I was not able to parse the graph with any of the available RDFLib-versions. Removing the second colon actually solved the problem for me.

This is why I actually initiated this discussion here about the origin of this ::.

Best regards

MBueschelberger avatar Sep 28 '23 09:09 MBueschelberger

@simleo Do you have experience using rdflib to parse CWL prov aggregates that contain urn:hash::sha1 (note the double colon)?

https://github.com/common-workflow-language/cwltool/pull/1808/files#diff-2bb87b99f4b0d10faa69f7bcec12f404bc4a41ac1aae63f8a14d6460fa5798a5L520

mr-c avatar Sep 28 '23 09:09 mr-c

@mr-c no, sorry. In runcrate, I use cwlprov-py to read CWLProv ROs, but I don't use rdflib directly.

simleo avatar Sep 28 '23 13:09 simleo