purl-spec
purl-spec copied to clipboard
Keep Hugging Face PURL version component case as-is
Update test data so version component is not lower-cased
I believe this will cause implementations that respect the following rule in the PURL-TYPES spec to fail:
The version is the model revision Git commit hash. It is case insensitive and must be lowercased in the package URL.
edit: Or hm I might've spoke too soon, the above probably only applies to the canonical_url? The version component itself is parsed as-is?
I am not sure: I am not clear whether the version JSON field is about the canonical URL or the original URL. The test could go either way.
FWIW this breaks the test case in an implementation I wrote. I'm not implying that my implementation is correct (it's merely my interpretation of this particular part of the spec). Thought I'd share!
@maitre-matt I don't understand the reason you want the alpha characters in the huggingface version value to be changed from lowercase to uppercase -- the type definition says
The version is the model revision Git commit hash. It is case insensitive and must be lowercased in the package URL.
This appears to be language you added via PR 201.
Forgot to add the link to the type definition: https://github.com/package-url/purl-spec/blob/main/PURL-TYPES.rst#huggingface.
@johnmhoran: in PR #201 there was a report that the lower-case string in the version JSON field was breaking Go and Python implementations at the time: https://github.com/package-url/purl-spec/pull/201/files#r1061825523
This PR keeps the version segment in canonical_purl lower-case, matching the part of the spec quoted above.
Whether the version in canonical_purl and the string version should match, I am not sure. Above it was mentioned this PR was breaking other implementations. Given this PR has been opened for 2+ years, it's probably a safe bet it can be discarded.
@maitre-matt Thank you for your reply -- now I see your point.
It's not clear to me -- from the spec and examples and test objects -- how all the individual values are meant to be calculated once one has the test input PURL. For me this is a bit hazy in various situations including when "is_invalid" is set to true. See, e.g., https://github.com/package-url/purl-spec/issues/419.
And I think this section needs clarification with details and some examples: https://github.com/package-url/purl-spec/blob/main/PURL-SPECIFICATION.rst#tests
Hey two things:
-
HF hashes are always lower case, so the spec looks fine to me. For instance: https://huggingface.co/CIRCL/vulnerability-severity-classification-roberta-base/commit/85a234292acc526d81ef91f74d0dc5ccfd6ac016
-
After the merge of PR #514, PURL types are now defined in JSON:
- #514
With the new approach... we have crisper definition of tests and decoded vs canonical expectations.... Here IMHO this should stay lowercase.
Thanks for your understanding and patience
See also:
- https://github.com/package-url/purl-spec/pull/534
@williamboman what's your implementation? I would love to help to implement the new test approach and spec refinements!