packageurl-python
Why is the colon in the name not translated?
Like this: pkg:generic/SuSE%20Linux%20Enterprise%20Server%2012%20SP5:[email protected]%2Bgit20140911.61c1681-38.13.1.x86_64
It seems to be encoded in the Java implementation.
Is the colon a character that never needs to be encoded?
There seems to be an unnecessary encoding step here that goes against the spec. Is this a bug?
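from urllib.parse import quote as _percent_quote


def quote(s):
    """
    Return a percent-encoded unicode string, except for colon :, given an `s`
    byte or unicode string.
    """
    # In Python 3, str is the unicode string type.
    if isinstance(s, str):
        s = s.encode('utf-8')
    # quote() percent-encodes ':' as '%3A' by default (only '/' is safe).
    quoted = _percent_quote(s)
    if not isinstance(quoted, str):
        quoted = quoted.decode('utf-8')
    quoted = quoted.replace('%3A', ':')  # this step is unnecessary according to the spec
    return quoted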
This is not a bug. The spec does not say to escape ':', and the test suite gives examples of not escaping ':'.
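For illustration, the encode-then-replace step in the snippet above gives the same result as telling urllib.parse.quote to treat ':' as a safe character from the start. Both helper names below are hypothetical (they are not part of packageurl-python), and the package name is made up; '/' is kept in `safe` only to mirror quote()'s default, which the snippet relies on.

```python
from urllib.parse import quote


def quote_then_replace(s: str) -> str:
    # Hypothetical helper mirroring the snippet: quote() encodes ':' as '%3A'
    # (its default safe set is just '/'), then the replace step restores ':'.
    return quote(s).replace("%3A", ":")


def quote_keep_colon(s: str) -> str:
    # Hypothetical helper: list ':' as safe (alongside the default '/')
    # so quote() never encodes it in the first place.
    return quote(s, safe="/:")


name = "SuSE Linux Enterprise Server 12 SP5:libfoo"  # hypothetical package name
print(quote_keep_colon(name))  # SuSE%20Linux%20Enterprise%20Server%2012%20SP5:libfoo
print(quote_keep_colon(name) == quote_then_replace(name))  # True
```

A literal "%3A" in the input is encoded as "%253A" by both helpers, so the replace step cannot accidentally turn an escaped percent sign back into a colon.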
What about a use case where a URL with a port is part of the name?
For example, pkg:container/index.myregstry.io:5000/my-image@v1
Would you expect the ':' to be encoded by the toString function? I thought it should not be encoded, but then the PURL seems to fail the urlparse check.
It seems to recognize 'index.myregistry.io' as a URL scheme and fails with a somewhat misleading error.
https://github.com/package-url/packageurl-python/blob/0d3336804ce6dac59975c8c170d4015149442a93/src/packageurl/__init__.py#L507
If you prefer, I can open a separate issue.
I can prepare a PR depending on the outcome of the discussion.
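The failure can be reproduced with the standard library alone, assuming (as the linked line suggests) that the parser passes whatever follows `pkg:` and the type to Python's urllib.parse.urlsplit and rejects a result that has a scheme or authority. This is a sketch of the standard-library behavior, not of the library's code: in current Python versions, urlsplit reads a leading run of letters, digits, '+', '-' and '.' before the first ':' as a scheme, and a registry host name fits that pattern.

```python
from urllib.parse import urlsplit

# What remains of pkg:container/index.myregstry.io:5000/my-image@v1 once the
# "pkg:" scheme and the "container/" type have been stripped.
remainder = "index.myregstry.io:5000/my-image@v1"

parts = urlsplit(remainder)
# The registry host is read as a URL scheme, and the port ends up in the path.
print(parts.scheme)  # index.myregstry.io
print(parts.path)    # 5000/my-image@v1
```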
OK, following the earlier advice, I see that you expect such a PURL to be pkg:docker/my_image@sha256:244fd47e07d1004f0aed9c?repository_url=index.my-regstory.io:500
Did I understand correctly? Can you elaborate on this logic or point me to the related part of the spec? I tried it with the Go library and did not see this limitation.
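Moving the registry into the repository_url qualifier sidesteps the scheme check: in this example the '@' version separator appears before the first ':', so urlsplit finds no scheme in the part before '?'. The sketch below uses a hypothetical, minimal qualifier reader (not packageurl-python's parser, and without any validation) that splits on '?', then '&', then the first '=', and percent-decodes values with urllib.parse.unquote.

```python
from urllib.parse import unquote, urlsplit

purl = ("pkg:docker/my_image@sha256:244fd47e07d1004f0aed9c"
        "?repository_url=index.my-regstory.io:500")


def split_qualifiers(purl: str) -> dict:
    """Hypothetical helper: return the qualifiers of a PURL as a dict.

    Drops any '#subpath', splits on the first '?', then on '&', and on the
    first '=' of each pair. Values are percent-decoded with unquote, which
    (unlike form decoding) leaves '+' alone.
    """
    purl = purl.split("#", 1)[0]
    _, sep, qualifiers = purl.partition("?")
    result = {}
    if not sep:
        return result
    for pair in qualifiers.split("&"):
        if not pair:
            continue  # tolerate stray '&' separators
        key, _, value = pair.partition("=")
        result[key.lower()] = unquote(value)
    return result


# Part between the "pkg:docker/" prefix and the qualifiers.
remainder = purl.partition("?")[0][len("pkg:docker/"):]

print(urlsplit(remainder).scheme == "")  # True: no scheme is detected
print(split_qualifiers(purl))  # {'repository_url': 'index.my-regstory.io:500'}
```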
The spec says "the ':' scheme and type separator does not need to and must NOT be encoded. It is unambiguous unencoded elsewhere," meaning it does not need to be encoded here. However:
- Some PURL implementations use generic escaping methods that escape more characters than necessary. This should be fine when the result is read by a parser that correctly implements the spec, but it can cause problems when PURLs that are supposed to be canonical, but are not, are compared as strings. Because of this, I'd recommend comparing PURLs the way URLs are compared: parse both PURLs and reserialize them with whichever PURL implementation you are using, so that both have a consistent representation.
- The way PURL encodes qualifiers is very similar to x-www-form-urlencoded, which encodes more characters and has special rules about '+' characters. The PURL spec does not mention x-www-form-urlencoded anywhere, but some implementations use x-www-form-urlencoded anyway, leading to unnecessary escaping and incorrect serialization and parsing of some PURLs. (https://github.com/package-url/purl-spec/pull/261)
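Both points can be illustrated with urllib.parse. The helper canonical_component below is hypothetical and handles a single component only (a real comparison should parse and reserialize whole PURLs with one implementation, as recommended above), and the package name is made up. It shows that two encodings of the same name differ as strings but agree once decoded and re-encoded with a single rule, and that form decoding (here unquote_plus) turns a literal '+' into a space while plain percent-decoding keeps it.

```python
from urllib.parse import quote, unquote, unquote_plus


def canonical_component(value: str) -> str:
    # Hypothetical helper: decode once, then re-encode with a single fixed rule
    # (plain percent-encoding, ':' left unencoded as the spec allows).
    return quote(unquote(value), safe=":")


# The same hypothetical name, serialized with and without an escaped ':'.
over_escaped = "SuSE%20Linux%20Enterprise%20Server%2012%20SP5%3Alibfoo"
minimal = "SuSE%20Linux%20Enterprise%20Server%2012%20SP5:libfoo"

print(over_escaped == minimal)  # False: naive string comparison
print(canonical_component(over_escaped) == canonical_component(minimal))  # True

# A version string containing a literal, unencoded '+'.
raw_version = "1.2.50+git20140911"
print(unquote(raw_version))       # 1.2.50+git20140911 (percent-decoding keeps '+')
print(unquote_plus(raw_version))  # 1.2.50 git20140911 (form decoding makes it a space)
```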