packageurl-python

Why is the colon in the name not translated?

Open lwlwudi opened this issue 2 years ago • 8 comments

Why is the colon in the name not translated (percent-encoded)?

lwlwudi avatar Nov 07 '22 11:11 lwlwudi

like this pkg:generic/SuSE%20Linux%20Enterprise%20Server%2012%20SP5:[email protected]%2Bgit20140911.61c1681-38.13.1.x86_64

lwlwudi avatar Nov 07 '22 11:11 lwlwudi

It seems to be encoded in the Java package.

lwlwudi avatar Nov 08 '22 02:11 lwlwudi

Is the colon something that doesn't need to be encoded anywhere?

lwlwudi avatar Nov 08 '22 02:11 lwlwudi

It seems there is an unnecessary step in the encoding that goes against the spec. Is it a bug?

def quote(s):
    """
    Return a percent-encoded unicode string, except for colon :, given an `s`
    byte or unicode string.
    """
    if isinstance(s, unicode):
        s = s.encode('utf-8')
    quoted = _percent_quote(s)
    if not isinstance(quoted, unicode):
        quoted = quoted.decode('utf-8')
    quoted = quoted.replace('%3A', ':')  # this step seems unnecessary per the spec
    return quoted
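
To make the effect of that last `replace` concrete, here is a minimal Python 3 sketch. It assumes `_percent_quote` is `urllib.parse.quote` with its default arguments (an assumption, not confirmed in this thread), and the name `quote_keep_colon` and the sample string are made up for illustration.

```python
from urllib.parse import quote as _percent_quote  # assumed to match the library's alias


def quote_keep_colon(s):
    """Percent-encode `s`, then restore ':' (hypothetical Python 3 rendering of the snippet above)."""
    if isinstance(s, bytes):
        s = s.decode("utf-8")
    return _percent_quote(s).replace("%3A", ":")


sample = "my distro:tool"                      # made-up name containing a space and a colon
plain = _percent_quote(sample)                 # 'my%20distro%3Atool'
kept = quote_keep_colon(sample)                # 'my%20distro:tool'
same = _percent_quote(sample, safe="/:")       # marking ':' as safe gives the same result
```

With these assumptions, the `replace` step is what leaves the colon unescaped in the serialized purl.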

shiqi0715 avatar Nov 08 '22 08:11 shiqi0715

This is not a bug. The spec does not say to escape ':', and the test suite gives examples of not escaping ':'.

matt-phylum avatar Nov 08 '23 17:11 matt-phylum

What about a use case where there is a URL with a port as part of the name?

For example: pkg:container/index.myregistry.io:5000/my-image@v1

Would you expect the ':' to be encoded when it goes into the toString func? I was thinking it should not be encoded, but it seems to fail the urlparse check.

It seems to recognize 'index.myregistry.io' as a URL scheme and fails with a somewhat misleading error.

https://github.com/package-url/packageurl-python/blob/0d3336804ce6dac59975c8c170d4015149442a93/src/packageurl/__init__.py#L507
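
The scheme detection can be reproduced with the standard library alone. This is only a sketch: it assumes the part of the purl after the type reaches `urlsplit` more or less unchanged, which may not be exactly what the linked code does.

```python
from urllib.parse import urlsplit

# Remainder of the example purl after "pkg:container/" (assumed input, see above).
remainder = "index.myregistry.io:5000/my-image@v1"
parts = urlsplit(remainder)

# Everything before the first ':' consists of valid scheme characters,
# so the registry host is reported as the scheme and the port lands in the path.
print(parts.scheme)  # index.myregistry.io
print(parts.path)    # 5000/my-image@v1

# With the colon percent-encoded there is no ':' left to split on.
encoded = urlsplit("index.myregistry.io%3A5000/my-image@v1")
print(repr(encoded.scheme))  # ''
```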

If you prefer, I can open a separate issue.

I can prepare a PR depending on the discussion.

houdini91 avatar Dec 05 '23 10:12 houdini91

OK, after taking the previous advice, I see you expect such a purl to be pkg:docker/my_image@sha256:244fd47e07d1004f0aed9c?repository_url=index.my-regstory.io:500

Did I understand correctly? Can you elaborate on this logic or point me to the related spec? I tried it with the golang library and did not see this limitation.
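
For reference, the rewrite I am describing (moving the registry host and port out of the name and into a `repository_url` qualifier) could be sketched like this. The helper `image_ref_to_purl` and its registry heuristic are hypothetical, not part of any purl library, and the sketch assumes the inputs contain no characters that need percent-encoding.

```python
def image_ref_to_purl(image_ref, version):
    """Hypothetical helper: move a leading registry host[:port] into repository_url."""
    first, sep, rest = image_ref.partition("/")
    # Heuristic (an assumption): the first segment is a registry if it contains '.' or ':'.
    if sep and ("." in first or ":" in first):
        return f"pkg:docker/{rest}@{version}?repository_url={first}"
    # No registry prefix detected: keep the whole reference as the name.
    return f"pkg:docker/{image_ref}@{version}"


with_registry = image_ref_to_purl("index.myregistry.io:5000/my-image", "v1")
without_registry = image_ref_to_purl("my-image", "v1")
```

Here `with_registry` comes out as `pkg:docker/my-image@v1?repository_url=index.myregistry.io:5000`, so the colon only appears in the qualifier value and never in the name.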

houdini91 avatar Dec 05 '23 10:12 houdini91

The spec says "the ':' scheme and type separator does not need to and must NOT be encoded. It is unambiguous unencoded elsewhere," meaning it does not need to be encoded here. However:

  1. Some PURL implementations use generic escaping methods which escape more characters than necessary. This should be okay when parsed using a parser that correctly implements the spec, but can cause problems when supposedly canonical PURLs are compared as strings and they aren't really canonical. Because of this, I'd recommend comparing PURLs using the URL algorithm where you parse and then reserialize both PURLs using whatever PURL implementation you're using to get a consistent representation.
  2. The way PURL encodes qualifiers is very similar to x-www-form-urlencoded, which encodes more characters and has special rules about '+' characters. The PURL spec does not mention x-www-form-urlencoded anywhere, but some implementations use x-www-form-urlencoded anyway, leading to unnecessary escaping and incorrect serialization and parsing of some PURLs. (https://github.com/package-url/purl-spec/pull/261)
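
Both points can be illustrated with the Python standard library. This is a rough sketch, not how any particular PURL implementation works: the purl strings and the `tag` qualifier are made up, and `unquote` is only a stand-in for a real parse-and-reserialize step.

```python
from urllib.parse import parse_qs, quote, quote_plus, unquote

# Point 1: the same package, serialized with and without an escaped colon.
a = "pkg:generic/my%20name%3Atool"
b = "pkg:generic/my%20name:tool"
equal_as_strings = (a == b)                         # False
equal_after_decoding = (unquote(a) == unquote(b))   # True (stand-in for parse + reserialize)

# Point 2: plain percent-encoding versus form-style encoding of the same value.
value = "1.0+git 2014"
percent_encoded = quote(value, safe="")   # '1.0%2Bgit%202014'
form_encoded = quote_plus(value)          # '1.0%2Bgit+2014' (space becomes '+')

# A form-style parser turns a literal '+' in a qualifier value into a space.
parsed = parse_qs("tag=1.0+git")          # {'tag': ['1.0 git']}
```

String comparison treats `a` and `b` as different packages, and the form-style parser silently changes `1.0+git` to `1.0 git`, which is the kind of mismatch described in the two points above.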

matt-phylum avatar Dec 05 '23 13:12 matt-phylum