packageurl-python

PackageURL not properly re-encoding strings when rendering to string

Open • jkugler opened this issue 1 year ago • 3 comments

When a URL-encoded name is passed to PackageURL.from_string, it is decoded, which is correct because it yields the actual name. However, when the object is rendered back to a string, the name is not re-encoded, resulting in an incorrect PURL.

>>> import packageurl
>>> from urllib.parse import quote_plus
>>> quote_plus("parent/child")
'parent%2Fchild'
>>> p = packageurl.PackageURL.from_string(f"pkg:my_type/my_namepace/{quote_plus('parent/child')}/@1234")
>>> p
PackageURL(type='my_type', namespace='my_namepace', name='parent/child', version='1234', qualifiers={}, subpath=None)

That is correct, as the name is parent/child. However:

>>> str(p)
'pkg:my_type/my_namepace/parent/child@1234'

That is an invalid PURL: the unencoded / in the name is indistinguishable from the separator between namespace and name.

The fix looks easy. Instead of this line (https://github.com/package-url/packageurl-python/blob/main/src/packageurl/__init__.py#L458):

        purl.append(name)

it looks like it should be:

        purl.append(urllib.parse.quote_plus(name))
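One detail worth checking in the proposed fix: urllib.parse.quote_plus writes a space as "+", and plain percent-decoding with urllib.parse.unquote does not turn "+" back into a space. The sketch below assumes the parser decodes with unquote rather than unquote_plus (not verified against the library here) and compares quote_plus with a hypothetical helper, encode_name_segment, built on quote(name, safe=""), which also encodes "/" but writes spaces as %20.

```python
from urllib.parse import quote, quote_plus, unquote


def encode_name_segment(name: str) -> str:
    # Hypothetical helper, not part of packageurl-python.
    # safe="" makes quote() percent-encode "/" as well, so the name cannot be
    # mistaken for a namespace separator; spaces become "%20", not "+".
    return quote(name, safe="")


for raw in ["parent/child", "my name"]:
    via_plus = quote_plus(raw)
    via_quote = encode_name_segment(raw)
    # Decode both with plain percent-decoding and compare to the original.
    print(f"{raw!r}: quote_plus={via_plus!r} -> {unquote(via_plus)!r}, "
          f"quote={via_quote!r} -> {unquote(via_quote)!r}")
```

Both approaches give parent%2Fchild for the name in this issue; they only differ for names containing spaces.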

jkugler • Apr 25 '24 22:04

I've thought about this some more. I'm not sure whether it's strictly a bug or spec-compliant behavior, but it does "break" the round trip:

>>> p = packageurl.PackageURL.from_string(f"pkg:my_type/my_namepace/{quote_plus('parent/child')}/@1234")
>>> p
PackageURL(type='my_type', namespace='my_namepace', name='parent/child', version='1234', qualifiers={}, subpath=None)
>>> str(p)
'pkg:my_type/my_namepace/parent/child@1234'
>>> p = packageurl.PackageURL.from_string(str(p))
>>> p
PackageURL(type='my_type', namespace='my_namepace/parent', name='child', version='1234', qualifiers={}, subpath=None)

Note that both the namespace and the name change. If PackageURL had re-applied the URL encoding in __str__, the round trip would have preserved the name parent/child.
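The round trip above can be reproduced with a deliberately simplified model. to_purl and parse below are hypothetical stand-ins, not packageurl-python's to_string() and from_string(); they assume the parser splits the version off at the last "@" and the name off at the last "/", which matches the namespace/name shift seen in the output above.

```python
from urllib.parse import quote, unquote


def to_purl(ptype, namespace, name, version, encode=True):
    # Hypothetical serializer; encode=False mimics emitting the decoded name.
    enc = (lambda s: quote(s, safe="")) if encode else (lambda s: s)
    path = enc(name)
    if namespace:
        # Encode each namespace segment separately; "/" separates segments.
        ns = "/".join(enc(seg) for seg in namespace.split("/"))
        path = f"{ns}/{path}"
    return f"pkg:{ptype}/{path}@{enc(version)}"


def parse(purl):
    # Hypothetical parser: version after the last "@", name after the last "/".
    ptype, _, rest = purl[len("pkg:"):].partition("/")
    path, _, version = rest.rpartition("@")
    namespace, _, name = path.rpartition("/")
    ns = None
    if namespace:
        ns = "/".join(unquote(seg) for seg in namespace.split("/"))
    return ptype, ns, unquote(name), unquote(version)


original = ("my_type", "my_namepace", "parent/child", "1234")
print(parse(to_purl(*original, encode=False)))  # namespace and name shift
print(parse(to_purl(*original, encode=True)))   # original values preserved
```

Without encoding, the model yields namespace my_namepace/parent and name child, mirroring the second from_string call; with encoding, the original tuple comes back unchanged.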

jkugler • Apr 26 '24 16:04

Related to PR #123

matt-phylum • Apr 26 '24 17:04

Here is another related case. Is this a bug, or expected behavior?

>>> p = PackageURL.from_string('pkg:maven/com.google.guava%3Aguava@25.1-jre')
>>> p
PackageURL(type='maven', namespace=None, name='com.google.guava:guava', version='25.1-jre', qualifiers={}, subpath=None)
>>> str(p)
'pkg:maven/com.google.guava:guava@25.1-jre'
>>> PackageURL.from_string(str(p))
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    PackageURL.from_string(str(p))
  File "/opt/homebrew/lib/python3.11/site-packages/packageurl/__init__.py", line 512, in from_string
    raise ValueError(msg)
ValueError: Invalid purl 'pkg:maven/com.google.guava:guava@25.1-jre' cannot contain a "user:pass@host:port" URL Authority component: ''.

What is the proper behavior here?
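For comparison, a standard-library sketch of the two forms of this name. Encoding with quote(name, safe="") gives com.google.guava%3Aguava, the form the first from_string call accepted, and unquote restores the name exactly. Separately, Python's generic urllib.parse.urlsplit reads the unencoded text before ":" as a URL scheme; this only illustrates why a literal ":" is awkward for URL-style parsing and is not a claim about how packageurl-python's parser works internally.

```python
from urllib.parse import quote, unquote, urlsplit

name = "com.google.guava:guava"

# Percent-encoding with safe="" turns ":" into "%3A", and unquote() reverses it.
encoded = quote(name, safe="")
print(encoded, unquote(encoded) == name)

# A generic URL splitter treats "com.google.guava" as a scheme here,
# leaving no authority (netloc) and the rest of the text as the path.
parts = urlsplit(f"{name}@25.1-jre")
print(parts.scheme, repr(parts.netloc), parts.path)
```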

jkugler • Apr 30 '24 00:04