packageurl-python
Creation of PackageURL objects for invalid PURLs / Possible improper encoding of colons (":") in PURL fields
It is possible to create PackageURL objects that contain invalid fields, specifically by using the PackageURL
kwarg constructor and passing in values that contain colons.
Simple example:
>>> from packageurl import PackageURL
>>> p = PackageURL(type="generic", name="Foo: <Bar>", version="1.2.3")
>>> p.to_string()
'pkg:generic/Foo:%20%3CBar%3E@1.2.3'
>>> PackageURL.from_string(p.to_string())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/vossn/finitestate/finite-state-sip/venv/lib/python3.10/site-packages/packageurl/__init__.py", line 514, in from_string
    raise ValueError(msg)
ValueError: Invalid purl 'pkg:generic/Foo:%20%3CBar%3E@1.2.3' cannot contain a "user:pass@host:port" URL Authority component: ''.
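One plausible reading of this error, sketched with the standard library only: if the text after the type is handed to a generic URL splitter, the literal colon makes everything before it look like a URL scheme. That from_string works this way internally is an assumption here, not something verified against the packageurl source. The remainder string below is simply the purl above with "pkg:" and "generic/" removed.

```python
from urllib.parse import urlsplit

# Assumption: this is roughly what is left of the purl once the
# "pkg:" scheme and the "generic/" type have been stripped off.
remainder = "Foo:%20%3CBar%3E@1.2.3"

# With a literal colon, urlsplit treats "Foo" as a scheme, so the
# name is no longer part of the path.
parts = urlsplit(remainder)
print(repr(parts.scheme))  # 'foo'
print(repr(parts.netloc))  # ''
print(repr(parts.path))    # '%20%3CBar%3E@1.2.3'

# With the colon percent-encoded, nothing is mistaken for a scheme
# and the whole string stays in the path.
encoded = urlsplit("Foo%3A%20%3CBar%3E@1.2.3")
print(repr(encoded.scheme))  # ''
print(repr(encoded.path))    # 'Foo%3A%20%3CBar%3E@1.2.3'
```

The empty authority in the error message would be consistent with this, since no netloc is ever found.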
On closer inspection, it looks like the problem might be that colons (:) are not being percent-encoded correctly? I would expect the colon in the name to be encoded to %3A, but it looks like it is being left as a literal : in the to_string() function:
>>> p = PackageURL(type="generic", name="Foo: <Bar>", version="1.2.3")
>>> p
PackageURL(type='generic', namespace=None, name='Foo: <Bar>', version='1.2.3', qualifiers={}, subpath=None)
>>> p.to_string()
'pkg:generic/Foo:%20%3CBar%3E@1.2.3'
I'm not sure I'm interpreting the PURL spec correctly with regard to the treatment of colon characters, but on the surface it sounds like any colon appearing within an individual field value should simply be percent-encoded if the field itself calls for percent-encoding.
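To make the expectation concrete, here is a minimal sketch using urllib.parse.quote from the standard library. It is not packageurl's own quoting code; the safe arguments are chosen only to contrast the two behaviours (colon encoded versus colon left literal).

```python
from urllib.parse import quote, unquote

name = "Foo: <Bar>"

# Nothing marked as safe: the colon is percent-encoded along with
# the space and the angle brackets.
strict = quote(name, safe="")
print(strict)   # Foo%3A%20%3CBar%3E

# Colon marked as safe: it is left as a literal ":", which matches
# the name segment seen in the to_string() output above.
lenient = quote(name, safe=":")
print(lenient)  # Foo:%20%3CBar%3E

# The fully encoded form decodes back to the original name.
print(unquote(strict) == name)  # True
```

With the strict form, the purl would read pkg:generic/Foo%3A%20%3CBar%3E@1.2.3, which is what I would have expected to_string() to return.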
Totally agree. Quote from the PURL specification:
A name must be a percent-encoded string
Colons are explicitly not encoded for an unknown reason:
https://github.com/package-url/packageurl-python/blob/f98abf0f3c295873e18f968ebd00138a02d63b25/src/packageurl/__init__.py#L71C40-L71C40
This line was added as part of commit d7be0209d00fefd819d27804b1ee536765e6509e, but there is no explanation as to why. This applies to all fields, not just name, fwiw.
@pombredanne as the author of this particular piece of the code, can you share why colons are treated specially in this context? :-)