packageurl-python

PURL Qualifiers are encoded twice

Open mciccarone opened this issue 1 month ago • 1 comment

I have a script that extracts attributes from a DLL, such as the author and product name, and uses them to build a PURL that can later be decoded to a UTF-8 string. I have identified a case where these attributes end up encoded twice within the PackageURL.

Here are some values that reproduce the issue, using file attributes from DotNetNuke.DLL as an example.

The 'dll' contains the following methods:

import urllib.parse

def get_product_name():
    product_name = "https://dnncommunity.org" # This is the unencoded value
    return urllib.parse.quote(product_name, safe='') # This is the encoded value "https%3A%2F%2Fdnncommunity.org"

def get_author():
    author = ".NET Foundation" # This is the unencoded value
    return urllib.parse.quote(author, safe='') # This is the encoded value ".NET%20Foundation"

I need to combine the content in a forward slash ('/') separated format so that Nexus can understand it, e.g. <author>/<product_name>

from packageurl import PackageURL

purlattrs = f'{dll.get_author()}%2F{dll.get_product_name()}'
print(purlattrs) # output = '.NET%20Foundation%2Fhttps%3A%2F%2Fdnncommunity.org'
# This is the correctly encoded URL safe string

_qualifiers = {'Attr1': purlattrs, 'Attr2': 'Foo'}
purl = PackageURL(type='generic', name="DotNetNuke.dll", version="9.11.0.46", qualifiers=_qualifiers)
print(purl)

The purl that is printed is "pkg:generic/DotNetNuke.dll@9.11.0.46?Attr1=.NET%2520Foundation%252Fhttps%253A%252F%252Fdnncommunity.org&Attr2=Foo"

  • As you can see, the space character is now encoded as %2520 instead of %20.
  • The forward slash is now %252F instead of %2F.
  • The colon is now %253A instead of %3A.
  • In each case, the % character of the existing escape sequence is itself being encoded to %25.
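
The effect described in the list above can be reproduced with the standard library alone. This is a minimal sketch, not packageurl-python's actual code: it only assumes that the already-encoded value is percent-encoded a second time, which is consistent with the printed purl.

```python
from urllib.parse import quote

# Encode each attribute once, as the DLL helper methods in this issue do.
author = quote(".NET Foundation", safe="")
product = quote("https://dnncommunity.org", safe="")
once = f"{author}%2F{product}"

# A second pass (assumed here to mimic what happens inside the library)
# turns every '%' of the existing escapes into '%25'.
twice = quote(once, safe="")

print(once)   # .NET%20Foundation%2Fhttps%3A%2F%2Fdnncommunity.org
print(twice)  # .NET%2520Foundation%252Fhttps%253A%252F%252Fdnncommunity.org
```

The second printed value matches the Attr1 qualifier in the purl reported above.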

If I pass in the raw string value to the PackageURL like below:

purlattrs = ".NET Foundation/https://dnncommunity.org"
print(purlattrs) # output = ".NET Foundation/https://dnncommunity.org"

When I pass in the raw string value, print() gives the following output: "pkg:generic/DotNetNuke.dll@9.11.0.46?Attr1=.NET%20Foundation/https://dnncommunity.org&Attr2=Foo"

  • In this scenario the encoding works for the space, but not for the slashes or the colon.

  • Recommend changing the PURL encoding to use urllib.parse.quote and urllib.parse.unquote, or eliminating the encoding step and having the PackageURL user perform the encoding/decoding.
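
The round trip that this recommendation relies on can be sketched with the standard library. This is an illustration only, not a proposed patch to packageurl-python; `encode_qualifier_value` and `decode_qualifier_value` are hypothetical helper names that do not exist in the library.

```python
from urllib.parse import quote, unquote


def encode_qualifier_value(raw: str) -> str:
    """Percent-encode a raw value exactly once (hypothetical helper)."""
    return quote(raw, safe="")


def decode_qualifier_value(encoded: str) -> str:
    """Reverse encode_qualifier_value (hypothetical helper)."""
    return unquote(encoded)


raw = ".NET Foundation/https://dnncommunity.org"
encoded = encode_qualifier_value(raw)

print(encoded)                                 # .NET%20Foundation%2Fhttps%3A%2F%2Fdnncommunity.org
print(decode_qualifier_value(encoded) == raw)  # True
```

With a single encode and a single decode, the space, slashes, and colon all survive the round trip. Note that after decoding, the separator slash and the slashes inside the URL are indistinguishable, so a consumer that needs to split the value would have to do so before decoding.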

mciccarone Dec 03 '25 02:12