Continue reporting/displaying PURLs with URL encoding?
This is related to https://github.com/nexB/vulnerablecode/issues/1228 and https://github.com/nexB/vulnerablecode/issues/1252.
I noticed yesterday, while working on https://github.com/nexB/vulnerablecode/issues/1228, that my tests using univers (https://github.com/nexB/univers) to compare affected and fixed-by versions threw a univers.versions.InvalidVersion: '2.12.1-1%2Bdeb11u1' is not a valid <class 'univers.versions.DebianVersion'> error when I included pkg:deb/debian/jackson-databind@2.12.1-1%2Bdeb11u1 in the test.
I eventually figured out that the culprit was the string %2B -- the URL-encoded + that debian.org uses for this jackson-databind package. (See, e.g., https://developer.mozilla.org/en-US/docs/Glossary/Percent-encoding.) Importing urllib.parse and using the unquote() function enabled me to complete the comparison without error, and is now part of my draft tests as well:
import urllib.parse

import pytest
from univers import versions

# Test the error: the percent-encoded version is rejected by univers.
with pytest.raises(versions.InvalidVersion):
    assert versions.DebianVersion("2.12.1-1%2Bdeb11u1") < versions.DebianVersion(
        "2.13.1-1%2Bdeb11u1"
    )
# Decode the version and test.
assert versions.DebianVersion(
    urllib.parse.unquote("2.12.1-1%2Bdeb11u1")
) < versions.DebianVersion(urllib.parse.unquote("2.13.1-1%2Bdeb11u1"))
My question: do we want to continue this approach, or would we prefer instead to use non-URL-encoded versions in our PURLs?
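If the decode step stays, the choice of decoding function matters. The sketch below uses only the standard library and the version string from this issue. It illustrates that unquote() is safe to apply to an already decoded version, whereas unquote_plus() is not, because it treats a literal + as a space. The variable names are illustrative only.

```python
from urllib.parse import unquote, unquote_plus

encoded = "2.12.1-1%2Bdeb11u1"

# unquote() turns %2B into "+" and leaves literal "+" characters alone.
decoded = unquote(encoded)

# unquote_plus() follows form-encoding rules and maps a literal "+" to a
# space, so applying it to an already decoded version corrupts it.
corrupted = unquote_plus(decoded)

# unquote() is a no-op on a version that contains no percent escapes.
decoded_twice = unquote(decoded)

print(decoded)
print(repr(corrupted))
print(decoded_twice == decoded)
```

Because unquote() does not change a string without percent escapes, it could be applied to versions from any source without first checking whether they were encoded.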
If we search in vulnerablecode.io for pkg:deb/debian/jackson-databind@2.12.1-1%2Bdeb11u1 and pkg:deb/debian/jackson-databind@2.12.1-1+deb11u1, we get the same results, displayed in the UI like this, for example:
pkg:deb/debian/jackson-databind@2.12.1-1%2Bdeb11u1
with these respective links:
https://public.vulnerablecode.io/packages/pkg:deb/debian/jackson-databind@2.12.1-1%252Bdeb11u1?search=pkg:deb/debian/jackson-databind@2.12.1-1%2Bdeb11u1
https://public.vulnerablecode.io/packages/pkg:deb/debian/jackson-databind@2.12.1-1%252Bdeb11u1?search=pkg:deb/debian/jackson-databind@2.12.1-1+deb11u1
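The %252B in the path portion of those links looks like the result of percent-encoding a version that was already encoded. The sketch below reproduces that effect with the standard library. How the site actually builds its links is an assumption here; this only demonstrates the encoding arithmetic.

```python
from urllib.parse import quote, unquote

plain = "2.12.1-1+deb11u1"

# First pass: "+" is not in quote()'s default safe set, so it becomes %2B.
once = quote(plain)

# Second pass: the "%" itself is encoded as %25, so %2B becomes %252B.
twice = quote(once)

print(once)
print(twice)

# Undoing it takes two decode passes.
print(unquote(twice) == once)
print(unquote(unquote(twice)) == plain)
```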
FWIW, debian.org displays the non-encoded version 2.12.1-1+deb11u1 with an underlying link to a details page. See https://tracker.debian.org/pkg/jackson-databind.
I see that packageurl.PackageURL.from_string() will also provide us with a decoded (or non-URL-encoded) version from a PURL.
import urllib.parse
import packageurl
deb_purl = "pkg:deb/debian/jackson-databind@2.12.1-1%2Bdeb11u1"
decoded_deb_purl = urllib.parse.unquote(deb_purl)
print("\ndecoded_deb_purl = {}\n".format(decoded_deb_purl))
# Test PURL
purl = packageurl.PackageURL.from_string(deb_purl)
print("\npurl = {}\n".format(purl))
print(purl.type)
print(purl.namespace)
print(purl.name)
print(purl.version)
print(purl.qualifiers)
print(purl.subpath)
produces this output:
decoded_deb_purl = pkg:deb/debian/jackson-databind@2.12.1-1+deb11u1
purl = pkg:deb/debian/jackson-databind@2.12.1-1%2Bdeb11u1
deb
debian
jackson-databind
2.12.1-1+deb11u1
{}
None
Hi, I’d like to start looking into this area.
From what I understand:
• #1252 is about choosing the best fixed version: ideally the lowest version that fixes the vulnerability and is itself not vulnerable.
• #1253 highlights that URL-encoded Debian versions (%2B) cause issues in version comparison, and we already decode internally via PackageURL.from_string() or urllib.parse.unquote().
Before I start experimenting, a quick clarification:
Should the “best fixed version” logic always run on the decoded version string (e.g., 2.12.1-1+deb11u1), even if the stored PURL uses %2B? If yes, then the path is straightforward:
• normalize versions at import time
• run version ordering + univers comparisons on the decoded value
• when reporting UI/API results, use the decoded version
• keep the original PURL intact for lookup
This avoids univers errors and keeps fixed-version evaluation consistent across ecosystems (Debian especially).
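A rough sketch of what that normalization could look like, using only the standard library. NormalizedPackage and normalize_purl are hypothetical names, not existing vulnerablecode code. The version extraction is deliberately simplistic; in practice PackageURL.from_string(), mentioned above, already returns the decoded version and would be the better starting point.

```python
from dataclasses import dataclass
from urllib.parse import unquote


@dataclass(frozen=True)
class NormalizedPackage:
    """Hypothetical record: original PURL for lookup, decoded version for comparison."""

    original_purl: str
    decoded_version: str


def normalize_purl(purl: str) -> NormalizedPackage:
    # Drop the subpath ("#...") and qualifiers ("?...") before looking for the version.
    base = purl.split("#", 1)[0].split("?", 1)[0]
    # Simplifying assumption: the version is whatever follows the last "@".
    _, sep, raw_version = base.rpartition("@")
    if not sep:
        raise ValueError(f"no version in PURL: {purl!r}")
    return NormalizedPackage(original_purl=purl, decoded_version=unquote(raw_version))


stored = "pkg:deb/debian/jackson-databind@2.12.1-1%2Bdeb11u1"
pkg = normalize_purl(stored)
print(pkg.decoded_version)
print(pkg.original_purl == stored)
```

The decoded_version field is what would be handed to univers for ordering, while original_purl stays untouched for lookups.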
If this direction looks right, I’ll put together a small proposal + initial patch.