packageurl-js icon indicating copy to clipboard operation
packageurl-js copied to clipboard

Incorrect validation of version encoding

Open matt-phylum opened this issue 1 year ago • 8 comments

After parsing the version number out of a PURL string, packageurl-js validates that the version number was encoded exactly the way that packageurl-js would have encoded it. This seems like an extra thing that packageurl-js does, undirected by the spec. Even if the spec did say to validate that the version number was percent encoded, that doesn't actually mean anything because technically any ASCII string that doesn't contain '%' is a valid percent encoded string by definition.

This is problematic because different PURL implementations encode different sets of characters. If a version number contains a character that one PURL implementation unnecessary escapes and packageurl-js does not, or vice versa, packageurl-js will correctly parse the version number and then throw an exception because the encoding validation failed. For example, packageurl-js will fail to decode this valid PURL produced by packageurl-java: pkg:t/n@1%3A2.

matt-phylum avatar Nov 08 '23 16:11 matt-phylum

Hi, I observed the same behaviour which seems to be regression with the newest released version (1.2.1). the call PackageURL.fromString('pkg:maven/a@2+3') doesn't produce an error. But the call PackageURL.fromString('pkg:maven/a@2%2B3') returns an error: Uncaught Error: Invalid purl: version must be percent-encoded.

When using version 1.2.0 I get the following: PackageURL.fromString('pkg:maven/a@2+3') -> Uncaught Error: Invalid purl: version must be percent-encoded PackageURL.fromString('pkg:maven/a@2%2B3') -> Purl is valid no error

The behaviour of version 1.2.0 is the one I expected

chrsch-dev avatar Nov 08 '23 17:11 chrsch-dev

They should both be valid.

2+3 is produced by althonos/packageurl.rs, package-url/packageurl-js, phylum-dev/purl.

2%2B3 is produced by package-url/packageurl-dotnet, package-url/packageurl-go, package-url/packageurl-java, package-url/packageurl-php, package-url/packageurl-python, package-url/packageurl-ruby, package-url/packageurl-swift.

Be careful with version numbers that contain + because some implementations will incorrectly convert it into a space: package-url/packageurl-dotnet, package-url/packageurl-ruby. (package-url/purl-spec#261) All other implementations correctly parse it as +, except for this implementation which throws an exception.

matt-phylum avatar Nov 08 '23 17:11 matt-phylum

We had a few external contributions around this issue that seem to have created more problems. My bad for not looking deeper at the contributions/test additions and taking them at face value. I will take a look at this later tonight and hopefully have a fix.

steven-esser avatar Nov 08 '23 21:11 steven-esser

https://github.com/package-url/packageurl-js/pull/61 with the fix for this.

steven-esser avatar Nov 09 '23 01:11 steven-esser

They should both be valid.

2+3 is produced by althonos/packageurl.rs, package-url/packageurl-js, phylum-dev/purl.

2%2B3 is produced by package-url/packageurl-dotnet, package-url/packageurl-go, package-url/packageurl-java, package-url/packageurl-php, package-url/packageurl-python, package-url/packageurl-ruby, package-url/packageurl-swift.

Be careful with version numbers that contain + because some implementations will incorrectly convert it into a space: package-url/packageurl-dotnet, package-url/packageurl-ruby. (package-url/purl-spec#261) All other implementations correctly parse it as +, except for this implementation which throws an exception.

Hm, in the purl specification it says for the version A version must be a percent-encoded string. So I would assume pkg:maven/a@2+3 is not correct. It has to be pkg:maven/a@2%2B3 to be valid.

chrsch-dev avatar Nov 09 '23 07:11 chrsch-dev

2+3 is a percent encoded string. Any ASCII string that doesn't contain '%' is percent-encoded, and decodes into the same string. "Percent-encoded" only describes a method of encoding, without specifying which characters need to be encoded (besides '%', obviously). The PURL spec does not specify that '+' should be encoded, and I think encoding it is a minor spec deviation. Especially with the way the spec describes encoding, it seems unreasonable to expect every implementation to create exactly the same canonicalized PURLs down to the percent-encoding, but it's important that implementations are permissive enough to handle both PURLs written by other implementations and PURLs written by humans.

When the spec says the version must be percent-encoded, it means that the version must be decoded on read and encoded on write, using the characters described in the encoding section.

matt-phylum avatar Nov 09 '23 13:11 matt-phylum

Sorry for the late response but I don't get why 2+3 is a valid version string in a purl. If you have a look in the Percent-encoding wikipedia entry at the section Percent-encoding in a URI you can see that after percent encoding the + should be replaced by %2B since + is a reserved character.

Maybe I miss something. Would be great if you could shed some light here.

chrsch-dev avatar Dec 21 '23 14:12 chrsch-dev

Reserved characters are not the characters that are supposed to be encoded. Wikipedia isn't really the best source, but it says "Other characters in a URI must be percent-encoded." and "When a character from the reserved set (a "reserved character") has a special meaning (a "reserved purpose") in a certain context, and a URI scheme says that it is necessary to use that character for some other purpose, then the character must be percent-encoded." This is not the case here. + is not an other character, and in this context it does not have special meaning that we are avoiding by encoding it.

matt-phylum avatar Jan 02 '24 14:01 matt-phylum

Fixed in master branch (both 2%2B3 and 2+3 work)

jdalton avatar Aug 13 '24 16:08 jdalton