cdxgen icon indicating copy to clipboard operation
cdxgen copied to clipboard

Parsing Javascript files may extract incorrect (and very long) pkg name/version

Open marob opened this issue 2 years ago • 10 comments

In my use case, I have a projekktor-1.3.09.min.js file (N.B.: this project looks dead, its website links to dead site/advert spam) that starts with the following line (yes, the version doesn't match, but that's not the issue here):

/*! Projekktor v1.2.37 -jarisflash | http://www.projekktor.com | Copyright 2010, 2011, Sascha Kluger, Spinning Airwhale Media, http://www.spinningairwhale.com | GNU General Public License - http://www.projekktor.com/license/
 */

This first line is generated via Grunt.

When parsed through cdxgen parseMinJs function, delimiter is identified as - because of the one at the end of the line (...License - http://...), which leads to a mis-identification of the name and version.

In the end, cdxgen generates:

    {
      "group": "",
      "name": "projekktor-v1.2.37--jarisflash-|-http://www.projekktor.com-|-copyright-2010,-2011,-sascha-kluger,-spinning-airwhale-media,-http://www.spinningairwhale.com-|-gnu-general-public-license",
      "version": "http://www.projekktor.com/license/",
      "scope": "optional",
      "purl": "pkg:npm/projekktor-v1.2.37--jarisflash-%257C-http:%2F%2Fwww.projekktor.com-%257C-copyright-2010%252C-2011%252C-sascha-kluger%252C-spinning-airwhale-media%252C-http:%2F%2Fwww.spinningairwhale.com-%257C-gnu-general-public-license@http:%2F%2Fwww.projekktor.com%2Flicense%2F",
      "type": "library",
      "bom-ref": "pkg:npm/projekktor-v1.2.37--jarisflash-%7C-http://www.projekktor.com-%7C-copyright-2010%2C-2011%2C-sascha-kluger%2C-spinning-airwhale-media%2C-http://www.spinningairwhale.com-%7C-gnu-general-public-license@http://www.projekktor.com/license/",
      "evidence": {... },
      "properties": [...]
    },

This incorrect name/version leads to generating an error in Dependency Track when feeding it that SBOM because of the purl exceeding 255 chars (which is the current size limit):

ERROR [BomUploadProcessingTask] Error while processing bom
javax.jdo.JDOFatalUserException: Attempt to store value "pkg:npm/projekktor-v1.2.37--jarisflash-%257C-http%3A%2F%2Fwww.projekktor.com-%257C-copyright-2010%252C-2011%252C-sascha-kluger%252C-spinning-airwhale-media%252C-http%3A%2F%2Fwww.spinningairwhale.com-%257C-gnu-general-public-license@http%3A%2F%2Fwww.projekktor.com%2Flicense%2F" in column ""PURLCOORDINATES"" that has maximum length of 255. Please correct your data!

marob avatar Nov 17 '23 15:11 marob

@marob, good investigation.

https://github.com/CycloneDX/cdxgen/blob/b397baee86e346e90034a644a5bb40ab39580ddb/utils.js#L1365

Could we change this line to below to see if it works? You can send a PR including this and the version fix.

const tmpPV = l.split(delimiter).split("--")[0];

prabhu avatar Nov 17 '23 16:11 prabhu

@prabhu That would'n work for 2 reasons:

  • l.split(delimiter) returns an array that doesn't have a split function
  • the -- String would only appear after executing .replace(/ /g, "-") several lines below

I was wondering on which specification you based your code to try to parse the package name and version? I didn't find any specification on how those comments should be written in JavaScript files (minified or not). Whatever change we'll add in the code, I think we should have several example files to add as unit tests to make sure the code is parsing the data as expected.

marob avatar Nov 17 '23 17:11 marob

@marob There are no specifications like you figured. So this will be identifying things based on some patterns so there is some visibility into the min files that are being stored or bundled. I am open to improving the logic.

prabhu avatar Nov 17 '23 20:11 prabhu

@marob, are you planning to send a pull request for this?

prabhu avatar Nov 20 '23 11:11 prabhu

@prabhu Maybe, but probably not in the near future. If you have time and an idea on how to improve the parsing, feel free to do it. In my mind, we should try to grab several (tens or hundreds of) JS minified files examples to guess the usual format of the comment header and then write a test to check that our (new) code can parse all the identified formats.

marob avatar Nov 20 '23 13:11 marob

@prabhu Maybe, but probably not in the near future. If you have time and an idea on how to improve the parsing, feel free to do it.

In my mind, we should try to grab several (tens or hundreds of) JS minified files examples to guess the usual format of the comment header and then write a test to check that our (new) code can parse all the identified formats.

Same. Let's keep this open and see if anyone else contributes.

prabhu avatar Nov 20 '23 13:11 prabhu

The size has been increased from 255 to 786 chars in Dependency Track 4.11.0 (https://github.com/DependencyTrack/dependency-track/pull/3560) which should decrease BOM upload errors for cases where the purl length was > 255 but <= 786, but the issue is still here for length > 786.

I still don't have any idea on how to better parse name and version from JS comments as there is no convention/spec. Should we truncate? At which size? How to balance the sizes of name and version if both are long?

Anyway, as @prabhu asked here, it looks like there is no length limitation on purl specification, so we'll always have cases where a valid purl is longer than the arbitrary 786 chars limits defined in Dependency Track.

marob avatar Sep 02 '24 15:09 marob

There is a (currently open) issue on Dependency Track about the 786-chars limit: https://github.com/DependencyTrack/dependency-track/issues/3876

marob avatar Sep 02 '24 15:09 marob

I still don't have any idea on how to better parse name and version from JS comments as there is no convention/spec. Should we truncate? At which size? How to balance the sizes of name and version if both are long?

What are the chances that a valid package name and version combined would exceed 786 characters?

Depending on package manager, one could probably pin packages to a digest. Worst case... SHA-512 maybe? That's 128 characters just for the version. That still leaves ~600 characters for the name and some qualifiers maybe.

This could probably be verified in a study of public packages, but my gut feeling tells me that if you're dealing with such lengths, chances are quite high that something is wrong.

nscuro avatar Sep 02 '24 15:09 nscuro

@nscuro I agree that PURL of more that 786 characters shouldn't exist. But in a real world, and because the spec says there's not limit in length, we should expect encountering PURL with more than 786 characters and conform to the spec by allowing unlimited length PURL (or change the PURL spec to set a length limit, ideally under 786 chars).

marob avatar Sep 02 '24 16:09 marob