syft package.json authors keyword parsing

What would you like to be added: In the current parse_package_json.go, the structure used for parsing includes only the author key: https://github.com/anchore/syft/blob/main/syft/pkg/cataloger/javascript/parse_package_json.go#L24 . However, plenty of packages use authors to list multiple authors.

Formats seen:

"authors": [
  "Harry Potter <[email protected]> (http://youknowwho.com/) ",
  "John Smith <j.smith@something.com> (http://awebsite.com/) "
]

Or

"authors": [
  "Harry Potter",
  "John Smith"
]

An example package could be: https://github.com/Qix-/color/blob/master/package.json#L11

Why is this needed:

For more accuracy on parsing package.json structures.

Oct 24 '23 09:10 NataliaAn

Hey @NataliaAn, thank you for the report. We'll put this in the backlog for a fix when we are able. If you're interested in making the changes to support multiple authors, let us know and we can point you in the right direction. Thanks again!

Jan 04 '24 21:01 tgerla

@NataliaAn Shouldn't it be contributors? That's at least what the official package.json docs say should be used when multiple people need to be included: https://docs.npmjs.com/cli/v6/configuring-npm/package-json#people-fields-author-contributors

However, the contributors field is also not being parsed currently, so it would be nice to have. Technically speaking, though, authors does not seem to be an officially supported field.

There is also the top-level "maintainers" field, but I believe this one is not kept in package.json.

EDIT: There are many packages that do use authors or even maintainers:

https://sourcegraph.com/search?q=context:global+file:%5Epackage%5C.json%24+%22authors%22&patternType=standard&sm=1
https://sourcegraph.com/search?q=context:global+file:%5Epackage%5C.json%24+%22maintainers%22&patternType=standard&sm=1

Jan 24 '24 22:01 jabkoo

@tgerla I'm willing to contribute to this, but I would like some feedback from someone on the Anchore team about whether it's worth the effort and what the best approach would be.

My concerns:

Should we only support parsing contributors, as declared in the npm docs? Or should we also support authors and/or maintainers? Some packages may use yet other fields for this, but those seem to be the most common.
If we would like to parse all of the above, should each be parsed into a field with the same name, or should they all go into a single contributors field in the output metadata? In the latter case, if e.g. both maintainers and contributors are present, should one take priority over the other, or should they be concatenated?
If author is missing, but e.g. contributors is present, should it be used for the author field in the CycloneDX format?
Any other cases I haven't thought about?

Jan 25 '24 12:01 jabkoo

@jabkoo I guess as long as there are packages that use them, it would be nice to parse all cases that are valid even if they don't follow the latest official recommendation...?

Feb 01 '24 14:02 NataliaAn

Hey @jabkoo, sorry for the delay in responding. I chatted with the team and we think that the best approach would be:

Don't overload or combine the different fields: treat contributors, maintainers, and authors as separate fields.
The JSON data shape will need to be modified to support these new fields.
If author is present, put the contents in the authors JSON field.
If any of the fields are objects, stringify them.
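
The points above could be sketched roughly as follows in Go. This is only an illustration under stated assumptions: the people, personObject, packageJSON, and parsePackageJSON names are hypothetical, not syft's actual data shape, and placing a singular author ahead of any existing authors entries is a guess at the intended behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// people is a hypothetical type: a list of stringified person entries.
type people []string

// personObject is the object form of a person ({"name", "email", "url"}).
type personObject struct {
	Name  string `json:"name"`
	Email string `json:"email"`
	URL   string `json:"url"`
}

// String renders the object in the "Name <email> (url)" shorthand.
func (o personObject) String() string {
	var parts []string
	if o.Name != "" {
		parts = append(parts, o.Name)
	}
	if o.Email != "" {
		parts = append(parts, "<"+o.Email+">")
	}
	if o.URL != "" {
		parts = append(parts, "("+o.URL+")")
	}
	return strings.Join(parts, " ")
}

// UnmarshalJSON accepts a string, an object, or an array mixing both.
func (p *people) UnmarshalJSON(data []byte) error {
	var items []json.RawMessage
	if err := json.Unmarshal(data, &items); err != nil {
		// Not an array, so treat the whole value as a single entry.
		items = []json.RawMessage{data}
	}
	out := people{}
	for _, raw := range items {
		var s string
		if err := json.Unmarshal(raw, &s); err == nil {
			out = append(out, strings.TrimSpace(s))
			continue
		}
		var o personObject
		if err := json.Unmarshal(raw, &o); err != nil {
			return err
		}
		out = append(out, o.String()) // stringify object entries
	}
	*p = out
	return nil
}

// packageJSON keeps contributors, maintainers, and authors separate.
type packageJSON struct {
	Author       people `json:"author"`
	Authors      people `json:"authors"`
	Contributors people `json:"contributors"`
	Maintainers  people `json:"maintainers"`
}

// parsePackageJSON decodes the people fields and folds "author" into Authors.
func parsePackageJSON(data []byte) (packageJSON, error) {
	var p packageJSON
	if err := json.Unmarshal(data, &p); err != nil {
		return p, err
	}
	// Assumption: a singular author goes first, followed by any authors.
	p.Authors = append(p.Author, p.Authors...)
	p.Author = nil
	return p, nil
}

func main() {
	doc := []byte(`{
		"author": {"name": "Harry Potter", "url": "http://youknowwho.com/"},
		"authors": ["John Smith"],
		"contributors": ["Jane Doe <jane@example.com>"]
	}`)
	p, err := parsePackageJSON(doc)
	if err != nil {
		panic(err)
	}
	fmt.Printf("authors=%q contributors=%q maintainers=%q\n",
		p.Authors, p.Contributors, p.Maintainers)
}
```

Using one custom unmarshaler for every people field keeps the handling of strings, objects, and arrays in a single place, while the fields themselves are never merged with one another.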

There will probably be two pull requests involved here:

Add contributors, maintainers, and authors to the JSON data shape
A separate PR to handle the transformations into CycloneDX and other formats

Let me know what you think--happy to provide more feedback whenever you need. Thank you for taking this on!

Feb 01 '24 19:02 tgerla
