syft
package.json authors keyword parsing
What would you like to be added:
In the current parse_package_json.go, the expected structure to parse includes only the author keyword: https://github.com/anchore/syft/blob/main/syft/pkg/cataloger/javascript/parse_package_json.go#L24 . However, plenty of packages use authors to register multiple authors.
Formats seen:
"authors": [
"Harry Potter <[email protected]> (http://youknowwho.com/)",
"John Smith <j.smith@something.com> (http://awebsite.com/)"
]
Or
"authors": [
"Harry Potter",
"John Smith"
]
An example package could be: https://github.com/Qix-/color/blob/master/package.json#L11
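For illustration, here is a minimal Go sketch of how one entry in the "Name <email> (url)" form could be split into its parts. The person type, the personPattern regular expression, and the parsePerson helper are hypothetical names made up for this example; they are not taken from syft's code, and the sample address is a placeholder.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// person is a hypothetical type for this sketch; it is not syft's actual model.
type person struct {
	Name  string
	Email string
	URL   string
}

// personPattern matches "Name <email> (url)", where the email and url parts
// are both optional and surrounding whitespace is ignored.
var personPattern = regexp.MustCompile(`^\s*([^<(]*?)\s*(?:<([^>]*)>)?\s*(?:\(([^)]*)\))?\s*$`)

// parsePerson splits a single authors entry into name, email, and URL.
// If the entry does not fit the pattern, the trimmed string is kept as the name.
func parsePerson(s string) person {
	m := personPattern.FindStringSubmatch(s)
	if m == nil {
		return person{Name: strings.TrimSpace(s)}
	}
	return person{Name: m[1], Email: m[2], URL: m[3]}
}

func main() {
	entries := []string{
		"John Smith <j.smith@something.com> (http://awebsite.com/) ",
		"Harry Potter",
	}
	for _, e := range entries {
		fmt.Printf("%+v\n", parsePerson(e))
	}
}
```

Both formats above go through the same helper: a plain name simply leaves Email and URL empty.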
Why is this needed:
For more accuracy on parsing package.json structures.
Hey @NataliaAn, thank you for the report. We'll put this in the backlog for a fix when we are able. If you're interested in making the changes to support multiple authors, let us know and we can point you in the right direction. Thanks again!
@NataliaAn Shouldn't it be contributors? That's at least what the official docs on package.json specify should be used when multiple people need to be included: https://docs.npmjs.com/cli/v6/configuring-npm/package-json#people-fields-author-contributors
However, the contributors field is also currently not being parsed, so it would be nice to have. Technically speaking, though, authors is not an officially supported field, it seems.
There is also a top-level "maintainers" field, but this one is not being kept in package.json, I believe.
EDIT:
There are many packages that do use authors or even maintainers:
- https://sourcegraph.com/search?q=context:global+file:%5Epackage%5C.json%24+%22authors%22&patternType=standard&sm=1
- https://sourcegraph.com/search?q=context:global+file:%5Epackage%5C.json%24+%22maintainers%22&patternType=standard&sm=1
@tgerla I'm willing to contribute to this, but I would like some feedback from someone on the Anchore team on whether it's worth the effort and what the best approach would be.
My concerns:
- Should we only support parsing contributors, as declared in the npm docs? Or should we also support authors and/or maintainers? Some packages may use even other fields for that, but those seem to be the most common.
- If we would like to parse all of the above, should they be parsed separately into fields with the same name, or maybe into the same contributors field in the output metadata? Then the question would be: if e.g. both maintainers and contributors fields are present, should one have priority over the other for parsing? Or could they be concatenated?
- If author is missing, but e.g. contributors is present, should it be used for the author field in the CycloneDX format?
- Any other cases I haven't thought about?
@jabkoo I guess as long as there are packages that use them, it would be nice to parse all cases that are valid even if they don't follow the latest official recommendation...?
Hey @jabkoo, sorry for the delay responding. I chatted with the team and we think that the best approach would be:
- Don't overload or combine the different fields: treat contributors, maintainers, and authors as separate fields.
- The JSON data shape will need to be modified to support these new fields.
- If author is present, put the contents in the authors JSON field.
- If any of the fields are objects, stringify them.
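To make the points above concrete, here is a rough Go sketch of one way they could fit together. Everything in it is an assumption for illustration: the people and packageJSON types, the stringifyPerson and parsePackageJSON helpers, and the choice to stringify objects as "name <email> (url)" are not syft's actual data shape or API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// people is a hypothetical list of person entries, each normalized to a string.
type people []string

// UnmarshalJSON accepts either a single entry or an array of entries,
// where each entry is a string or an object.
func (p *people) UnmarshalJSON(data []byte) error {
	var raw []json.RawMessage
	if err := json.Unmarshal(data, &raw); err != nil {
		// Not an array: treat the whole value as one entry.
		raw = []json.RawMessage{data}
	}
	out := make(people, 0, len(raw))
	for _, r := range raw {
		s, err := stringifyPerson(r)
		if err != nil {
			return err
		}
		out = append(out, s)
	}
	*p = out
	return nil
}

// stringifyPerson returns strings unchanged and renders objects as
// "name <email> (url)". The rendering format is an assumption of this sketch.
func stringifyPerson(raw json.RawMessage) (string, error) {
	var s string
	if err := json.Unmarshal(raw, &s); err == nil {
		return s, nil
	}
	var obj struct {
		Name  string `json:"name"`
		Email string `json:"email"`
		URL   string `json:"url"`
	}
	if err := json.Unmarshal(raw, &obj); err != nil {
		return "", err
	}
	s = obj.Name
	if obj.Email != "" {
		s += " <" + obj.Email + ">"
	}
	if obj.URL != "" {
		s += " (" + obj.URL + ")"
	}
	return s, nil
}

// packageJSON keeps authors, contributors, and maintainers as separate fields.
type packageJSON struct {
	Name         string `json:"name"`
	Author       people `json:"author,omitempty"`
	Authors      people `json:"authors,omitempty"`
	Contributors people `json:"contributors,omitempty"`
	Maintainers  people `json:"maintainers,omitempty"`
}

// parsePackageJSON decodes the file and folds a present author into authors.
func parsePackageJSON(data []byte) (packageJSON, error) {
	var p packageJSON
	if err := json.Unmarshal(data, &p); err != nil {
		return p, err
	}
	p.Authors = append(p.Author, p.Authors...)
	p.Author = nil
	return p, nil
}

func main() {
	input := `{"name":"demo","author":{"name":"A","email":"a@example.com"},"authors":["B"]}`
	p, err := parsePackageJSON([]byte(input))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", p)
}
```

The custom UnmarshalJSON is there because the same key can hold a string, an object, or an array of either, so a plain []string field would fail to decode on some packages.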
There will probably be two pull requests involved here:
- Add contributors, maintainers, and authors to the JSON data shape
- A separate PR to handle the transformations into CycloneDX and other formats
Let me know what you think--happy to provide more feedback whenever you need. Thank you for taking this on!