fix: ensure that npm dependencies retain their "production" grouping
This resolves an inconsistency in the scanner output for npm packages that appear in the same tree multiple times but in different groups; this happens because the table outputter deduplicates on groups on the assumption that a package can only appear in a single group which is incorrect for the NPM ecosystem.
To address this, I've introduced an internal map type that ensures groups are merged when a package is added, with the twist that if either instance of a package being merged is in no groups then that is the result of the merge because implicitly that means an instance of the package is in the "production" group which takes priority over the other groups.
This is something that should most likely be improved on the future, but right now this fix should be good enough to ship since afaik it doesn't impact any other ecosystem and groups are something of a PoC given we can't resolve that information richly enough across all ecosystems yet.
I think this technically could impact pnpm as well, but none of our fixtures gave a good indicator and the latest lockfile version doesn't have that information anymore so frankly I'm not worrying about it at this point.
Resolves #924
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 65.99%. Comparing base (
e35bd05) to head (c132a55).
Additional details and impacted files
@@ Coverage Diff @@
## main #939 +/- ##
==========================================
+ Coverage 65.94% 65.99% +0.05%
==========================================
Files 159 159
Lines 12758 12778 +20
==========================================
+ Hits 8413 8433 +20
Misses 3885 3885
Partials 460 460
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Looking a bit into this, there's another awkward edge case with optional prod dependencies & non-optional dev dependencies (devOptional) that we might want to consider.
dev, optional, devOptional: If the package is strictly part of the
devDependenciestree, thendevwill be true. If it is strictly part of theoptionalDependenciestree, thenoptionalwill be set. If it is both adevdependency and anoptionaldependency of a non-dev dependency, thendevOptionalwill be set. (Anoptionaldependency of adevdependency will have bothdevandoptionalset.)
https://docs.npmjs.com/cli/v9/configuring-npm/package-lock-json#packages
Our groups parser doesn't seem to get this correct at the moment: https://github.com/google/osv-scanner/blob/97029011c8adaf14f03fa66519836063204bf474/pkg/lockfile/parse-npm-lock.go#L125-L137
So I guess we ought to differentiate between "optional in prod, required in dev" (devOptional) and "not in prod, optional in dev" (dev && optional) somehow, and also apply that to the deduplication logic.
This whole thing kind of comes to some thoughts I've been having on counting vulnerabilities and how or whether we should be deduplicating vulnerabilities in an npm lockfile in the first place. I feel like it'd generally be clearer to have one entry per vulnerability ID, with a complete list of instances of affected packages. Maybe that's a thing to add to the V2 Wishlist.
I don't think I disagree with that but frankly it makes my head spin a bit 😅
Personally I feel like it's probably safest to at least start by just treating groups as part of the uniqueness of a package - I'm pretty sure for most ecosystems that won't change anything because you can only have instance of a package regardless of group, and for ecosystems like NPM it'd mean you would just end up with some more packages in the output.
Then we can work on how to make the output smarter without worrying about the current version of the cli giving false negatives in the meantime
note to self: consider exploring if an empty string could be useful here - i.e. if a package is not optional or dev, then its ""
(though thinking about it, I think that could just lead back to this same solution - I think either way, its probably correct more as it needs to be handled within the extractor rather than at the outputter level right now)