bug(sbom): Duplicate SBOM packages for multi-module pom.xml files
Description
mvn handles each module separately.
Trivy uses the same logic:
https://github.com/aquasecurity/trivy/blob/57e24aa85382f749df7f673e241caaf3fcbb45cb/pkg/dependency/parser/java/pom/parse.go#L142-L143
But the SPDX format doesn't allow duplicate SPDXIDs - https://spdx.github.io/spdx-spec/v2.3/package-information/#72-package-spdx-identifier-field
The same applies to CycloneDX - https://cyclonedx.org/docs/1.6/json/#components
Solutions
- We will add a workspace relationship for maven modules (see #7802). After these changes Trivy will use rootPkg -> workspace -> directDeps -> IndirectDeps logic. This logic is different from mvn logic. So we may want to remove duplicates in the parser.
- We will remove duplicates when converting Report into BOM
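For reference, the second option (dropping duplicates during conversion) could look roughly like the sketch below. The `Package` type and `dedupeByID` function are hypothetical stand-ins for this illustration, not Trivy's real types or API.

```go
package main

import "fmt"

// Package is a hypothetical, simplified stand-in for a package entry in a report.
type Package struct {
	ID      string
	Name    string
	Version string
}

// dedupeByID keeps the first package seen for each ID and preserves input order.
func dedupeByID(pkgs []Package) []Package {
	seen := make(map[string]struct{}, len(pkgs))
	out := make([]Package, 0, len(pkgs))
	for _, p := range pkgs {
		if _, ok := seen[p.ID]; ok {
			continue // same ID already emitted
		}
		seen[p.ID] = struct{}{}
		out = append(out, p)
	}
	return out
}

func main() {
	pkgs := []Package{
		{ID: "org.example:example-api:1.1.1", Name: "org.example:example-api", Version: "1.1.1"},
		{ID: "org.example:example-api:1.1.1", Name: "org.example:example-api", Version: "1.1.1"},
	}
	for _, p := range dedupeByID(pkgs) {
		fmt.Println(p.ID)
	}
}
```

Note that this treats two packages with the same ID as the same component, which only holds if their child dependencies are also identical.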
Example
Test project:
➜ cat pom.xml
<groupId>com.example</groupId>
<artifactId>root</artifactId>
<version>1.0.0</version>
<modules>
  <module>module1</module>
  <module>module2</module>
</modules>
➜ cat module1/pom.xml
<groupId>com.example</groupId>
<artifactId>module1</artifactId>
<version>1.0.0</version>
<dependencies>
  <dependency>
    <groupId>org.example</groupId>
    <artifactId>example-api</artifactId>
    <version>1.1.1</version>
  </dependency>
</dependencies>
➜ cat module2/pom.xml
<groupId>com.example</groupId>
<artifactId>module2</artifactId>
<version>2.0.0</version>
<dependencies>
  <dependency>
    <groupId>org.example</groupId>
    <artifactId>example-api</artifactId>
    <version>1.1.1</version>
  </dependency>
</dependencies>
mvn output:
➜ mvn dependency:tree
[INFO] ------------------------< com.example:module1 >-------------------------
[INFO] Building module1 1.0.0 [1/3]
[INFO] from module1/pom.xml
[INFO] --------------------------------[ jar ]---------------------------------
[WARNING] The POM for org.example:example-api:jar:1.1.1 is missing, no dependency information available
[INFO]
[INFO] --- dependency:3.7.0:tree (default-cli) @ module1 ---
[INFO] com.example:module1:jar:1.0.0
[INFO] \- org.example:example-api:jar:1.1.1:compile
[INFO]
[INFO] ------------------------< com.example:module2 >-------------------------
[INFO] Building module2 2.0.0 [2/3]
[INFO] from module2/pom.xml
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- dependency:3.7.0:tree (default-cli) @ module2 ---
[INFO] com.example:module2:jar:2.0.0
[INFO] \- org.example:example-api:jar:1.1.1:compile
[INFO]
[INFO] --------------------------< com.example:root >--------------------------
[INFO] Building root 1.0.0 [3/3]
[INFO] from pom.xml
[INFO] --------------------------------[ pom ]---------------------------------
[INFO]
[INFO] --- dependency:3.7.0:tree (default-cli) @ root ---
[INFO] com.example:root:pom:1.0.0
[INFO] ------------------------------------------------------------------------
trivy outputs:
➜ trivy -q fs ./pom.xml -f json --list-all-pkgs | grep ID
...
"ID": "org.example:example-api:1.1.1",
"UID": "e574f6e703187373"
"ID": "org.example:example-api:1.1.1",
"UID": "e574f6e703187373"
➜ trivy -q fs ./pom.xml -f spdx-json | grep SPDXID -B 1
...
"name": "org.example:example-api",
"SPDXID": "SPDXRef-Package-a9813b377fc4bc80",
--
"name": "org.example:example-api",
"SPDXID": "SPDXRef-Package-a9813b377fc4bc80",
...
Discussed in https://github.com/aquasecurity/trivy/discussions/7795
I am not sure if Trivy reports should contain duplicates. That's why I voted for the 1st solution.
@knqyf263 wdyt? You added this logic, maybe I missed something.
Even if the component has the same name and version, the dependency of the component could be different. https://github.com/aquasecurity/trivy/discussions/6694#discussioncomment-9473852
graph LR;
pomRoot(com.example:root v1.0.0)
mod1(com.example:module1 v1.0.0)
mod2(com.example:module2 v2.0.0)
pomC(org.example:example-api v1.1.1)
pomE(POM E v2.0.0)
pomRoot-->mod1
pomRoot-->mod2
mod1-->pomC
pomC-->pomE
pomC'(org.example:example-api v1.1.1)
pomD'(POM D v1.0.0)
pomE'(POM E v2.1.0)
mod2-->pomC'
mod2-->pomD'
pomC'-->pomE'
pomD'-->pomE'
org.example:example-api:v1.1.1 looks identical, but the child dependency can be different for various reasons (e.g. dependencyManagement). Therefore, I'd say they are really really similar, but different components.
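To make the point concrete, here is a minimal sketch of an identifier that also covers the child dependencies. The `Component` type and `fingerprint` function are assumptions made for this illustration (using FNV-1a from the standard library); this is not how Trivy computes its UID.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Component is a hypothetical, simplified component with its direct children.
type Component struct {
	Name      string
	Version   string
	DependsOn []string // child IDs, e.g. "POM E:2.0.0"
}

// fingerprint hashes name@version together with the sorted child IDs,
// so two components differ as soon as their children differ.
func fingerprint(c Component) uint64 {
	deps := append([]string(nil), c.DependsOn...) // copy, do not reorder the input
	sort.Strings(deps)
	h := fnv.New64a()
	h.Write([]byte(c.Name + "@" + c.Version))
	for _, d := range deps {
		h.Write([]byte{0}) // separator between entries
		h.Write([]byte(d))
	}
	return h.Sum64()
}

func main() {
	inModule1 := Component{Name: "org.example:example-api", Version: "1.1.1", DependsOn: []string{"POM E:2.0.0"}}
	inModule2 := Component{Name: "org.example:example-api", Version: "1.1.1", DependsOn: []string{"POM E:2.1.0"}}
	fmt.Println(fingerprint(inModule1) == fingerprint(inModule2))
}
```

With the graph above, the two example-api nodes share name and version but resolve POM E to different versions, so their fingerprints are expected to differ.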
hmm... you're right. I missed that. I'll take a look and update our logic for creating SPDXID
I updated the logic for SPDXIDs (#7837). It removes the duplicate SPDXIDs (first output: before the change; second output: the patched ./trivy build):
➜ trivy -q fs ./pom.xml -f spdx-json | grep '"org.example:example-api"' -A 1
"name": "org.example:example-api",
"SPDXID": "SPDXRef-Package-a9813b377fc4bc80",
--
"name": "org.example:example-api",
"SPDXID": "SPDXRef-Package-a9813b377fc4bc80",
➜ ./trivy -q fs ./pom.xml -f spdx-json | grep '"org.example:example-api"' -A 1
"name": "org.example:example-api",
"SPDXID": "SPDXRef-Package-a5527f408fa64d61",
--
"name": "org.example:example-api",
"SPDXID": "SPDXRef-Package-b1e7f5814081cb0e",
But I found another problem: we can't correctly choose the child component.
We have 2 components with the same pkgID, so when we parse dependsOn we take the first component found for all of them:
{
...
{
"name": "com.example:root",
"SPDXID": "SPDXRef-Package-beb5534e91f2fc01",
"versionInfo": "1.0.0",
...
},
{
"name": "org.example:example-api",
"SPDXID": "SPDXRef-Package-36a9eeebfbc737b0",
"versionInfo": "1.1.1",
...
},
{
"name": "org.example:example-api",
"SPDXID": "SPDXRef-Package-7d0aa2ced54119b2",
"versionInfo": "1.1.1",
...
},
...
],
"relationships": [
...
{
"spdxElementId": "SPDXRef-Package-beb5534e91f2fc01",
"relatedSpdxElement": "SPDXRef-Package-7d0aa2ced54119b2",
"relationshipType": "DEPENDS_ON"
},
{
"spdxElementId": "SPDXRef-Package-beb5534e91f2fc01",
"relatedSpdxElement": "SPDXRef-Package-7d0aa2ced54119b2",
"relationshipType": "DEPENDS_ON"
},
...
]
}
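The effect can be reproduced with a small sketch. The types and the `indexByID`/`childUIDs` helpers below are hypothetical simplifications written for this comment, not the actual Trivy code: resolving children through an index keyed by pkgID always returns the first matching component.

```go
package main

import "fmt"

// Package is a hypothetical, simplified package record; it is not Trivy's real type.
type Package struct {
	UID       string   // unique per occurrence
	ID        string   // name:version, shared by both occurrences
	DependsOn []string // child package IDs
}

// indexByID keeps only the first package seen for each ID.
func indexByID(pkgs []Package) map[string]Package {
	idx := make(map[string]Package, len(pkgs))
	for _, p := range pkgs {
		if _, ok := idx[p.ID]; !ok {
			idx[p.ID] = p
		}
	}
	return idx
}

// childUIDs resolves a parent's DependsOn entries through the ID index.
func childUIDs(parent Package, idx map[string]Package) []string {
	var uids []string
	for _, id := range parent.DependsOn {
		if child, ok := idx[id]; ok {
			uids = append(uids, child.UID)
		}
	}
	return uids
}

// examplePackages mirrors the test project: two modules, each with its own example-api.
func examplePackages() []Package {
	const api = "org.example:example-api:1.1.1"
	return []Package{
		{UID: "module1", ID: "com.example:module1:1.0.0", DependsOn: []string{api}},
		{UID: "module2", ID: "com.example:module2:2.0.0", DependsOn: []string{api}},
		{UID: "api-in-module1", ID: api},
		{UID: "api-in-module2", ID: api},
	}
}

func main() {
	pkgs := examplePackages()
	idx := indexByID(pkgs)
	// Both modules resolve to the same occurrence of example-api.
	fmt.Println(childUIDs(pkgs[0], idx), childUIDs(pkgs[1], idx))
}
```

Both modules end up pointing at the module1 copy of example-api, which matches the repeated relatedSpdxElement in the relationships above.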
I thought a bit and found some ideas:
1. Use UID for child dependencies (dependsOn slice). But in this case we need to build the UID in each parser... We also need to add a new field to distinguish identical pkgIDs from different modules (e.g. add the module name in this map).
2. Use a separate Result for each maven module. This logic is similar to mvn logic. In this case the result will not contain duplicates. But we need to wrap our pom.xml parser (to return []ftypes.Package and []ftypes.Dependency for each module).
3. Add info about the root module name into Package. We will use this field to build the UID, find module/workspace dependencies, etc.
   3.1. We can add a module field to the root of Package (next to Dev, Arch, etc.).
   3.2. We will add the workspace relationship (see #7802). We can expand the relationship field. This field will include the relationship + related/root element, e.g.:
   - RelationshipRoot, ""
   - RelationshipWorkspace, "root/pom.xml" // this module from root pom.xml
   - RelationshipWorkspace, "module1" (or RelationshipWorkspace, "module1<separator>root/pom.xml") // nested modules (root->module1->module2). This case is for module2.
   - RelationshipDirect, "module1" // dependency got from module1
   - RelationshipIndirect, "" // dependency from root pom.xml
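A rough sketch of how idea 3.1 could be used for lookups, assuming a hypothetical `Module` field and a hypothetical `"::"` separator (neither exists in Trivy today): the index key combines module and pkgID, so each module resolves its own copy of a dependency.

```go
package main

import "fmt"

// Package is a hypothetical, simplified package record with a Module field (idea 3.1).
type Package struct {
	ID        string
	Module    string   // module that this occurrence belongs to (assumed field)
	DependsOn []string // child package IDs
}

// key builds a lookup key from module and package ID; "::" is an assumed separator.
func key(module, id string) string {
	return module + "::" + id
}

// indexByModuleAndID indexes every occurrence under its module-qualified key.
func indexByModuleAndID(pkgs []Package) map[string]Package {
	idx := make(map[string]Package, len(pkgs))
	for _, p := range pkgs {
		idx[key(p.Module, p.ID)] = p
	}
	return idx
}

// children resolves DependsOn within the parent's own module.
func children(parent Package, idx map[string]Package) []Package {
	var out []Package
	for _, id := range parent.DependsOn {
		if child, ok := idx[key(parent.Module, id)]; ok {
			out = append(out, child)
		}
	}
	return out
}

// examplePackages mirrors the test project with one example-api occurrence per module.
func examplePackages() []Package {
	const api = "org.example:example-api:1.1.1"
	return []Package{
		{ID: "com.example:module1:1.0.0", Module: "module1", DependsOn: []string{api}},
		{ID: "com.example:module2:2.0.0", Module: "module2", DependsOn: []string{api}},
		{ID: api, Module: "module1"},
		{ID: api, Module: "module2"},
	}
}

func main() {
	pkgs := examplePackages()
	idx := indexByModuleAndID(pkgs)
	for _, parent := range pkgs[:2] {
		for _, c := range children(parent, idx) {
			fmt.Println(parent.ID, "->", key(c.Module, c.ID))
		}
	}
}
```

The same module-qualified key could also feed into the UID, but nested modules would need the full module path rather than a single name.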
@knqyf263 Can you take a look? Perhaps you will be able to see another way.
The current package ID (name@version) was implemented based on the assumption that the identical packages don't exist in the same application. If that's not the case, we need to use another ID. Actually, we already faced that when implementing Julia and used UUID.
https://github.com/aquasecurity/trivy/blob/983ac15f22d36a95bca57a18cad21c5efdb27caf/pkg/dependency/parser/julia/manifest/parse.go#L77-L82
So, can we use UUID or something like that only in Maven? We don't have to re-implement all parsers.
> We don't have to re-implement all parsers.
We might need to add similar logic for npm and cargo (we talked about adding a workspace field to the relationship), but I'm not sure if there could be duplicates for them.
But in general you are right. We can use UUID only for specific parsers.
> So, can we use UUID or something like that only in Maven?
hm... I think it is possible. I will take a look.
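For illustration only, a UUID-suffixed ID for Maven packages might be built as in the sketch below. The `uniquePackageID` helper and the `"::"` separator are assumptions for this example; the actual ID format is still to be decided, and the UUID is a random version 4 generated with the standard library.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUID returns a random (version 4) UUID string built with crypto/rand.
func newUUID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// uniquePackageID appends a UUID to name:version so that identical packages
// from different modules get different IDs. The "::" separator is assumed.
func uniquePackageID(name, version string) (string, error) {
	u, err := newUUID()
	if err != nil {
		return "", err
	}
	return name + ":" + version + "::" + u, nil
}

func main() {
	for i := 0; i < 2; i++ {
		id, err := uniquePackageID("org.example:example-api", "1.1.1")
		if err != nil {
			panic(err)
		}
		fmt.Println(id)
	}
}
```

Random UUIDs make the output differ between runs, so a deterministic value (for example, one derived from the module path) may be preferable if reproducible reports matter.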
A user found a similar case for dpkg - #8273
But this is a strange case: there are 2 status dirs (libssl1 and libssl1.1) with the same name/version/etc. (see https://github.com/aquasecurity/trivy/discussions/8273#discussioncomment-11925467).
This looks like an error in the image construction, but on the other hand there are no restrictions against such cases, and we should solve this problem in Trivy. @knqyf263 wdyt?
I also encountered this problem with a Maven project using sub-projects (or whatever it is called). Probably as multiple sub-projects had the same dependency, they were referenced multiple times.
We also encountered this problem as the OP. It would be really helpful to have this fixed!
Not sure if I missed something. Will this fix also solve CycloneDX-dependsOn? 🙂 We just encountered similar problems when we tried to import a multi-module Maven project into Dependency-Track.
{"status":400,"title":"The uploaded BOM is invalid","detail":"Schema validation failed","errors":["$.dependencies[19].dependsOn: must have only unique items in the array" ...
Hello @dpschier, can you share an example for your case? dependencies shouldn't contain duplicates.
I've uploaded a tiny maven project, including the generated sbom. If you search for 'dependsOn', you can see the duplicated entries.
Regards
Hello @dpschier,
Thanks for your example.
I rechecked, and my answer is yes: fixing this issue will solve your problem.
Two identical packages will use different pkg.ID values => different pkgIdentifier, so Trivy will treat them as separate packages.