
bug(sbom): Duplicate SBOM packages for multi-module pom.xml files

Open DmitriyLewen opened this issue 1 year ago • 13 comments

Description

mvn handles modules separately. Trivy uses the same logic: https://github.com/aquasecurity/trivy/blob/57e24aa85382f749df7f673e241caaf3fcbb45cb/pkg/dependency/parser/java/pom/parse.go#L142-L143

But SPDX format doesn't allow duplicate SPDXIDs - https://spdx.github.io/spdx-spec/v2.3/package-information/#72-package-spdx-identifier-field

Same for CycloneDX - https://cyclonedx.org/docs/1.6/json/#components

Solutions

  1. We will add a workspace relationship for Maven modules (see #7802). After these changes Trivy will use rootPkg -> workspace -> directDeps -> IndirectDeps logic. This differs from the mvn logic, so we may want to remove duplicates in the parser.
  2. We will remove duplicates when converting the Report into a BOM.
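
As a rough illustration of the second option, here is a minimal Go sketch of dropping packages with a repeated ID before they are written to a BOM. The `Package` type and the `dedupByID` helper are hypothetical and simplified for this example; they are not Trivy's actual types or functions.

```go
package main

import "fmt"

// Package is a hypothetical, simplified stand-in for a package entry in a report.
type Package struct {
	ID        string
	DependsOn []string
}

// dedupByID keeps the first package seen for each ID and preserves input order.
func dedupByID(pkgs []Package) []Package {
	seen := make(map[string]struct{}, len(pkgs))
	out := make([]Package, 0, len(pkgs))
	for _, p := range pkgs {
		if _, ok := seen[p.ID]; ok {
			continue // a package with this ID was already emitted
		}
		seen[p.ID] = struct{}{}
		out = append(out, p)
	}
	return out
}

func main() {
	pkgs := []Package{
		{ID: "com.example:module1:1.0.0", DependsOn: []string{"org.example:example-api:1.1.1"}},
		{ID: "org.example:example-api:1.1.1"},
		{ID: "com.example:module2:2.0.0", DependsOn: []string{"org.example:example-api:1.1.1"}},
		{ID: "org.example:example-api:1.1.1"},
	}
	for _, p := range dedupByID(pkgs) {
		fmt.Println(p.ID)
	}
}
```

Note that this approach keeps only the first copy, so any difference between the dropped copies (for example, their own child dependencies) would be lost.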

Example

Test project:

➜  cat pom.xml 
    <groupId>com.example</groupId>
    <artifactId>root</artifactId>
    <version>1.0.0</version>

    <modules>
        <module>module1</module>
        <module>module2</module>
    </modules>

➜  cat module1/pom.xml 
    <groupId>com.example</groupId>
    <artifactId>module1</artifactId>
    <version>1.0.0</version>

    <dependencies>
        <dependency>
            <groupId>org.example</groupId>
            <artifactId>example-api</artifactId>
            <version>1.1.1</version>
        </dependency>
    </dependencies>

➜  cat module2/pom.xml
    <groupId>com.example</groupId>
    <artifactId>module2</artifactId>
    <version>2.0.0</version>

    <dependencies>
        <dependency>
            <groupId>org.example</groupId>
            <artifactId>example-api</artifactId>
            <version>1.1.1</version>
        </dependency>
    </dependencies>

mvn output:

➜  mvn dependency:tree
[INFO] ------------------------< com.example:module1 >-------------------------
[INFO] Building module1 1.0.0                                             [1/3]
[INFO]   from module1/pom.xml
[INFO] --------------------------------[ jar ]---------------------------------
[WARNING] The POM for org.example:example-api:jar:1.1.1 is missing, no dependency information available
[INFO] 
[INFO] --- dependency:3.7.0:tree (default-cli) @ module1 ---
[INFO] com.example:module1:jar:1.0.0
[INFO] \- org.example:example-api:jar:1.1.1:compile
[INFO] 
[INFO] ------------------------< com.example:module2 >-------------------------
[INFO] Building module2 2.0.0                                             [2/3]
[INFO]   from module2/pom.xml
[INFO] --------------------------------[ jar ]---------------------------------
[INFO] 
[INFO] --- dependency:3.7.0:tree (default-cli) @ module2 ---
[INFO] com.example:module2:jar:2.0.0
[INFO] \- org.example:example-api:jar:1.1.1:compile
[INFO] 
[INFO] --------------------------< com.example:root >--------------------------
[INFO] Building root 1.0.0                                                [3/3]
[INFO]   from pom.xml
[INFO] --------------------------------[ pom ]---------------------------------
[INFO] 
[INFO] --- dependency:3.7.0:tree (default-cli) @ root ---
[INFO] com.example:root:pom:1.0.0
[INFO] ------------------------------------------------------------------------

trivy outputs:

➜  trivy -q fs ./pom.xml -f json --list-all-pkgs | grep ID
...
          "ID": "org.example:example-api:1.1.1",
            "UID": "e574f6e703187373"
          "ID": "org.example:example-api:1.1.1",
            "UID": "e574f6e703187373"

➜  trivy -q fs ./pom.xml -f spdx-json | grep SPDXID -B 1
...
      "name": "org.example:example-api",
      "SPDXID": "SPDXRef-Package-a9813b377fc4bc80",
--
      "name": "org.example:example-api",
      "SPDXID": "SPDXRef-Package-a9813b377fc4bc80",
...

Discussed in https://github.com/aquasecurity/trivy/discussions/7795

DmitriyLewen avatar Oct 30 '24 10:10 DmitriyLewen

I am not sure if Trivy reports should contain duplicates. That's why I voted for the 1st solution.

@knqyf263 wdyt? You added this logic, maybe I missed something.

DmitriyLewen avatar Oct 30 '24 10:10 DmitriyLewen

Even if the component has the same name and version, the dependency of the component could be different. https://github.com/aquasecurity/trivy/discussions/6694#discussioncomment-9473852

graph LR;
  pomRoot(com.example:root v1.0.0)
  mod1(com.example:module1 v1.0.0)
  mod2(com.example:module2 v2.0.0)
  pomC(org.example:example-api v1.1.1)
  pomE(POM E v2.0.0)

  pomRoot-->mod1
  pomRoot-->mod2
  mod1-->pomC
  pomC-->pomE

  pomC'(org.example:example-api v1.1.1)
  pomD'(POM D v1.0.0)
  pomE'(POM E v2.1.0)

  mod2-->pomC'
  mod2-->pomD'
  pomC'-->pomE'
  pomD'-->pomE'

org.example:example-api:v1.1.1 looks identical, but the child dependency can be different for various reasons (e.g. dependencyManagement). Therefore, I'd say they are really really similar, but different components.

knqyf263 avatar Oct 30 '24 10:10 knqyf263

hmm... you're right. I missed that. I'll take a look and update our logic for creating SPDXID

DmitriyLewen avatar Oct 30 '24 11:10 DmitriyLewen

I updated the logic for SPDXIDs (#7837). It removes duplicate SPDXIDs:

➜  trivy -q fs ./pom.xml -f spdx-json | grep '"org.example:example-api"' -A 1 
      "name": "org.example:example-api",
      "SPDXID": "SPDXRef-Package-a9813b377fc4bc80",
--
      "name": "org.example:example-api",
      "SPDXID": "SPDXRef-Package-a9813b377fc4bc80",
➜  ./trivy -q fs ./pom.xml -f spdx-json | grep '"org.example:example-api"' -A 1
      "name": "org.example:example-api",
      "SPDXID": "SPDXRef-Package-a5527f408fa64d61",
--
      "name": "org.example:example-api",
      "SPDXID": "SPDXRef-Package-b1e7f5814081cb0e",

But I found another problem: we can't correctly choose the child component.

We have 2 components with the same pkgID, so when we parse dependsOn, we take the first component found for all of them:

{
    ...
    {
      "name": "com.example:root",
      "SPDXID": "SPDXRef-Package-beb5534e91f2fc01",
      "versionInfo": "1.0.0",
     ...
    },
    {
      "name": "org.example:example-api",
      "SPDXID": "SPDXRef-Package-36a9eeebfbc737b0",
      "versionInfo": "1.1.1",
      ...
    },
    {
      "name": "org.example:example-api",
      "SPDXID": "SPDXRef-Package-7d0aa2ced54119b2",
      "versionInfo": "1.1.1",
      ...
    },
    ...
  ],
  "relationships": [
    ...
    {
      "spdxElementId": "SPDXRef-Package-beb5534e91f2fc01",
      "relatedSpdxElement": "SPDXRef-Package-7d0aa2ced54119b2",
      "relationshipType": "DEPENDS_ON"
    },
    {
      "spdxElementId": "SPDXRef-Package-beb5534e91f2fc01",
      "relatedSpdxElement": "SPDXRef-Package-7d0aa2ced54119b2",
      "relationshipType": "DEPENDS_ON"
    },
    ...
  ]
}         
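
The "first found" behaviour can be reproduced with a small Go sketch. The `Component` type and `resolveByPkgID` function below are hypothetical and only model the lookup described above; they are not taken from Trivy's code.

```go
package main

import "fmt"

// Component is a hypothetical, simplified BOM component.
type Component struct {
	UID       string   // unique per component instance
	PkgID     string   // name:version, shared by look-alike components
	DependsOn []string // children referenced by PkgID only
}

// resolveByPkgID returns the UID of the first component whose PkgID matches.
// With two components sharing a PkgID, the second one can never be selected.
func resolveByPkgID(components []Component, pkgID string) (string, bool) {
	for _, c := range components {
		if c.PkgID == pkgID {
			return c.UID, true
		}
	}
	return "", false
}

func main() {
	api := "org.example:example-api:1.1.1"
	components := []Component{
		{UID: "module1", PkgID: "com.example:module1:1.0.0", DependsOn: []string{api}},
		{UID: "module2", PkgID: "com.example:module2:2.0.0", DependsOn: []string{api}},
		{UID: "api-from-module1", PkgID: api},
		{UID: "api-from-module2", PkgID: api},
	}
	for _, c := range components {
		for _, dep := range c.DependsOn {
			if uid, ok := resolveByPkgID(components, dep); ok {
				fmt.Printf("%s -> %s\n", c.UID, uid)
			}
		}
	}
	// Prints:
	// module1 -> api-from-module1
	// module2 -> api-from-module1
}
```

Both modules end up pointing at the same child, even though each child has its own unique identifier.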

I thought a bit and found some ideas:

  1. We will use UID for child dependencies (dependsOn slice). But for this case we need to build the UID in each parser... Also we need to add a new field to distinguish the same pkgIDs from different modules (e.g. in this map add the module name).
  2. Use a separate Result for each maven module. This logic is similar to mvn logic. In this case the result will not contain duplicates. But for this case we need to wrap our pom.xml parser (to return []ftypes.Package and []ftypes.Dependency for each module).
  3. Add info about the root module name into Package. We will use this field to build the UID, find module/workspace dependencies, etc.
    3.1. We can add a module field to the root of Package (next to Dev, Arch, etc.).
    3.2. We will add the workspace relationship (see #7802). We can expand the relationship field so that it includes the relationship + the related/root element, e.g.:
    • RelationshipRoot, ""
    • RelationshipWorkspace, "root/pom.xml" // this module from root pom.xml
    • RelationshipWorkspace, "module1" (or RelationshipWorkspace, "module1<separator>root/pom.xml") // nested modules (root->module1->module2). This case is for module2.
    • RelationshipDirect, "module1" // dependency got from module1
    • RelationshipIndirect, "" // dependency from root pom.xml
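
To make the third idea more concrete, here is a minimal Go sketch of a UID that also takes the module name into account. The `Module` field, the `moduleAwareUID` function, and the use of FNV-1a are all assumptions made for illustration; Trivy's real UID calculation may look quite different.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Package is a hypothetical, simplified package entry.
type Package struct {
	ID     string // e.g. "org.example:example-api:1.1.1"
	Module string // hypothetical field: the module that pulled this package in
}

// moduleAwareUID hashes the module name together with the package ID, so the
// same name/version coming from different modules gets different UIDs.
func moduleAwareUID(p Package) string {
	h := fnv.New64a()
	h.Write([]byte(p.Module))
	h.Write([]byte{0}) // separator so ("ab", "c") and ("a", "bc") hash differently
	h.Write([]byte(p.ID))
	return fmt.Sprintf("%016x", h.Sum64())
}

func main() {
	id := "org.example:example-api:1.1.1"
	fmt.Println(moduleAwareUID(Package{ID: id, Module: "module1"}))
	fmt.Println(moduleAwareUID(Package{ID: id, Module: "module2"}))
}
```

With such a UID, dependsOn entries could reference the exact component instance instead of a shared name/version string.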

@knqyf263 Can you take a look? Perhaps you will be able to see another way.

DmitriyLewen avatar Oct 31 '24 08:10 DmitriyLewen

The current package ID (name@version) was implemented based on the assumption that identical packages don't exist in the same application. If that's not the case, we need to use another ID. Actually, we already faced that when implementing Julia and used UUID. https://github.com/aquasecurity/trivy/blob/983ac15f22d36a95bca57a18cad21c5efdb27caf/pkg/dependency/parser/julia/manifest/parse.go#L77-L82

So, can we use UUID or something like that only in Maven? We don't have to re-implement all parsers.

knqyf263 avatar Nov 05 '24 05:11 knqyf263

> We don't have to re-implement all parsers.

We might need to add similar logic for npm and cargo (we talked about adding a workspace field to the relationship), but I'm not sure if there could be duplicates for them.

But in general you are right. We can use UUID only for specific parsers.

> So, can we use UUID or something like that only in Maven?

hm... I think it is possible. I will take a look.

DmitriyLewen avatar Nov 05 '24 05:11 DmitriyLewen

A user found a similar case for dpkg - #8273

But this is a strange case: there are 2 status dirs (libssl1 and libssl1.1) with the same name/version/etc. (see https://github.com/aquasecurity/trivy/discussions/8273#discussioncomment-11925467).

This looks like an error in the image construction, but on the other hand there are no restrictions for such cases, and we should solve this problem in Trivy. @knqyf263 wdyt?

DmitriyLewen avatar Jan 23 '25 06:01 DmitriyLewen

I also encountered this problem with a Maven project using sub-projects (or whatever it is called). Probably as multiple sub-projects had the same dependency, they were referenced multiple times.

StephenKing avatar Feb 05 '25 10:02 StephenKing

We also encountered this problem as the OP. It would be really helpful to have this fixed!

RyuunosukeDS3 avatar Jun 06 '25 12:06 RyuunosukeDS3

Not sure if I missed something. Will this fix also solve CycloneDX-dependsOn? 🙂 We just encountered similar problems when we tried to import a multi-module Maven project into dependencyTrack.

{"status":400,"title":"The uploaded BOM is invalid","detail":"Schema validation failed","errors":["$.dependencies[19].dependsOn: must have only unique items in the array" ...
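
For anyone who wants to check a generated BOM before uploading it, here is a small Go sketch that lists repeated `dependsOn` entries per `ref`. It only assumes the CycloneDX JSON `dependencies` array with its `ref` and `dependsOn` fields; the `duplicateDependsOn` function and the sample document are made up for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bom models only the part of a CycloneDX JSON document needed for this check.
type bom struct {
	Dependencies []struct {
		Ref       string   `json:"ref"`
		DependsOn []string `json:"dependsOn"`
	} `json:"dependencies"`
}

// duplicateDependsOn returns, for each ref, the dependsOn entries that occur
// more than once. Each repeated entry is reported a single time.
func duplicateDependsOn(data []byte) (map[string][]string, error) {
	var b bom
	if err := json.Unmarshal(data, &b); err != nil {
		return nil, err
	}
	dups := make(map[string][]string)
	for _, d := range b.Dependencies {
		count := make(map[string]int)
		for _, ref := range d.DependsOn {
			count[ref]++
			if count[ref] == 2 { // record on the first repetition only
				dups[d.Ref] = append(dups[d.Ref], ref)
			}
		}
	}
	return dups, nil
}

func main() {
	// Made-up sample document with one repeated dependsOn entry.
	doc := []byte(`{"dependencies":[{"ref":"module","dependsOn":["api","api","other"]}]}`)
	dups, err := duplicateDependsOn(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(dups)
}
```

This is only a diagnostic aid; it does not change how the BOM is generated.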

dpschier avatar Sep 23 '25 14:09 dpschier

Hello @dpschier. Can you share an example of your case? dependencies shouldn't contain duplicates.

DmitriyLewen avatar Sep 24 '25 05:09 DmitriyLewen

I've uploaded a tiny maven project, including the generated sbom. If you search for 'dependsOn', you can see the duplicated entries.

trivy-multimodule.tar.gz

Regards

dpschier avatar Sep 24 '25 09:09 dpschier

Hello @dpschier,
Thanks for your example.

I rechecked, and my answer is yes: fixing this issue will solve your problem.
Two identical packages will use different pkg.ID values => different pkgIdentifier, so Trivy will treat them as separate packages.

DmitriyLewen avatar Sep 25 '25 06:09 DmitriyLewen