refactor: use UUID/hash for Packages IDs from `pom.xml` files.

Open DmitriyLewen opened this issue 1 year ago • 3 comments

Description

There are cases where a report contains packages with the same GAV (GroupID, ArtifactID, Version) that are nevertheless different packages (see https://github.com/aquasecurity/trivy/issues/7824#issuecomment-2446542674).

To avoid confusion and build the dependency graph correctly, we need to use a UUID for each package from pom.xml files.

This solution also fixes the problem with relationships in SBOM formats for this case (see https://github.com/aquasecurity/trivy/issues/7824#issuecomment-2449312688).
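
To illustrate why a GAV alone is not a safe graph key, here is a minimal sketch. The `pkg` type and `indexByID` helper are hypothetical and are not Trivy's actual types; the package names are placeholders. It only shows that two entries sharing an ID collapse into one when indexed, while distinct IDs keep both.

```go
package main

import "fmt"

// pkg is a simplified, hypothetical stand-in for a package entry in a report.
type pkg struct {
	ID        string   // identifier used as the graph key
	GAV       string   // groupId:artifactId:version (shortened here)
	DependsOn []string // identifiers of child packages
}

// indexByID builds a lookup table keyed by ID.
// Entries that share an ID overwrite each other.
func indexByID(pkgs []pkg) map[string]pkg {
	idx := make(map[string]pkg, len(pkgs))
	for _, p := range pkgs {
		idx[p.ID] = p
	}
	return idx
}

func main() {
	// Same GAV, different children: keyed by GAV, one entry is lost.
	byGAV := []pkg{
		{ID: "example-dependency:1.2.5", GAV: "example-dependency:1.2.5", DependsOn: []string{"example-api:1.7.30"}},
		{ID: "example-dependency:1.2.5", GAV: "example-dependency:1.2.5", DependsOn: []string{"example-api:2.0.0"}},
	}
	fmt.Println(len(indexByID(byGAV))) // 1

	// Same GAV, but each package has its own ID: both entries survive.
	byUniqueID := []pkg{
		{ID: "id-1", GAV: "example-dependency:1.2.5", DependsOn: []string{"example-api:1.7.30"}},
		{ID: "id-2", GAV: "example-dependency:1.2.5", DependsOn: []string{"example-api:2.0.0"}},
	}
	fmt.Println(len(indexByID(byUniqueID))) // 2
}
```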

PR blocker - #7889

Related issues

  • Close #7824

Related PR

  • [x] #7889

Checklist

  • [x] I've read the guidelines for contributing to this repository.
  • [x] I've followed the conventions in the PR title.
  • [x] I've added tests that prove my fix is effective or that my feature works.
  • [ ] I've updated the documentation with the relevant information (if needed).
  • [ ] I've added usage information (if the PR introduces new options)
  • [ ] I've included a "before" and "after" example to the description (if the PR is a user interface change).

DmitriyLewen avatar Nov 06 '24 11:11 DmitriyLewen

This PR is stale because it has been labeled with inactivity.

github-actions[bot] avatar Feb 24 '25 00:02 github-actions[bot]

I haven't fully traced through all the code yet, so I may be missing something, but I have a question about the hash calculation approach.

Currently, the hash is calculated using multiple resolved fields:

art.ID = art.Hash(deps, depManagement, props, pom.content.Modules.Module)

My understanding is that duplicate GAVs within a single pom.xml basically don't occur in practice (Maven Enforcer can ban them: https://maven.apache.org/enforcer/enforcer-rules/banDuplicatePomDependencyVersions.html). Is this correct?

If so, would it be simpler to use just hash(GAV, origin pom.xml path)? Then we would not need a UUID at all.

The only case where we need to distinguish the same GAV is across different modules in a multi-module project. In that case, the origin pom.xml path should be sufficient:

module1/pom.xml:
  dependencyManagement:
    example-api: 1.7.30
  dependencies:
    example-dependency: 1.2.5
      └── example-api (resolved to 1.7.30)

module2/pom.xml:
  dependencyManagement:
    example-api: 2.0.0
  dependencies:
    example-dependency: 1.2.5
      └── example-api (resolved to 2.0.0)

These could be distinguished by:

  • hash("example-dependency:1.2.5", "module1/pom.xml")
  • hash("example-dependency:1.2.5", "module2/pom.xml")

Since Maven's dependency resolution is determined by the dependencyManagement of the origin pom.xml, the same origin should always produce the same resolution result.
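
A minimal sketch of the proposed hash(GAV, origin pom.xml path) idea, assuming SHA-256 and a truncated hex digest. The `packageID` helper, the choice of hash function, and the 16-character length are all assumptions made for illustration and are not taken from the Trivy code base.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// packageID is a hypothetical helper: it derives a deterministic ID from a
// package's GAV and the path of the pom.xml the package originates from.
func packageID(gav, originPomPath string) string {
	h := sha256.New()
	h.Write([]byte(gav))
	h.Write([]byte{0}) // separator so ("ab", "c") and ("a", "bc") hash differently
	h.Write([]byte(originPomPath))
	// Truncate the 64-character hex digest to keep IDs short (an arbitrary choice).
	return hex.EncodeToString(h.Sum(nil))[:16]
}

func main() {
	id1 := packageID("example-dependency:1.2.5", "module1/pom.xml")
	id2 := packageID("example-dependency:1.2.5", "module2/pom.xml")

	fmt.Println(id1 != id2)                                                     // true: same GAV, different origin
	fmt.Println(id1 == packageID("example-dependency:1.2.5", "module1/pom.xml")) // true: reproducible for the same file
}
```

Unlike a random UUID, an ID built this way stays the same across scans of the same pom.xml.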

Is there a specific reason why the current implementation needs all those resolved fields (deps, depManagement, props, modules)?

knqyf263 avatar Nov 25 '25 07:11 knqyf263

A year has passed since I created this PR, so I don’t remember all the nuances anymore.

It’s possible that I was trying to keep the hash reproducible from one project to another.
For example:

  • A
    • B (has an additional entry in DependencyManagement)
      • C
    • C

In this case, C from the A-B-C path and C from the A-C path are different dependencies, because the additional entry in B may change the child dependencies of C.
So if we have two different pom.xml files — one with A-B-C and one with A-C — the hash would be different, while with your approach it would be the same.

However, now that I’ve revisited this problem, I think your approach is preferable:

  1. I’m no longer sure we actually need reproducibility across different files.
    It should be enough that the result is reproducible for the same pom.xml file.
  2. There’s no need to go that deep.
    Even if B changes C’s version (so from A-B-C we get 0.0.2, and from A-C we get 0.0.1), Maven will still resolve versions and keep only one.
    That version is the one it will use, and it’s the one we should reference in Trivy’s DependsOn.
  3. The logic should be simpler.

Regarding modules — this also works correctly.
If a parent POM and a module POM define different versions of the same package, we should choose the module’s version.
This means that selecting the path to the module’s file will correctly resolve the situation.
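
The module-over-parent rule can be sketched as follows. `resolveVersion` and the two maps are hypothetical simplifications of dependencyManagement lookups, not Trivy's resolver, and the versions are placeholders.

```go
package main

import "fmt"

// resolveVersion is a hypothetical helper: it looks up a managed version,
// preferring the module's own dependencyManagement over the parent's.
func resolveVersion(parentManaged, moduleManaged map[string]string, artifact string) (string, bool) {
	if v, ok := moduleManaged[artifact]; ok {
		return v, true
	}
	v, ok := parentManaged[artifact]
	return v, ok
}

func main() {
	parent := map[string]string{"example-api": "1.7.30"}
	module := map[string]string{"example-api": "2.0.0"}

	v, _ := resolveVersion(parent, module, "example-api")
	fmt.Println(v) // 2.0.0: the module's entry wins

	v, _ = resolveVersion(parent, nil, "example-api")
	fmt.Println(v) // 1.7.30: falls back to the parent when the module has no entry
}
```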

I will start implementing your idea — if I find any issues, I’ll let you know.

DmitriyLewen avatar Nov 27 '25 10:11 DmitriyLewen

Closed in favor of #9880.

DmitriyLewen avatar Dec 04 '25 12:12 DmitriyLewen