feat(scanner): Merge duplicate scan results that share a provenance

Open maennchen opened this issue 10 months ago • 1 comments

When the SpdxDocumentFile package manager is used, the project and all contained packages often resolve to the same VCS provenance (e.g. the root of the Git repository). Before this change ORT stored two separate ScanResults for such a provenance – one keyed to the project, one keyed to the package.

That caused two follow-on problems:

  • Both results appeared in the OrtResult, so evaluators saw duplicate findings for the same source tree.
  • Because projects and packages are handled by different rules, the package result was additionally padded with a SpdxConstants.NONE finding whenever includeFilesWithoutFindings was enabled. The evaluator therefore compared real license findings from the project result against NONE from the package result and reported a violation.

This patch

  • groups scan results by the pair (provenance, scanner) and folds them into a single ScanResult,
  • unions the inner finding sets to avoid duplicates, and
  • performs the "pad with NONE" step only after deduplication, so every path is represented exactly once.
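
The three steps above could look roughly like the following Kotlin sketch. It is not the code from this patch: `Provenance`, `Scanner`, `LicenseFinding` and `ScanResult` are simplified, hypothetical stand-ins for ORT's model classes, the `NONE` constant stands in for SpdxConstants.NONE, and the function names `mergeByProvenanceAndScanner` and `padWithNone` are assumptions made for illustration.

```kotlin
// Simplified, hypothetical stand-ins for ORT's model classes.
data class Provenance(val url: String, val revision: String)
data class Scanner(val name: String, val version: String)
data class LicenseFinding(val license: String, val path: String)
data class ScanResult(
    val provenance: Provenance,
    val scanner: Scanner,
    val licenseFindings: Set<LicenseFinding>
)

// Placeholder for SpdxConstants.NONE.
const val NONE = "NONE"

/** Fold all results sharing the pair (provenance, scanner) into one result holding the union of their findings. */
fun mergeByProvenanceAndScanner(results: List<ScanResult>): List<ScanResult> =
    results
        .groupBy { it.provenance to it.scanner }
        .map { (key, group) ->
            val (provenance, scanner) = key
            // Collecting into a set removes findings that occur in more than one result.
            val findings = group.flatMapTo(mutableSetOf<LicenseFinding>()) { it.licenseFindings }
            ScanResult(provenance, scanner, findings)
        }

/** Add a NONE finding for every known file path that has no finding yet. */
fun padWithNone(result: ScanResult, allPaths: Set<String>): ScanResult {
    val pathsWithFindings = result.licenseFindings.mapTo(mutableSetOf<String>()) { it.path }
    val padding = (allPaths - pathsWithFindings).map { LicenseFinding(NONE, it) }
    return result.copy(licenseFindings = result.licenseFindings + padding)
}

fun main() {
    val provenance = Provenance("https://example.com/repo.git", "abc123")
    val scanner = Scanner("Dummy", "1.0.0")

    // The project and a package resolve to the same provenance and were scanned by the same scanner.
    val projectResult = ScanResult(provenance, scanner, setOf(LicenseFinding("MIT", "LICENSE")))
    val packageResult = ScanResult(
        provenance,
        scanner,
        setOf(LicenseFinding("MIT", "LICENSE"), LicenseFinding("Apache-2.0", "pkg1/pkg1.txt"))
    )

    // Merge first, pad afterwards, so each path is represented exactly once.
    val allPaths = setOf("LICENSE", "README", "pkg1/pkg1.txt")
    val merged = mergeByProvenanceAndScanner(listOf(projectResult, packageResult))
        .map { padWithNone(it, allPaths) }

    merged.forEach { println(it) }
}
```

In this sketch the two inputs collapse into a single result, and only `README`, which has no real finding, receives a NONE entry. Padding before merging would instead have attached NONE to paths that another result for the same provenance already covers with a real license.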

As a consequence the evaluator now receives one consistent set of license findings per provenance / scanner, eliminating the false mismatch.

This is my first time writing Kotlin. Sorry if the code is not up to the usual standards.

maennchen avatar Jun 19 '25 18:06 maennchen

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 57.30%. Comparing base (e2dd087) to head (d536a3e). :warning: Report is 1 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #10502      +/-   ##
============================================
+ Coverage     57.28%   57.30%   +0.01%     
- Complexity     1644     1648       +4     
============================================
  Files           341      341              
  Lines         12722    12722              
  Branches       1206     1206              
============================================
+ Hits           7288     7290       +2     
+ Misses         4971     4969       -2     
  Partials        463      463              
Flag                 Coverage Δ
funTest-docker       71.28% <ø> (ø)
funTest-non-docker   33.00% <ø> (-0.01%) :arrow_down:
test-ubuntu-24.04    41.77% <ø> (ø)
test-windows-2022    41.75% <ø> (ø)

codecov[bot] avatar Jun 19 '25 19:06 codecov[bot]

@MarcelBochtler I added a test that covers the deduplication. Running the same test on main produces duplicated information:

$ ./gradlew scanner:funTest --tests "org.ossreviewtoolkit.scanner.scanners.ScannerIntegrationFunTest"

Parallel Configuration Cache is an incubating feature.
Calculating task graph as configuration cache cannot be reused because a build logic input of type 'SemInfoVersionValueSource' has changed.
Type-safe project accessors is an incubating feature.

> Configure project :
Building ORT version 61.1.0.

> Task :scanner:funTest

org.ossreviewtoolkit.scanner.scanners.ScannerIntegrationFunTest > Scanning all packages corresponding to a single VCS should > return the expected ORT result STARTED

org.ossreviewtoolkit.scanner.scanners.ScannerIntegrationFunTest > Scanning all packages corresponding to a single VCS should > return the expected ORT result PASSED

org.ossreviewtoolkit.scanner.scanners.ScannerIntegrationFunTest > Scanning all packages corresponding to a single VCS should > return the expected (merged) scan results STARTED

org.ossreviewtoolkit.scanner.scanners.ScannerIntegrationFunTest > Scanning all packages corresponding to a single VCS should > return the expected (merged) scan results PASSED

org.ossreviewtoolkit.scanner.scanners.ScannerIntegrationFunTest > Scanning all packages corresponding to a single VCS should > return the expected (merged) file lists STARTED

org.ossreviewtoolkit.scanner.scanners.ScannerIntegrationFunTest > Scanning all packages corresponding to a single VCS should > return the expected (merged) file lists PASSED

org.ossreviewtoolkit.scanner.scanners.ScannerIntegrationFunTest > Scanning a subset of the packages corresponding to a single VCS should > return the expected ORT result STARTED

org.ossreviewtoolkit.scanner.scanners.ScannerIntegrationFunTest > Scanning a subset of the packages corresponding to a single VCS should > return the expected ORT result FAILED
    io.kotest.assertions.AssertionFailedError: expected:<[Deletion at line 298]           end_line: -1
      scanners:
        Dummy::pkg1:1.0.0:
        - "Dummy"
        Dummy::pkg3:1.0.0:
        - "Dummy"
        Dummy::project:1.0.0:
        - "Dummy"
      files:
      - provenance:
          vcs_info:
            type: "Git"
            url: "https://github.com/oss-review-toolkit/ort-test-data-scanner-subrepo.git"
            revision: "a732695e03efcbd74539208af98c297ee86e49d5"
            path: ""
          resolved_revision: "a732695e03efcbd74539208af98c297ee86e49d5"
        files:
        - path: "LICENSE"
          sha1: "7df059597099bb7dcf25d2a9aedfaf4465f72d8d"
        - path: "README"
          sha1: "ae8044f7fce7ee914a853c30c3085895e9be8b9c"
        - path: "pkg-s1/pkg-s1.txt"
          sha1: "e5fb17f8f4f4ef0748bb5ba137fd0e091dd5a1f6"
      - provenance:
          vcs_info:
            type: "Git"
            url: "https://github.com/oss-review-toolkit/ort-test-data-scanner-subrepo2.git"
            revision: "6431fd85188db22b942deb66c7a8c1a53921fc35"
            path: ""
          resolved_revision: "6431fd85188db22b942deb66c7a8c1a53921fc35"
        files:
        - path: "LICENSE"
          sha1: "7df059597099bb7dcf25d2a9aedfaf4465f72d8d"
        - path: "README"
          sha1: "ae8044f7fce7ee914a853c30c3085895e9be8b9c"
        - path: "pkg-s2/pkg-s2.txt"
          sha1: "37996d13eceb6b29db43a381ce8df375b5eee8e9"
      - provenance:
          vcs_info:
            type: "Git"
            url: "https://github.com/oss-review-toolkit/ort-test-data-scanner.git"
            revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
            path: ""
          resolved_revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
        files:
        - path: ".gitmodules"
          sha1: "d7f070ddbe0b6dd8a173714d565a1240dd96eacd"
        - path: "LICENSE"
          sha1: "7df059597099bb7dcf25d2a9aedfaf4465f72d8d"
        - path: "README"
          sha1: "82cfc115138054ce5b5e6839f38687c9d7186710"
        - path: "pkg1/pkg1.txt"
          sha1: "22eb73bd30d47540a4e05781f0f6e07640857cae"
        - path: "pkg2/pkg2.txt"
          sha1: "cc8f97cebe1dc0ed889a31f504bcf491d5241aaa"
        - path: "pkg3/pkg3.txt"
          sha1: "859d66be2d153968cdaa8ec7265270c241eea024"
        - path: "pkg4/pkg4.txt"
          sha1: "3cba29011be2b9d59f6204d6fa0a386b1b2dbd90"
    advisor: null
    evaluator: null
    resolved_configuration: {}


    [Deletion at line 386] > but was:<[Deletion at line 298]           end_line: -1
      - provenance:
          vcs_info:
            type: "Git"
            url: "https://github.com/oss-review-toolkit/ort-test-data-scanner.git"
            revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
            path: ""
          resolved_revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
        scanner:
          name: "Dummy"
          version: "1.0.0"
          configuration: ""
        summary:
          start_time: "1970-01-01T00:00:00Z"
          end_time: "1970-01-01T00:00:00Z"
          licenses:
          - license: "NOASSERTION"
            location:
              path: "LICENSE"
              start_line: -1
              end_line: -1
          - license: "NOASSERTION"
            location:
              path: "pkg1/pkg1.txt"
              start_line: -1
              end_line: -1
          - license: "NOASSERTION"
            location:
              path: "pkg3/pkg3.txt"
              start_line: -1
              end_line: -1
      scanners:
        Dummy::pkg1:1.0.0:
        - "Dummy"
        Dummy::pkg3:1.0.0:
        - "Dummy"
        Dummy::project:1.0.0:
        - "Dummy"
      files:
      - provenance:
          vcs_info:
            type: "Git"
            url: "https://github.com/oss-review-toolkit/ort-test-data-scanner-subrepo.git"
            revision: "a732695e03efcbd74539208af98c297ee86e49d5"
            path: ""
          resolved_revision: "a732695e03efcbd74539208af98c297ee86e49d5"
        files:
        - path: "LICENSE"
          sha1: "7df059597099bb7dcf25d2a9aedfaf4465f72d8d"
        - path: "README"
          sha1: "ae8044f7fce7ee914a853c30c3085895e9be8b9c"
        - path: "pkg-s1/pkg-s1.txt"
          sha1: "e5fb17f8f4f4ef0748bb5ba137fd0e091dd5a1f6"
      - provenance:
          vcs_info:
            type: "Git"
            url: "https://github.com/oss-review-toolkit/ort-test-data-scanner-subrepo2.git"
            revision: "6431fd85188db22b942deb66c7a8c1a53921fc35"
            path: ""
          resolved_revision: "6431fd85188db22b942deb66c7a8c1a53921fc35"
        files:
        - path: "LICENSE"
          sha1: "7df059597099bb7dcf25d2a9aedfaf4465f72d8d"
        - path: "README"
          sha1: "ae8044f7fce7ee914a853c30c3085895e9be8b9c"
        - path: "pkg-s2/pkg-s2.txt"
          sha1: "37996d13eceb6b29db43a381ce8df375b5eee8e9"
      - provenance:
          vcs_info:
            type: "Git"
            url: "https://github.com/oss-review-toolkit/ort-test-data-scanner.git"
            revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
            path: ""
          resolved_revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
        files:
        - path: ".gitmodules"
          sha1: "d7f070ddbe0b6dd8a173714d565a1240dd96eacd"
        - path: "LICENSE"
          sha1: "7df059597099bb7dcf25d2a9aedfaf4465f72d8d"
        - path: "README"
          sha1: "82cfc115138054ce5b5e6839f38687c9d7186710"
        - path: "pkg1/pkg1.txt"
          sha1: "22eb73bd30d47540a4e05781f0f6e07640857cae"
        - path: "pkg2/pkg2.txt"
          sha1: "cc8f97cebe1dc0ed889a31f504bcf491d5241aaa"
        - path: "pkg3/pkg3.txt"
          sha1: "859d66be2d153968cdaa8ec7265270c241eea024"
        - path: "pkg4/pkg4.txt"
          sha1: "3cba29011be2b9d59f6204d6fa0a386b1b2dbd90"
      - provenance:
          vcs_info:
            type: "Git"
            url: "https://github.com/oss-review-toolkit/ort-test-data-scanner.git"
            revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
            path: ""
          resolved_revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
        files:
        - path: "pkg1/pkg1.txt"
          sha1: "22eb73bd30d47540a4e05781f0f6e07640857cae"
        - path: "pkg3/pkg3.txt"
          sha1: "859d66be2d153968cdaa8ec7265270c241eea024"
    advisor: null
    evaluator: null
    resolved_configuration: {}


    [Deletion at line 356]         path: ""
          resolved_revision: "6431fd85188db22b942deb66c7a8c1a53921fc35"
        files:
        - path: "LICENSE"
          sha1: "7df059597099bb7dcf25d2a9aedfaf4465f72d8d"
        - path: "README"
          sha1: "ae8044f7fce7ee914a853c30c3085895e9be8b9c"
        - path: "pkg-s2/pkg-s2.txt"
          sha1: "37996d13eceb6b29db43a381ce8df375b5eee8e9"
      - provenance:
          vcs_info:
            type: "Git"
            url: "https://github.com/oss-review-toolkit/ort-test-data-scanner.git"
            revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
            path: ""
          resolved_revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
        files:
        - path: ".gitmodules"
          sha1: "d7f070ddbe0b6dd8a173714d565a1240dd96eacd"
        - path: "LICENSE"
          sha1: "7df059597099bb7dcf25d2a9aedfaf4465f72d8d"
        - path: "README"
          sha1: "82cfc115138054ce5b5e6839f38687c9d7186710"
        - path: "pkg1/pkg1.txt"
          sha1: "22eb73bd30d47540a4e05781f0f6e07640857cae"
        - path: "pkg2/pkg2.txt"
          sha1: "cc8f97cebe1dc0ed889a31f504bcf491d5241aaa"
        - path: "pkg3/pkg3.txt"
          sha1: "859d66be2d153968cdaa8ec7265270c241eea024"
        - path: "pkg4/pkg4.txt"
          sha1: "3cba29011be2b9d59f6204d6fa0a386b1b2dbd90"
      - provenance:
          vcs_info:
            type: "Git"
            url: "https://github.com/oss-review-toolkit/ort-test-data-scanner.git"
            revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
            path: ""
          resolved_revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
        files:
        - path: "pkg1/pkg1.txt"
          sha1: "22eb73bd30d47540a4e05781f0f6e07640857cae"
        - path: "pkg3/pkg3.txt"
          sha1: "859d66be2d153968cdaa8ec7265270c241eea024"
    advisor: null
    evaluator: null
    resolved_configuration: {}
    >

16:50:31.855 [ForkJoinPool-1-worker-1] DEBUG org.eclipse.jgit.internal.util.ShutdownHook - Cleanup org.eclipse.jgit.util.FS$FileStoreAttributes$$Lambda/0x00007eabf0386c10@e239dec during JVM shutdown

> Task :scanner:funTest FAILED

4 tests completed, 1 failed

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':scanner:funTest'.
> There were failing tests. See the report at: file:///workspaces/ort/scanner/build/reports/tests/funTest/index.html

* Try:
> Run with --scan to get full insights.

BUILD FAILED in 53s
81 actionable tasks: 52 executed, 29 up-to-date
Configuration cache entry stored.

maennchen avatar Jun 25 '25 17:06 maennchen

The code can be seen in action here: https://github.com/elixir-lang/elixir/actions/runs/16086802339/job/45398947306 (see the action artifacts for details).

maennchen avatar Jul 05 '25 09:07 maennchen