feat(scanner): Merge duplicate scan results that share a provenance
When the SpdxDocumentFile package manager is used, the project and all contained packages often resolve to the same VCS provenance (e.g. the root of the Git repository).
Before this change, ORT stored two separate ScanResults for such a provenance – one keyed to the project, one keyed to the package.
That caused two follow-on problems:
- Both results appeared in the `OrtResult`, so evaluators saw duplicate findings for the same source tree.
- Because projects and packages are handled by different rules, the package result was additionally padded with a `SpdxConstants.NONE` finding whenever `includeFilesWithoutFindings` was enabled. The evaluator therefore compared real license findings from the project result with `NONE` from the package result and failed with a violation.
This patch
- groups scan results by the pair `(provenance, scanner)` and folds them into a single `ScanResult`,
- unions the inner finding sets to avoid duplicates, and
- performs the "pad with NONE" step only after deduplication, so every path is represented exactly once.
As a consequence the evaluator now receives one consistent set of license findings per provenance / scanner, eliminating the false mismatch.
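The steps above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the code from this patch: the data classes `Provenance`, `Scanner`, `Finding` and `ScanResult` below are simplified, hypothetical stand-ins for ORT's model classes, and the helper names `mergeByProvenanceAndScanner` and `padWithNone` are made up for this example.

```kotlin
// Simplified, hypothetical stand-ins for illustration; these are NOT ORT's real model classes.
data class Provenance(val url: String, val revision: String)
data class Scanner(val name: String, val version: String)
data class Finding(val license: String, val path: String)
data class ScanResult(val provenance: Provenance, val scanner: Scanner, val findings: Set<Finding>)

// Placeholder license for files without findings (assumed to mirror SpdxConstants.NONE).
const val NONE = "NONE"

/** Merge all results that share the same (provenance, scanner) pair by unioning their findings. */
fun mergeByProvenanceAndScanner(results: List<ScanResult>): List<ScanResult> =
    results
        .groupBy { it.provenance to it.scanner }
        .map { (key, group) ->
            val (provenance, scanner) = key
            // Collecting into a set drops findings that occur in more than one result.
            ScanResult(provenance, scanner, group.flatMapTo(mutableSetOf<Finding>()) { it.findings })
        }

/** Add a NONE finding for every known path that has no finding yet. */
fun padWithNone(result: ScanResult, allPaths: Set<String>): ScanResult {
    val coveredPaths = result.findings.map { it.path }.toSet()
    val padding = (allPaths - coveredPaths).map { Finding(NONE, it) }
    return result.copy(findings = result.findings + padding)
}

fun main() {
    val provenance = Provenance("https://example.org/repo.git", "abc123")
    val scanner = Scanner("Dummy", "1.0.0")

    // The project and the package resolve to the same provenance and were scanned separately.
    val projectResult = ScanResult(provenance, scanner, setOf(Finding("MIT", "LICENSE")))
    val packageResult = ScanResult(
        provenance,
        scanner,
        setOf(Finding("MIT", "LICENSE"), Finding("Apache-2.0", "pkg1/pkg1.txt"))
    )
    val allPaths = setOf("LICENSE", "README", "pkg1/pkg1.txt")

    // Deduplicate first, pad afterwards, so every path ends up with exactly one entry.
    val merged = mergeByProvenanceAndScanner(listOf(projectResult, packageResult))
        .map { padWithNone(it, allPaths) }

    merged.forEach { result ->
        result.findings.sortedBy { it.path }.forEach { println("${it.path}: ${it.license}") }
    }
}
```

Padding only after the merge is the important ordering here: if each result were padded on its own, a path with a real finding in one result could still receive a `NONE` entry from the other.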
This is my first time writing Kotlin. Sorry if the code is not up to the usual standards.
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 57.30%. Comparing base (e2dd087) to head (d536a3e).
:warning: Report is 1 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #10502 +/- ##
============================================
+ Coverage 57.28% 57.30% +0.01%
- Complexity 1644 1648 +4
============================================
Files 341 341
Lines 12722 12722
Branches 1206 1206
============================================
+ Hits 7288 7290 +2
+ Misses 4971 4969 -2
Partials 463 463
| Flag | Coverage Δ | |
|---|---|---|
| funTest-docker | 71.28% <ø> (ø) | |
| funTest-non-docker | 33.00% <ø> (-0.01%) | :arrow_down: |
| test-ubuntu-24.04 | 41.77% <ø> (ø) | |
| test-windows-2022 | 41.75% <ø> (ø) | |
@MarcelBochtler I added a test for the deduplication. Running the same test on main fails because the result contains duplicated information.
$ ./gradlew scanner:funTest --tests "org.ossreviewtoolkit.scanner.scanners.ScannerIntegrationFunTest"
Parallel Configuration Cache is an incubating feature.
Calculating task graph as configuration cache cannot be reused because a build logic input of type 'SemInfoVersionValueSource' has changed.
Type-safe project accessors is an incubating feature.
> Configure project :
Building ORT version 61.1.0.
> Task :scanner:funTest
org.ossreviewtoolkit.scanner.scanners.ScannerIntegrationFunTest > Scanning all packages corresponding to a single VCS should > return the expected ORT result STARTED
org.ossreviewtoolkit.scanner.scanners.ScannerIntegrationFunTest > Scanning all packages corresponding to a single VCS should > return the expected ORT result PASSED
org.ossreviewtoolkit.scanner.scanners.ScannerIntegrationFunTest > Scanning all packages corresponding to a single VCS should > return the expected (merged) scan results STARTED
org.ossreviewtoolkit.scanner.scanners.ScannerIntegrationFunTest > Scanning all packages corresponding to a single VCS should > return the expected (merged) scan results PASSED
org.ossreviewtoolkit.scanner.scanners.ScannerIntegrationFunTest > Scanning all packages corresponding to a single VCS should > return the expected (merged) file lists STARTED
org.ossreviewtoolkit.scanner.scanners.ScannerIntegrationFunTest > Scanning all packages corresponding to a single VCS should > return the expected (merged) file lists PASSED
org.ossreviewtoolkit.scanner.scanners.ScannerIntegrationFunTest > Scanning a subset of the packages corresponding to a single VCS should > return the expected ORT result STARTED
org.ossreviewtoolkit.scanner.scanners.ScannerIntegrationFunTest > Scanning a subset of the packages corresponding to a single VCS should > return the expected ORT result FAILED
io.kotest.assertions.AssertionFailedError: expected:<[Deletion at line 298] end_line: -1
scanners:
Dummy::pkg1:1.0.0:
- "Dummy"
Dummy::pkg3:1.0.0:
- "Dummy"
Dummy::project:1.0.0:
- "Dummy"
files:
- provenance:
vcs_info:
type: "Git"
url: "https://github.com/oss-review-toolkit/ort-test-data-scanner-subrepo.git"
revision: "a732695e03efcbd74539208af98c297ee86e49d5"
path: ""
resolved_revision: "a732695e03efcbd74539208af98c297ee86e49d5"
files:
- path: "LICENSE"
sha1: "7df059597099bb7dcf25d2a9aedfaf4465f72d8d"
- path: "README"
sha1: "ae8044f7fce7ee914a853c30c3085895e9be8b9c"
- path: "pkg-s1/pkg-s1.txt"
sha1: "e5fb17f8f4f4ef0748bb5ba137fd0e091dd5a1f6"
- provenance:
vcs_info:
type: "Git"
url: "https://github.com/oss-review-toolkit/ort-test-data-scanner-subrepo2.git"
revision: "6431fd85188db22b942deb66c7a8c1a53921fc35"
path: ""
resolved_revision: "6431fd85188db22b942deb66c7a8c1a53921fc35"
files:
- path: "LICENSE"
sha1: "7df059597099bb7dcf25d2a9aedfaf4465f72d8d"
- path: "README"
sha1: "ae8044f7fce7ee914a853c30c3085895e9be8b9c"
- path: "pkg-s2/pkg-s2.txt"
sha1: "37996d13eceb6b29db43a381ce8df375b5eee8e9"
- provenance:
vcs_info:
type: "Git"
url: "https://github.com/oss-review-toolkit/ort-test-data-scanner.git"
revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
path: ""
resolved_revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
files:
- path: ".gitmodules"
sha1: "d7f070ddbe0b6dd8a173714d565a1240dd96eacd"
- path: "LICENSE"
sha1: "7df059597099bb7dcf25d2a9aedfaf4465f72d8d"
- path: "README"
sha1: "82cfc115138054ce5b5e6839f38687c9d7186710"
- path: "pkg1/pkg1.txt"
sha1: "22eb73bd30d47540a4e05781f0f6e07640857cae"
- path: "pkg2/pkg2.txt"
sha1: "cc8f97cebe1dc0ed889a31f504bcf491d5241aaa"
- path: "pkg3/pkg3.txt"
sha1: "859d66be2d153968cdaa8ec7265270c241eea024"
- path: "pkg4/pkg4.txt"
sha1: "3cba29011be2b9d59f6204d6fa0a386b1b2dbd90"
advisor: null
evaluator: null
resolved_configuration: {}
[Deletion at line 386] > but was:<[Deletion at line 298] end_line: -1
- provenance:
vcs_info:
type: "Git"
url: "https://github.com/oss-review-toolkit/ort-test-data-scanner.git"
revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
path: ""
resolved_revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
scanner:
name: "Dummy"
version: "1.0.0"
configuration: ""
summary:
start_time: "1970-01-01T00:00:00Z"
end_time: "1970-01-01T00:00:00Z"
licenses:
- license: "NOASSERTION"
location:
path: "LICENSE"
start_line: -1
end_line: -1
- license: "NOASSERTION"
location:
path: "pkg1/pkg1.txt"
start_line: -1
end_line: -1
- license: "NOASSERTION"
location:
path: "pkg3/pkg3.txt"
start_line: -1
end_line: -1
scanners:
Dummy::pkg1:1.0.0:
- "Dummy"
Dummy::pkg3:1.0.0:
- "Dummy"
Dummy::project:1.0.0:
- "Dummy"
files:
- provenance:
vcs_info:
type: "Git"
url: "https://github.com/oss-review-toolkit/ort-test-data-scanner-subrepo.git"
revision: "a732695e03efcbd74539208af98c297ee86e49d5"
path: ""
resolved_revision: "a732695e03efcbd74539208af98c297ee86e49d5"
files:
- path: "LICENSE"
sha1: "7df059597099bb7dcf25d2a9aedfaf4465f72d8d"
- path: "README"
sha1: "ae8044f7fce7ee914a853c30c3085895e9be8b9c"
- path: "pkg-s1/pkg-s1.txt"
sha1: "e5fb17f8f4f4ef0748bb5ba137fd0e091dd5a1f6"
- provenance:
vcs_info:
type: "Git"
url: "https://github.com/oss-review-toolkit/ort-test-data-scanner-subrepo2.git"
revision: "6431fd85188db22b942deb66c7a8c1a53921fc35"
path: ""
resolved_revision: "6431fd85188db22b942deb66c7a8c1a53921fc35"
files:
- path: "LICENSE"
sha1: "7df059597099bb7dcf25d2a9aedfaf4465f72d8d"
- path: "README"
sha1: "ae8044f7fce7ee914a853c30c3085895e9be8b9c"
- path: "pkg-s2/pkg-s2.txt"
sha1: "37996d13eceb6b29db43a381ce8df375b5eee8e9"
- provenance:
vcs_info:
type: "Git"
url: "https://github.com/oss-review-toolkit/ort-test-data-scanner.git"
revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
path: ""
resolved_revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
files:
- path: ".gitmodules"
sha1: "d7f070ddbe0b6dd8a173714d565a1240dd96eacd"
- path: "LICENSE"
sha1: "7df059597099bb7dcf25d2a9aedfaf4465f72d8d"
- path: "README"
sha1: "82cfc115138054ce5b5e6839f38687c9d7186710"
- path: "pkg1/pkg1.txt"
sha1: "22eb73bd30d47540a4e05781f0f6e07640857cae"
- path: "pkg2/pkg2.txt"
sha1: "cc8f97cebe1dc0ed889a31f504bcf491d5241aaa"
- path: "pkg3/pkg3.txt"
sha1: "859d66be2d153968cdaa8ec7265270c241eea024"
- path: "pkg4/pkg4.txt"
sha1: "3cba29011be2b9d59f6204d6fa0a386b1b2dbd90"
- provenance:
vcs_info:
type: "Git"
url: "https://github.com/oss-review-toolkit/ort-test-data-scanner.git"
revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
path: ""
resolved_revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
files:
- path: "pkg1/pkg1.txt"
sha1: "22eb73bd30d47540a4e05781f0f6e07640857cae"
- path: "pkg3/pkg3.txt"
sha1: "859d66be2d153968cdaa8ec7265270c241eea024"
advisor: null
evaluator: null
resolved_configuration: {}
[Deletion at line 356] path: ""
resolved_revision: "6431fd85188db22b942deb66c7a8c1a53921fc35"
files:
- path: "LICENSE"
sha1: "7df059597099bb7dcf25d2a9aedfaf4465f72d8d"
- path: "README"
sha1: "ae8044f7fce7ee914a853c30c3085895e9be8b9c"
- path: "pkg-s2/pkg-s2.txt"
sha1: "37996d13eceb6b29db43a381ce8df375b5eee8e9"
- provenance:
vcs_info:
type: "Git"
url: "https://github.com/oss-review-toolkit/ort-test-data-scanner.git"
revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
path: ""
resolved_revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
files:
- path: ".gitmodules"
sha1: "d7f070ddbe0b6dd8a173714d565a1240dd96eacd"
- path: "LICENSE"
sha1: "7df059597099bb7dcf25d2a9aedfaf4465f72d8d"
- path: "README"
sha1: "82cfc115138054ce5b5e6839f38687c9d7186710"
- path: "pkg1/pkg1.txt"
sha1: "22eb73bd30d47540a4e05781f0f6e07640857cae"
- path: "pkg2/pkg2.txt"
sha1: "cc8f97cebe1dc0ed889a31f504bcf491d5241aaa"
- path: "pkg3/pkg3.txt"
sha1: "859d66be2d153968cdaa8ec7265270c241eea024"
- path: "pkg4/pkg4.txt"
sha1: "3cba29011be2b9d59f6204d6fa0a386b1b2dbd90"
- provenance:
vcs_info:
type: "Git"
url: "https://github.com/oss-review-toolkit/ort-test-data-scanner.git"
revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
path: ""
resolved_revision: "97d57bb4795bc41f496e1a8e2c7751cefc7da7ec"
files:
- path: "pkg1/pkg1.txt"
sha1: "22eb73bd30d47540a4e05781f0f6e07640857cae"
- path: "pkg3/pkg3.txt"
sha1: "859d66be2d153968cdaa8ec7265270c241eea024"
advisor: null
evaluator: null
resolved_configuration: {}
>
16:50:31.855 [ForkJoinPool-1-worker-1] DEBUG org.eclipse.jgit.internal.util.ShutdownHook - Cleanup org.eclipse.jgit.util.FS$FileStoreAttributes$$Lambda/0x00007eabf0386c10@e239dec during JVM shutdown
> Task :scanner:funTest FAILED
4 tests completed, 1 failed
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':scanner:funTest'.
> There were failing tests. See the report at: file:///workspaces/ort/scanner/build/reports/tests/funTest/index.html
* Try:
> Run with --scan to get full insights.
BUILD FAILED in 53s
81 actionable tasks: 52 executed, 29 up-to-date
Configuration cache entry stored.
The code can be seen in action here: https://github.com/elixir-lang/elixir/actions/runs/16086802339/job/45398947306 (See the Action Artifacts for Details)