Add to ORT the concept of `includes`
ORT already supports excludes in the .ort.yml (see https://oss-review-toolkit.org/ort/docs/configuration/ort-yml#excludes-basics).
I think it would be worthwhile to also support includes.
We have the following use case: a gigantic, several-gigabyte monorepo where you only want to scan the part of it corresponding to some deliverable.
Generating excludes for everything that should not be scanned is not scalable: you would have to update the .ort.yml every time something new that should be excluded from the scan is added to the repository.
Using the VCS path argument only allows scanning one directory, but the source of the deliverable can be scattered across several disjoint directories.
If the concept of includes is accepted, we need to discuss their interaction with excludes.
Maybe a first trivial implementation could be "if you have includes defined, excludes are ignored".
Please note that the filtering based on excludes should be done in a central place, e.g. similar to org.ossreviewtoolkit.model.config.PathExclude#matches and org.ossreviewtoolkit.analyzer.PackageManager.Companion#isPathExcluded.
Couple of remarks:
- "if you have includes defined, excludes are ignored" will not work for a lot of use cases - I commonly need to include certain dirs but exclude files and dirs within them.
- We had several ORT performance regressions. I fear that the more includes/excludes complexity is supported, the more likely ORT will run into performance issues and increased maintenance effort for developers due to code complexity.
My first thought would be to create a mechanism that only supports exclusion of paths based on labels. That way you can dynamically generate a list of dirs to be excluded with the OPTIONAL_COMPONENT_OF reason and then apply all other excludes. This would address your monorepo use case but also support a use case we had at HERE where multiple product variants existed, each with a different set of directories to include. It would also allow an easier way to label multiple packages in one go (useful for policy rules).
ort analyze --exclude-all-paths-without-label android
---
excludes:
  paths:
  - pattern: "**/src/{:funTest|test}/**"
    reason: "TEST_OF"
    comment: >-
      Only used for testing/development.
labels:
  paths:
  - pattern: "android_app/**"
    labels: "android"
    comment: >-
      Only used for building Android mobile app.
  - pattern: "ios_app/**"
    labels: "ios"
    comment: >-
      Only used for building iOS mobile app.
  - pattern: "mobile_app/**"
    labels: "android, ios"
    comment: >-
      Shared component library for the Android and iOS mobile app.
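To make the proposal more concrete, here is a minimal sketch of how `--exclude-all-paths-without-label android` could be evaluated. It is not ORT code: `PATH_LABELS`, `PATH_EXCLUDES`, `labels_of` and `is_excluded` are hypothetical names, the test exclude is simplified to `**/src/test/**`, and Python's `fnmatchcase` (where `*` also matches `/`) only approximates ORT's glob matching.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory form of the "labels" section from the example above.
PATH_LABELS = [
    ("android_app/**", {"android"}),
    ("ios_app/**", {"ios"}),
    ("mobile_app/**", {"android", "ios"}),
]

# Regular path excludes, simplified from the example above.
PATH_EXCLUDES = ["**/src/test/**"]


def labels_of(path):
    """Collect the labels of every label pattern that matches the path."""
    found = set()
    for pattern, labels in PATH_LABELS:
        if fnmatchcase(path, pattern):
            found |= labels
    return found


def is_excluded(path, required_label):
    """Exclude paths lacking the label, then apply the regular excludes."""
    if required_label not in labels_of(path):
        return True
    return any(fnmatchcase(path, pattern) for pattern in PATH_EXCLUDES)
```

In this sketch the label check and the regular excludes are simply OR-ed together, which matches the idea that label-derived excludes are added on top of the existing ones.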
@tsteenbe thanks for your feedback. This is elegant because it doesn't require the explicit addition of includes. Also the interaction with existing excludes is natural, as it is clear that the excludes are additive.
One remark though: we would need --exclude-all-paths-without-label for the Scanner too, to be able to exclude paths before sending content to FossID (when https://github.com/oss-review-toolkit/ort/issues/4242 would be implemented).
And additionally, I guess other scanners could not do anything with this parameter until https://github.com/oss-review-toolkit/ort/issues/5018 is implemented.
> One remark though: we would need --exclude-all-paths-without-label for the Scanner too, to be able to exclude paths before sending content to FossID
@nnobelis If I remember correctly, excludes are embedded within the ORT result file, so if they are part of analyzer-result.yml they are automatically passed to the scanner. There is no need to implement --exclude-all-paths-without-label for the Scanner if --exclude-all-paths-without-label android of the Analyzer just adds the paths to exclude to the excludes field within analyzer-result.yml.
The topic was discussed in the ORT Community Meeting on the 22.05:
@MarcelBochtler pointed out the extra complexity of adding includes to the .ort.yml and expressed concerns about their interaction with the excludes. This interaction needs to be clearly documented.
@isasmendiagus suggested to experiment with negative regex in path excludes to emulate include. However it turned out that, if you have different disjoint paths to include in your repository, those negative excludes would cancel each other and nothing would be included.
@sschuberth suggested to test with sparse checkout, to only checkout what needs to be included. However that would require modifying the ProvenanceDownloader to support the sparse checkouts.
It was mentioned that specific includes support could be directly added to the FossId and the ScanOSS scanner, but @sschuberth said that dealing with the technical debt and cleaning up the core is more important than piling up features.
@mnonnenmacher mentioned that adding support for excludes to the PathScanners (https://github.com/oss-review-toolkit/ort/issues/5018) is not trivial because of how scan results are stored, i.e. in that case the excludes would have to be stored too.
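The cancellation effect of negative excludes mentioned above can be reproduced with a small sketch. It uses Python regular expressions and made-up directory names (`app_a/`, `app_b/`) purely for illustration; ORT's path excludes are glob patterns, so this only mirrors the logic and not the actual matching code.

```python
import re

# Two "negative" excludes, each meant to keep one directory (hypothetical paths).
NEGATIVE_EXCLUDES = [
    re.compile(r"^(?!app_a/).*"),  # excludes everything outside app_a/
    re.compile(r"^(?!app_b/).*"),  # excludes everything outside app_b/
]


def is_excluded(path):
    """Excludes are additive: one matching pattern is enough to exclude a path."""
    return any(pattern.match(path) for pattern in NEGATIVE_EXCLUDES)


paths = ["app_a/Main.kt", "app_b/Main.kt", "docs/readme.md"]

# Files under app_a/ are caught by the "outside app_b/" pattern and vice versa,
# so nothing survives the filtering.
remaining = [path for path in paths if not is_excluded(path)]
```

Because a path is excluded as soon as any single pattern matches, every file lies outside at least one of the two directories and `remaining` ends up empty.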
The labels solution mentioned by @tsteenbe can work. The includes for sure would not, because they would simply create a chaotic process. From the point of view of my organization this feature is kind of critical, due to the existence of multiple big monorepo projects with few interdependencies among some repositories, which no simple exclude solution can manage.
Now, an open criticism regarding technical debt versus new features. In the last few months we saw several new features added to main ORT that are exclusively oriented to ort-server; they are definitely not technical debt and aren't ORT core. So if the decision is to prioritize technical debt over new features, all these "needed by ort-server" features should be denied as well.
I don't share the concerns about having both includes and excludes, I think when both are combined the behavior is quite intuitive: Includes limit the overall set of files, excludes exclude certain parts of the file tree. I have seen similar concepts in other tools.
The labels idea is interesting, but so far the recommendation for monorepos was to use different .ort.yml files for different applications in the monorepo, because also other configuration like package manager settings or resolutions can differ between applications. If we want to change that I would like to have a complete concept that covers all areas of the file.
> The labels idea is interesting
BTW, not directly excluding / including paths, but first labeling them, and then deciding what to do with the labels, resembles a bit ClearlyDefined's concept of "facets" that basically add semantics to paths to "understand the shape of the project". Having a similar concept in ORT would probably allow us to reuse ClearlyDefined's data here.
> the recommendation for monorepos was to use different .ort.yml files for different applications in the monorepo, because also other configuration like package manager settings or resolutions can differ between applications
IIUC, this recommendation will stay: even with includes or labels, one should use different .ort.yml for different applications in the monorepo.
The difference is that, in some cases, the number of includes or labels to write is far smaller than the equivalent using excludes.
@mnonnenmacher Having includes and excludes at the same time may be intuitive as a concept, but it makes the user experience worse (e.g. a user trying to find out why their includes/excludes do not work). Also, I wonder if adding the concept of includes increases the code complexity and thereby the maintenance costs. Introducing path labels that generate path excludes not only handles monorepos better but also addresses other use cases such as product variants, i.e. it's a 2-for-1 deal imo.
> @mnonnenmacher Having includes and excludes at the same time may be intuitive as a concept, but it makes the user experience worse (e.g. a user trying to find out why their includes/excludes do not work).
The labels proposal is basically the same as the includes proposal with extra flexibility/complexity, because --exclude-all-paths-without-label could also be called --include-only-paths-with-label.
> Also, I wonder if adding the concept of includes increases the code complexity and thereby the maintenance costs.
It does, but the label implementation would be even more complex so that's not an argument for labels.
> Introducing path labels that generate path excludes not only handles monorepos better but also addresses other use cases such as product variants, i.e. it's a 2-for-1 deal imo.
If this is the goal I still think that it should be checked if other parts of the .ort.yml need to be adapted as well to make this work for large monorepos, because, for example, in a large monorepo it could well be that different products or variants also need different package manager configurations or resolutions.
I'm not strictly against labels, I'd just like the concept to be more complete to make sure that it really works for users. And for consistency, should excludes not also work with labels then, so that there would be options like --exclude-files-with-label and --include-files-with-label?
For the format, labels should also be a list in YAML:
- pattern: "mobile_app/**"
  labels:
  - "android"
  - "ios"
  comment: >-
    Shared component library for the Android and iOS mobile app.
instead of:
- pattern: "mobile_app/**"
  labels: "android, ios"
  comment: >-
    Shared component library for the Android and iOS mobile app.
Discussion notes from the weekly ORT community meeting on June 5, 2025:
- @tsteenbe: Prefer to keep excludes/includes defined in .ort.yml and not move them to a CLI option, so it is easier to debug/review them
- @sschuberth: Propose to use labels for all includes and excludes and define configuration defining whether a label is to be excluded or included
- @fviernau: Only use for projects e.g. .ort.yml?
- @sschuberth: Use also for package configurations so we can reuse ClearlyDefined facets
- @mnonnenmacher: Support both use cases, monorepos and product variants, but there may be more configs in .ort.yml that depend on the variant. For example, you may need different resolutions or license choices for different product variants. Would like to have these considered in a more complete proposal.
- @mnonnenmacher: We also have to ensure any label passed via CLI option is captured in ORT result
- @fviernau: Could we take a step back and define the various use cases clearly?
- @tsteenbe: What about conflicts between include and exclude statements?
- @mnonnenmacher: Includes should take priority over excludes; if they are wrongly defined, we should throw an exception
Maybe a lightweight way of supporting includes in ort.yml
guess: I believe enhancing all places which handle excludes to also handle includes is a very large task and adds a bit of complexity if the design isn't refactored. It goes through all the stages, including error messages and report formats / UI.
idea: When ort reads the ort.yml, the set of includes and excludes could be taken, and mapped to a set of only excludes which have the same effect. Then, all other places can remain as-is.
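A rough sketch of this idea, under strong assumptions: the complete file list of the repository is known up front, and Python's `fnmatchcase` (where `*` also matches `/`) stands in for real glob matching. `derive_excludes` is a hypothetical helper, not an ORT function; it collapses everything outside the includes into the topmost directories that contain no included file.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def derive_excludes(files, includes):
    """Return exclude patterns covering every file not matched by an include."""
    included = {f for f in files if any(fnmatchcase(f, p) for p in includes)}

    # Directories containing at least one included file must not be excluded.
    keep_dirs = set()
    for f in included:
        keep_dirs.update(str(parent) for parent in PurePosixPath(f).parents)

    excludes = set()
    for f in sorted(set(files) - included):
        # Walk upwards to the topmost ancestor that holds no included file.
        candidate = f
        for parent in PurePosixPath(f).parents:
            if str(parent) == "." or str(parent) in keep_dirs:
                break
            candidate = str(parent) + "/**"
        excludes.add(candidate)
    return sorted(excludes)


files = [
    "app_a/src/Main.kt",
    "app_a/README.md",
    "app_b/lib.kt",
    "docs/guide/intro.md",
    "docs/index.md",
    "README.md",
]
generated = derive_excludes(files, ["app_a/src/**", "app_b/**"])
```

Note that the generated list depends on the current file tree, so it would have to be recomputed whenever files are added, and nothing in the result records which include a generated exclude came from.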
I discussed the issue with @mnonnenmacher.
The "automatic conversion of includes to excludes" may not be trivial. Additionally, when looking at the resulting WebApp report, it is difficult to assess which excludes are generated and which ones are present in the ort.yml. It would also be unclear, for a given generated exclude, which include is its "source".
Regarding the solution with the tag and facets, while interesting, it feels out of scope for this change and should be added later if requested.
Now, about the includes, our suggestion is simple: just provide the includes (path only, no scope) identically to the existing excludes:
includes:
  paths:
  - pattern: "A glob pattern matching files or paths."
    reason: "One of PathIncludeReason e.g. SOURCE_OF (TBD if necessary)."
    comment: "A comment further explaining why the path is included."
Then, the logic of the function Excludes.isPathExcluded could be adapted to something like this:
(hasIncludes() && !isPathIncluded()) || isPathExcluded()
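To illustrate how this expression behaves, here is a minimal sketch outside of ORT. The function names and the sample patterns are hypothetical, and `fnmatchcase` (where `*` also matches `/`) is only a stand-in for ORT's glob matching.

```python
from fnmatch import fnmatchcase


def matches_any(path, patterns):
    """Return True if at least one pattern matches the path."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def is_path_excluded(path, includes, excludes):
    """Mirror of (hasIncludes() && !isPathIncluded()) || isPathExcluded()."""
    has_includes = len(includes) > 0
    not_included = has_includes and not matches_any(path, includes)
    return not_included or matches_any(path, excludes)


includes = ["app_a/**", "app_b/**"]
excludes = ["**/test/**"]
```

With this formula, an empty list of includes keeps today's behavior, and excludes still apply inside included directories, so a test folder below `app_a/` remains excluded.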
What's left to be done:
- Adapt the reporter to reflect that the includes are applied. See EvaluatedModelMapper and the TablesReportModelMapper.kt.
- Aggregate the checks of includes/excludes in a single helper function. Remove usages in OrtResult and in PackageManager.
- (Optional) add support for the scope includes.
@mnonnenmacher, @sschuberth:
After looking at the LicenseInfoResolver, I think the includes should be added to PackageConfiguration:
https://oss-review-toolkit.org/ort/docs/configuration/package-configurations
> After looking at the LicenseInfoResolver, I think the includes should be added to PackageConfiguration: https://oss-review-toolkit.org/ort/docs/configuration/package-configurations
@nnobelis Can you explain the reason behind it? Are path excludes from the .ort.yml converted to package configurations for use in the LicenseInfoResolver?
@mnonnenmacher Yes, they are collected by the DefaultLicenseInfoProvider, which is used by the LicenseInfoResolver.
Look at the changes I had to make in https://github.com/oss-review-toolkit/ort/pull/10762.
In particular https://github.com/boschglobal/oss-review-toolkit/blob/bfb62c7228970c3d4261d33322893da65dfc4ec1/model/src/main/kotlin/licenses/DefaultLicenseInfoProvider.kt#L104-L118.