AOSD-Reporter 2.1, subcomponents in the output JSON are not as specified
Describe the bug
The spec for the AOSD format 2.1 describes the subcomponents as follows:
> for every license identified within all files of the softwarecomponent shall a subcomponent be provided.
> - The first subcomponent in a component block should contain the main license of the component and must be named main.
> - All following subcomponents inside a component can be freely assigned.

In practice, the AOSD reporter puts all licenses into the main subcomponent, which is wrong. Subcomponent called "main" should be the declared license(s) only. Any additional licenses should go into additional subcomponents. These are the license findings (taken from v42.0.1, because the WebApp output of version 55.0.0 lacks the "Detected Excluded" entry):
| Field | Value |
|---|---|
| Effective SPDX | Apache-2.0 AND EPL-2.0 AND EPL-2.0 AND GPL-2.0-only WITH Classpath-exception-2.0 |
| Declared | EPL-2.0, GPL-2.0-with-classpath-exception |
| Declared (SPDX) | EPL-2.0 OR GPL-2.0-only WITH Classpath-exception-2.0 |
| Detected | Apache-2.0, EPL-2.0, EPL-2.0 OR GPL-2.0-only WITH Classpath-exception-2.0, GPL-2.0-only WITH Classpath-exception-2.0 |
| Detected Excluded | LicenseRef-scancode-efsl-1.0, LicenseRef-scancode-unknown-license-reference |
To Reproduce
Steps to reproduce the behavior:
- Use a package with multiple license findings
- Export the results with the AOSD2.1-Reporter
- Export the results as WebApp for comparison, if needed
- See results
Expected behavior
This is the output, as it should be, according to the description of the spec (unnecessary fields have been removed):
```json
{
  "componentName": "jakarta.ws.rs-api",
  "componentVersion": "3.0.0",
  "id": 22,
  "linking": "dynamic_linking",
  "modified": false,
  "scmUrl": "https://github.com/jakartaee/rest.git",
  "subcomponents": [
    {
      "licenseText": "Eclipse Public License - v 2.0\n--\nGNU GENERAL PUBLIC LICENSE",
      "licenseTextUrl": "",
      "selectedLicense": "EPL-2.0",
      "spdxId": "EPL-2.0 OR GPL-2.0-only WITH Classpath-exception-2.0",
      "subcomponentName": "main"
    },
    {
      "licenseText": "Apache License",
      "licenseTextUrl": "",
      "selectedLicense": "",
      "spdxId": "Apache-2.0",
      "subcomponentName": "sub1"
    }
  ],
  "transitiveDependencies": [
  ]
}
```
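The split requested here can be sketched roughly as follows. This is only an illustration in Python, not ORT's implementation: `split_operands` and `build_subcomponents` are hypothetical helpers, and the naive splitting on `AND` / `OR` is an assumption that ignores operator precedence and is no substitute for a real SPDX expression parser.

```python
import re


def split_operands(expression: str) -> set[str]:
    """Naively split an SPDX expression on AND/OR (assumption: no real parsing).

    "WITH" stays attached to its license, parentheses are simply dropped.
    """
    flat = expression.replace("(", " ").replace(")", " ")
    parts = re.split(r"\s+(?:AND|OR)\s+", flat)
    return {part.strip() for part in parts if part.strip()}


def build_subcomponents(declared: str, detected: list[str]) -> list[dict]:
    """Put the declared license into "main" and every other finding into its own entry."""
    covered = split_operands(declared)
    subcomponents = [{"subcomponentName": "main", "spdxId": declared}]
    seen: set[str] = set()
    for finding in detected:
        # Skip findings that only repeat (parts of) the declared license, and duplicates.
        if split_operands(finding) <= covered or finding in seen:
            continue
        seen.add(finding)
        subcomponents.append(
            {"subcomponentName": f"sub{len(subcomponents)}", "spdxId": finding}
        )
    return subcomponents


result = build_subcomponents(
    "EPL-2.0 OR GPL-2.0-only WITH Classpath-exception-2.0",
    [
        "Apache-2.0",
        "EPL-2.0",
        "EPL-2.0 OR GPL-2.0-only WITH Classpath-exception-2.0",
        "GPL-2.0-only WITH Classpath-exception-2.0",
    ],
)
```

With the declared and detected licenses from this report, the sketch yields a `main` entry for the declared expression plus a single `sub1` entry for Apache-2.0, which matches the subcomponent layout of the expected output above.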
Console / log output
This is the actual output in the JSON-File.
```json
{
  "componentName": "jakarta.ws.rs-api",
  "componentVersion": "3.0.0",
  "id": 22,
  "linking": "dynamic_linking",
  "modified": false,
  "scmUrl": "https://github.com/jakartaee/rest.git",
  "subcomponents": [
    {
      "licenseText": "Apache License\n--\nEclipse Public License - v 2.0\n--\nGNU GENERAL PUBLIC LICENSE",
      "licenseTextUrl": "",
      "selectedLicense": "Apache-2.0 AND EPL-2.0 AND GPL-2.0-only WITH Classpath-exception-2.0",
      "spdxId": "Apache-2.0 AND EPL-2.0 AND (EPL-2.0 OR GPL-2.0-only WITH Classpath-exception-2.0) AND GPL-2.0-only WITH Classpath-exception-2.0",
      "subcomponentName": "main"
    }
  ],
  "transitiveDependencies": [
  ]
}
```
Environment
- ORT version: [e.g. 55.0.0]
- Java version: [e.g. 21]
- OS: [e.g. MS Windows 10]
@MNesche even after reading the description again, I'm still unclear about what a "subcomponent" actually is, and what defines it.
Is any arbitrary set of files that happen to have the same license a subcomponent? That would seem weird, especially as the spdxId of a subcomponent can be an SPDX expression, i.e. it might not only include the OR but also the AND operator, basically allowing you to group everything into one main subcomponent, like ORT currently does.
> Subcomponent called "main" should be the declared license(s) only.

I'm also not sure about that. What about a license detected in a root LICENSE file of a software component? Shouldn't that also be a "main" license?
Note that ORT actually does not have the concept of a "main license" of a package yet, but I've implemented something like that already for the SPDX report.
Hi @sschuberth, thank you for your reply. The description posted is from the item-field, here's the description of the array "subcomponents" itself:
| "Mandatory - Array with all subcomponents of the specific software component. A subcomponent is a finding in a software component with license and / or copyright information (sometimes also referred to as part). Usually there is a main license of the component and further subcomponent licenses in individual directories or files of the component. - Important hint: The first subcomponent in every component block must be named main!" |
|---|
Indeed, the description is probably a bit inaccurate, but the declared license(s) should be the main license, and any other finding should be a subcomponent, no matter where it has been found. Unfortunately, the examples in the spec aren't very good either, otherwise I'd post them here. I'm not a lawyer, but it would make some sense, because what actually matters is the license under which the package is made public (declared) and whether this license is compatible with any contained code under different licenses (e.g. code snippets, or the original license if the package is an adapted "new" work).
Not sure if you're familiar with Black Duck, but it's handled similarly there. The declared license is just called "License", and "Deep Licenses" is everything else that has been found additionally, listed individually. It could be that the intention was to adopt that scheme, but I really don't know.
So, a license detected in a root LICENSE file is not a "main" license? Quite odd, IMO.
Also, this still leaves the question open to me how many non-main subcomponents we should have. Just one, where the conjunction of all detected licenses is put?
Well, guess we'd have to discuss your question about the License-file in a root with a lawyer to get a bulletproof reply ;).
About the additional subcomponents, I can only reply how we made it.
We put any license finding in an individual subcomponent. The reason is that if you put everything into one subcomponent, the lawyers who review the reports (and anybody else) have to split up the combined license texts again, because they review these too.
So it's a lot more review-friendly if they get the license text of the GPL in subcomponent_1 and the text of the MIT license in subcomponent_2, for example, instead of both texts together in one string.
However, license findings with a choice have to be in one subcomponent, because there needs to be a value in the "selectedLicense" field.
In that case, we used the term `___OR___` as a divider for the license texts.
The file is probably reviewed not only by the lawyer but also by the party who created it. If there are import problems and you try to search for a plain "or", good luck finding the delimiter ;). If you search for `___OR___` instead, you'll find it immediately.
We had a couple of import issues with the format and needed to review the JSON files manually to find the problems; that's why we came up with this kind of solution.
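To illustrate the divider idea, here is a minimal Python sketch. The helper names, the newline placement around the divider, and the placeholder license texts are assumptions made for the example; only the `___OR___` marker itself comes from the comment above.

```python
OR_DIVIDER = "___OR___"


def join_choice_texts(license_texts: list[str]) -> str:
    """Join the texts of a license choice into one string, separated by the divider."""
    return f"\n{OR_DIVIDER}\n".join(license_texts)


def split_choice_texts(combined: str) -> list[str]:
    """Recover the individual texts by splitting on the divider."""
    return [part.strip("\n") for part in combined.split(OR_DIVIDER)]


# Placeholder texts (not real license texts) that contain ordinary uses of "or".
texts = [
    "First placeholder license text, for any purpose or use.",
    "Second placeholder license text, with or without modification.",
]
combined = join_choice_texts(texts)

# A plain "or" is ambiguous because it also occurs inside the texts themselves,
# while the dedicated divider occurs exactly once per boundary.
plain_or_hits = combined.lower().count("or")
divider_hits = combined.count(OR_DIVIDER)
```

The round trip through `split_choice_texts(combined)` only works as long as no license text contains the divider string itself, which is the property that makes an unusual marker easier to find than a plain "or".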
> Well, guess we'd have to discuss your question about the License-file in a root with a lawyer to get a bulletproof reply ;).

Actually, IIRC it was @LeChasseur who proposed at some ORT Community Days to introduce the concept of a "main license for a package" in ORT, which (again IIRC) explicitly included the license detected in a LICENSE file in the root of a repository.
Also, I just filed this PR to make the idea of a "main license" more prominent in ORT.
> We put any license finding in an individual subcomponent.

I guess that should say "any distinct license finding", right? Because in ORT, a "license finding" refers to an individual finding within a file, and a single file can have multiple findings for different licenses.
> In that case, we used the term `___OR___` as a divider for the license texts.

That does not sound very... standard 😉
Good morning & thank you for your reply,
> I guess that should say "any distinct license finding", right? Because in ORT, a "license finding" refers to an individual finding within a file, and a single file can have multiple findings for different licenses.

Indeed, any distinct license finding, with each SPDX license ID appearing only once as a subcomponent (no duplicates).
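The difference between per-file findings and distinct licenses can be sketched like this. It is a hypothetical Python illustration with made-up file paths, not ORT's data model: each finding is assumed to be a simple `(path, license_id)` pair.

```python
from collections import defaultdict


def group_findings_by_license(findings: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Collapse per-file findings into one entry per distinct license ID.

    Each license ID maps to the files it was found in, in first-seen order.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for path, license_id in findings:
        if path not in grouped[license_id]:
            grouped[license_id].append(path)
    return dict(grouped)


# Hypothetical findings: one file can carry findings for several licenses.
findings = [
    ("src/a.c", "MIT"),
    ("src/b.c", "GPL-2.0-only"),
    ("src/a.c", "GPL-2.0-only"),
    ("src/c.c", "MIT"),
]
by_license = group_findings_by_license(findings)
```

Each key of `by_license` would then become at most one subcomponent, no matter how many files the license was found in.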
On the topic of a main license, I don't really understand what the difference to the "declared" license would be. From my understanding, the declared license is the main license of a package, because the developer declared to make the package public under this License.
The declared license is already part of ORT, and from the perspective of a user working with the WebApp as a "viewer" for the ORT results, the standard view already shows that in a good way. ORT is already quite complex on its own, so I'm not sure if an additional license category would make it better :).
> > In that case, we used the term `___OR___` as a divider for the license texts.
>
> That does not sound very... standard 😉

True, I'm always open for new ideas :D
> From my understanding, the declared license is the main license of a package, because the developer declared to make the package public under this License.

ORT uses the term "declared" in a more specific way: For ORT, a declared license exclusively comes from package metadata. That is, if you as a human "declare" a license as part of a LICENSE file in a repository, that's not a "declared license" in the ORT sense, because it does not come from metadata, but from a file that needs to be scanned by a scanner; hence ORT calls this a "detected license".
Alright, good to know 😄
> Actually, IIRC it was @LeChasseur who proposed at some ORT Community Days to introduce the concept of a "main license for a package" in ORT, which (again IIRC) explicitly included the license detected in a `LICENSE` file in the root of a repository.

Would be a good idea. The same applies to the COPYING file, or if such LICENSE or COPYING file does not exist, the README.
To consider: two files, both COPYING and COPYING.Lesser in the root (e.g. https://gitlab.com/gnutls/gnutls/-/tree/master and many other GNU projects).
> The same applies to the COPYING file, or if such LICENSE or COPYING file does not exist, the README.

Agreed. That basically matches what we already have here (file names are matched case-insensitively).
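The file selection discussed in the last few comments could look roughly like this. It is a Python sketch under stated assumptions: the glob patterns and the function names are hypothetical and are not taken from ORT's actual configuration. Only the rules themselves come from the thread: case-insensitive matching, LICENSE and COPYING first, README as a fallback.

```python
from fnmatch import fnmatchcase

# Assumed patterns for illustration; "copying*" also catches COPYING.Lesser.
PRIMARY_PATTERNS = ("license*", "copying*")
FALLBACK_PATTERNS = ("readme*",)


def _matching(names: list[str], patterns: tuple[str, ...]) -> list[str]:
    """Return the names that match any pattern, compared case-insensitively."""
    return [
        name
        for name in names
        if any(fnmatchcase(name.lower(), pattern) for pattern in patterns)
    ]


def main_license_candidates(root_file_names: list[str]) -> list[str]:
    """Pick root files whose detected licenses could count as main licenses.

    LICENSE / COPYING files win; README files are only used if none exist.
    """
    return _matching(root_file_names, PRIMARY_PATTERNS) or _matching(
        root_file_names, FALLBACK_PATTERNS
    )


gnu_style = main_license_candidates(["README.md", "COPYING", "COPYING.Lesser", "configure.ac"])
readme_only = main_license_candidates(["Readme.md", "main.c"])
```

For a root directory with both COPYING and COPYING.Lesser, both files are returned, so a caller would still have to decide how to combine the licenses detected in them.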