Unexpected (erroneous?) result from dash-license-tool for Orbit dependencies
In LSP4E, the build is failing with:
[INFO] --- license-tool-plugin:0.0.1-SNAPSHOT:license-check (default-cli) @ parent ---
[INFO] Querying Eclipse Foundation for license data for 166 items.
[INFO] Found 164 items.
[INFO] License information could not be automatically verified for the following content:
[INFO]
[INFO] p2/orbit/p2.eclipse-plugin/org.bouncycastle.bcpg/1.69.0.v20210713-1924
[INFO] p2/orbit/p2.eclipse-plugin/org.bouncycastle.bcprov/1.69.0.v20210923-1401
[INFO]
[INFO] This content is either not correctly mapped by the system, or requires review.
This seems wrong: those artifacts come from Orbit (so they should be trusted) and have a CQ that approves them.
False positives are now ignored as of https://github.com/eclipse/lsp4e/commit/bd41a56995b424750550642fd13fc5769b8abc38; the problem is that true positives are ignored too.
This has come up a couple of times. The problem is that I don't know how to consistently get the entire history of Orbit data. There's a bit more of an explanation here.
While both of these libraries are in the most recent release of Orbit, my only source of data right now is the "recipes" Git repository, and the repository has moved on to the latest version (I do actually grab some of the oldest Orbit data, since in the old days metadata files were created that I can scan on the file system). I figure that I have two options: pull data from every release tag in the repository, or screen scrape the website. I need to update the back end script to do the former.
The problem manifests only periodically since the tool only uses Orbit data to plug holes in our data set (that is, we get most of the data from other sources). Still, it's a PITA when something that had previously passed now fails because we based a mapping on Orbit data that has subsequently been updated.
While this isn't a great answer, you can exclude items from review using the excludeArtifactIds property. Again, it's not a great answer, but maybe you can use it to temporarily bypass things that you know are good. I do have an [https://github.com/eclipse/dash-licenses/issues/110#issuecomment-938739106 issue open] to document this option.
I'll try and sort out the Orbit challenge.
I should have added that I've added a mapping for those two dependencies to the back end, so they should pass. I still need to make the Orbit information sticky.
I'll try and sort out the Orbit challenge.
Why not use the data from the Update-sites? I think that's what actually is released. Let me know if you need any help regarding this.
e.g. I can see CQ/IP log information in the IU data:
<unit id='ch.qos.logback.slf4j' version='1.0.7.v201505121915' singleton='false'>
  <update id='ch.qos.logback.slf4j' range='[0.0.0,1.0.7.v201505121915)' severity='0'/>
  <properties size='8'>
    <property name='df_LT.Bundle-Vendor.0' value='Eclipse Orbit'/>
    <property name='df_LT.Bundle-Name.0' value='Logback Native SLF4J Logger'/>
    <property name='org.eclipse.equinox.p2.name' value='%Bundle-Name.0'/>
    <property name='org.eclipse.equinox.p2.provider' value='%Bundle-Vendor.0'/>
    <property name='iplog.bug_id' value='6868'/>
    <property name='iplog.contact.name' value='Gunnar Wagenknecht'/>
    <property name='iplog.contact.email' value='[email protected]'/>
    <property name='org.eclipse.equinox.p2.bundle.localization' value='fragment'/>
  </properties>
  <provides size='6'>
    <provided namespace='org.eclipse.equinox.p2.iu' name='ch.qos.logback.slf4j' version='1.0.7.v201505121915'/>
    <provided namespace='osgi.bundle' name='ch.qos.logback.slf4j' version='1.0.7.v201505121915'/>
    <provided namespace='java.package' name='org.slf4j.impl' version='1.7.2'/>
    <provided namespace='org.eclipse.equinox.p2.eclipse.type' name='bundle' version='1.0.0'/>
    <provided namespace='osgi.fragment' name='org.slf4j.api' version='1.0.7.v201505121915'/>
    <provided namespace='org.eclipse.equinox.p2.localization' name='df_LT' version='1.0.0'/>
  </provides>
  <requires size='3'>
    <required namespace='osgi.bundle' name='org.slf4j.api' range='[1.7.2,1.7.3)'/>
    <required namespace='osgi.bundle' name='ch.qos.logback.core' range='[1.0.7,1.0.8)'/>
    <required namespace='osgi.bundle' name='ch.qos.logback.classic' range='[1.0.7,1.0.8)'/>
  </requires>
  <artifacts size='1'>
    <artifact classifier='osgi.bundle' id='ch.qos.logback.slf4j' version='1.0.7.v201505121915'/>
  </artifacts>
  <touchpoint id='org.eclipse.equinox.p2.osgi' version='1.0.0'/>
  <touchpointData size='1'>
    <instructions size='1'>
      <instruction key='manifest'>
        Bundle-SymbolicName: ch.qos.logback.slf4j
        Bundle-Version: 1.0.7.v201505121915
        Fragment-Host: org.slf4j.api;bundle-version="[1.7.2,1.7.3)"
      </instruction>
    </instructions>
  </touchpointData>
</unit>
That looks very promising...
Let me know if you need any help regarding this.
@laeubi it would be great if you can help. I need the name of the file (as it manifests in the artifactid of the GAV), the version, and the CQ id (when it is available).
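For illustration, the three values asked for here (ID, version, and CQ id when available) could be pulled out of p2 metadata roughly as follows. This is only a sketch: it assumes the iplog.bug_id property is where the CQ number lives, that the metadata has already been unpacked to plain XML, and the function name extract_cq_mappings is made up for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <unit> element quoted earlier in this thread,
# wrapped in a <units> element so that it forms a small document.
SAMPLE = """
<units size='1'>
  <unit id='ch.qos.logback.slf4j' version='1.0.7.v201505121915' singleton='false'>
    <properties size='2'>
      <property name='org.eclipse.equinox.p2.name' value='%Bundle-Name.0'/>
      <property name='iplog.bug_id' value='6868'/>
    </properties>
  </unit>
</units>
"""


def extract_cq_mappings(xml_text):
    """Return (id, version, cq_id) for every <unit>; cq_id is None when absent.

    Assumption: the CQ number is stored in the 'iplog.bug_id' property.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for unit in root.iter("unit"):
        # Collect this unit's properties into a name -> value dictionary.
        props = {p.get("name"): p.get("value") for p in unit.iter("property")}
        rows.append((unit.get("id"), unit.get("version"), props.get("iplog.bug_id")))
    return rows


for row in extract_cq_mappings(SAMPLE):
    print(row)
```

Units without an iplog.bug_id property still show up, with None in the third position, so entries that lack a CQ can be told apart from entries that were never seen.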
@waynebeaton how familiar are you with p2? Would you just need some generic information on how to match the different pieces of information, or a code snippet, or ...?
@laeubi, my knowledge of p2 is very shallow. But, it seems like it would be easy enough to just parse the XML. How do we get access to the XML?
You can certainly use the XML directly, but be prepared for some peculiarities :-)
Let's start from https://download.eclipse.org/tools/orbit/downloads/2022-03/ (of course you might want to include older releases as well). There you see some "magic files", compositeArtifacts.xml and compositeContent.xml, that point to the actual content. A more detailed description is here: https://wiki.eclipse.org/Equinox/p2/Composite_Repositories_(new)
A while ago I wrote some code to handle this independently of the p2 codebase; you can find it here: https://github.com/ops4j/org.ops4j.pax.exam2/tree/master/containers/pax-exam-container-eclipse/src/main/java/org/ops4j/pax/exam/container/eclipse/impl/sources/p2repository
The most interesting part is probably here: https://github.com/ops4j/org.ops4j.pax.exam2/blob/master/containers/pax-exam-container-eclipse/src/main/java/org/ops4j/pax/exam/container/eclipse/impl/sources/p2repository/P2Index.java
If you think any of the code is useful, I hereby grant you permission to also use it under the terms of the EPL 2.0.
@waynebeaton @mickaelistria I have read a bit about the ClearlyDefined curation repository and think I could come up with a small tool that converts an update site (e.g. Orbit) into a ClearlyDefined curation entry. It would then be possible to contribute the data to the ClearlyDefined "database" on GitHub. Would this be useful?
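As a rough sketch of what "point to the actual content" means: a composite file lists child repositories whose locations may be relative to the composite's own URL. The snippet below resolves them. The sample XML and its child locations are invented for illustration (they are not the real Orbit children), the function name child_repositories is hypothetical, and a real site may serve the file as a jar or in compressed form that has to be unpacked first.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Invented sample in the shape of a compositeContent.xml file.
# The XML declaration and processing instruction are left out for brevity.
SAMPLE_COMPOSITE = """
<repository name='Example composite' version='1.0.0'
    type='org.eclipse.equinox.internal.p2.metadata.repository.CompositeMetadataRepository'>
  <children size='2'>
    <child location='../R-example/repository'/>
    <child location='https://example.org/other/repository'/>
  </children>
</repository>
"""


def child_repositories(base_url, composite_xml):
    """Resolve every <child location='...'/> against the composite's URL."""
    # urljoin treats the last path segment as a file unless the base ends in '/'.
    if not base_url.endswith("/"):
        base_url += "/"
    root = ET.fromstring(composite_xml)
    return [urljoin(base_url, child.get("location")) for child in root.iter("child")]


base = "https://download.eclipse.org/tools/orbit/downloads/2022-03/"
for url in child_repositories(base, SAMPLE_COMPOSITE):
    print(url)
```

A child can itself be a composite, so a complete crawler would repeat this step until it reaches repositories that carry actual content metadata.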
@laeubi I wouldn't recommend investing in pushing p2 to a wider world, or investing too much in making it part of the ecosystem in ClearlyDefined and whatnot. I think the continuous work to make Tycho capable of consuming/publishing Maven artifacts, using p2 as an internal technology, and progressively getting rid of hosted p2 repositories in favor of repositories containing just metadata referencing artifacts that were deployed to Maven (and thus have Maven coordinates) would be more profitable.
The idea is just that getting rid of p2 would take a really, really long time, and it seems crucial to be able to access the data in a unified way. And as p2 already contains rich metadata regarding licenses and other things, it just seems obvious to publish the data for selected sites (e.g. Orbit + SimRel repositories).
@waynebeaton @mickaelistria I have read a bit about the ClearlyDefined curation repository and think I could come up with a small tool that converts an update site (e.g. Orbit) into a ClearlyDefined curation entry. It would then be possible to contribute the data to the ClearlyDefined "database" on GitHub. Would this be useful?
It's not that easy. For starters, we'd have to get the ClearlyDefined team to recognise p2 as a type and come up with some agreement on what values are valid as sources. If you do send a p2 Id to ClearlyDefined, it rejects the entire payload with an error. I've thought a lot about pursuing that, but decided that it wasn't worth it. The Dash License Tool doesn't even ask ClearlyDefined about p2 resources. It just skips over them.
When you say "curation", do you mean that it includes vetted license information? Are you scanning the corresponding source code for license information?
The only thing that we currently get from Orbit is the mapping between the p2 resource and corresponding CQ so that we can match the ID to the license information that's been curated by the IP Team. Primarily this is historical information since generally anything added to Orbit these days is just vetted by the Dash License Tool (the EBR builds give up the Maven GAVs which we check).
My primary challenge with p2 resources is mapping them back to vetted license information, or consistently mapping them back to their source so that we can use it to vet the license information. At least theoretically, we should be able to match source bundles via the Maven reactor, but I haven't explored that yet.
It's not that easy. For starters, we'd have to get the ClearlyDefined team to recognise p2 as a type and come up with some agreement on what values are valid as sources.
Who, if not the EF, would be in charge of making some normative definition here?
When you say "curation", do you mean that it includes vetted license information?
If you install content from p2 in Eclipse, it asks you to accept certain licenses; there is even the concept of a "license feature" for reusing them (e.g. all platform bundles using the same license feature), but I can only speak from a technical/programmer's point of view.
My primary challenge with p2 resources is mapping them back to vetted license information or consistently mapping them back to their source
Actually, the id+version+md5 of each p2 IU should be a very good measure even if they travel across multiple sites.
Who, if not the EF, would be in charge of making some normative definition here?
I've already answered this. At least in part, I made a similar assessment as @mickaelistria.
Like I said: "I've thought a lot about pursuing that, but decided that it wasn't worth it."
If you install content from p2 in Eclipse, it asks you to accept certain licenses; there is even the concept of a "license feature" for reusing them (e.g. all platform bundles using the same license feature), but I can only speak from a technical/programmer's point of view.
I'm quite confident that the licenses described in the about.html files included in bundles produced by Eclipse projects are accurate. These would still need to be translated into SPDX codes to be useful for ClearlyDefined or even the Eclipse Foundation's IP API.
Our experience with third party content is, however, that the licenses expressed are not always reflective of the licenses contained. This is why we use tools to scan third party content for license information, and -- at least in many cases -- have to review and curate the results of those scans. When I use the word "curate", I mean that a determination has been made by a human (i.e., the IP Team).
FWIW, we do also periodically scan Eclipse project code to ensure that we remain "quite confident" in the licenses described.
Actually, the id+version+md5 of each p2 IU should be a very good measure even if they travel across multiple sites.
At least theoretically, building a bundle using different JDKs might produce slightly different JARs that have different md5 signatures (as would subtle changes in a manifest, or inclusion of additional metadata, or signing by different entities, or ...). Under these circumstances, are the produced JARs the same or different? There's some interesting work being done that has potential to actually solve this problem. If you're curious, have a look at GitBOM.
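To make the two notions of identity concrete, here is a small sketch. The byte strings stand in for two builds of the same bundle and are invented, and both function names are hypothetical. Keying on ID+version alone treats the builds as the same thing, while adding the md5 of the content treats them as different.

```python
import hashlib


def p2_key(bundle_id, version):
    """Identity by ID and version only."""
    return f"{bundle_id}/{version}"


def content_key(bundle_id, version, jar_bytes):
    """Identity by ID, version, and the md5 digest of the JAR content."""
    digest = hashlib.md5(jar_bytes).hexdigest()
    return f"{bundle_id}/{version}/{digest}"


# Stand-ins for the same bundle built twice with slightly different output.
build_a = b"PK\x03\x04 example content, build A"
build_b = b"PK\x03\x04 example content, build B"

bundle_id, version = "org.bouncycastle.bcprov", "1.69.0.v20210923-1401"

print(p2_key(bundle_id, version))
print(content_key(bundle_id, version, build_a))
print(content_key(bundle_id, version, build_b))
```

Which key is "right" depends on whether a rebuilt or re-signed JAR should inherit the license determination made for the original, which is the open question here.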
I'm quite confident that the licenses described in the about.html files included in bundles produced by Eclipse projects are accurate.
That's why we could probably publish content we are confident about (e.g. everything in Orbit should be vetted, shouldn't it?)
At least theoretically building a bundle using different JDKs might produce slightly different JARs that have different md5
In most cases one would not "build" the bundle but just consume it from an existing update site. With Orbit, for example, one won't rebuild it but will probably include a 1:1 copy in one's own update site. That way an artifact can 'travel' around but will always be the same; p2 actually assumes that everything that has the same ID+version is the same.
These would still need to be translated into SPDX codes
At least for features, there is also a license URL that could be matched. But if it is not worth thinking about such ways to extract meta-data I'm all fine with it :-)
At least for features, there is also a license URL that could be matched. But if it is not worth thinking about such ways to extract meta-data I'm all fine with it :-)
This is actually a relatively hard problem. Especially when you get into multiple license scenarios.
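For the simple single-license case, matching a license URL to an SPDX code might look like the sketch below. The lookup table is a tiny illustrative sample rather than any vetted mapping, and the function names are made up. It deliberately returns None for anything it does not recognise, and it makes no attempt at dual or multiple license scenarios.

```python
from urllib.parse import urlsplit

# Illustrative sample only; keys are normalized (no scheme, no "www.",
# no trailing slash, lower case).
SPDX_BY_LICENSE_URL = {
    "eclipse.org/legal/epl-2.0": "EPL-2.0",
    "eclipse.org/legal/epl-v10.html": "EPL-1.0",
    "apache.org/licenses/license-2.0": "Apache-2.0",
    "apache.org/licenses/license-2.0.txt": "Apache-2.0",
}


def normalize(url):
    """Reduce a license URL to host + path so trivial variants compare equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return (host + parts.path).rstrip("/").lower()


def spdx_for(url):
    """Return the SPDX identifier for a known license URL, else None."""
    return SPDX_BY_LICENSE_URL.get(normalize(url))


print(spdx_for("https://www.eclipse.org/legal/epl-2.0/"))
print(spdx_for("http://www.apache.org/licenses/LICENSE-2.0"))
print(spdx_for("https://example.org/some-custom-license"))
```

Anything that comes back as None would still need a human to look at it, and a URL match says nothing about whether the content actually carries that license.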