RFC: Store additional license details on the Package model

Open DennisClark opened this issue 1 year ago • 6 comments

Problem: provide more clarity for "Declared License" vs "Concluded License".

Benefit: support the completeness of an SBOM.

Create an additional declared_license field on Package. When a package scan is completed, update both the current assigned_license field and this new declared_license field with the same values. The intention is to retain the declared_license as a historical record, so that the assigned_license field essentially becomes the "concluded license" (we can change the help text on that field).

Store the additional licenses from the scan results on the package model as well. This will support deeper analysis and reporting, enabling users to comment on why specific additional licenses impact or do not impact the licensing terms as the package is expected to be used in an organization.
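
A minimal sketch of the proposed behavior, assuming a plain dataclass as a stand-in for the Package model (this is not the actual Django model). The field names assigned_license and declared_license come from the proposal above; other_licenses and apply_scan_results are hypothetical names used only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Package:
    """Simplified stand-in for the DejaCode Package model (illustration only)."""

    name: str
    assigned_license: str = ""  # editable; effectively the "concluded license"
    declared_license: str = ""  # historical record of what the scan reported
    other_licenses: list[str] = field(default_factory=list)  # hypothetical name


def apply_scan_results(package: Package, declared: str, others: list[str]) -> None:
    """Set both license fields from a completed scan (hypothetical helper)."""
    package.declared_license = declared
    package.assigned_license = declared
    package.other_licenses = list(others)


package = Package(name="example")
apply_scan_results(package, "apache-2.0", ["mit"])

# A curator may later edit the concluded value; the declared value is retained.
package.assigned_license = "apache-2.0 AND mit"
```

After the manual edit, declared_license still holds the value set by the scan, which is the historical record the proposal asks for.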

More design details to follow.

DennisClark avatar Mar 14 '24 14:03 DennisClark

@DennisClark it could also make sense to store the "other licenses" beyond the main, primary concluded license? ... actually I think you already mention this!

pombredanne avatar Mar 14 '24 17:03 pombredanne

@pombredanne right, I meant "other licenses" when I wrote "additional licenses"! We need these to be stored to support really detail-oriented analysis and evaluations for organizations that require that.

DennisClark avatar Mar 14 '24 17:03 DennisClark

Ultimately we want to standardize on the following license terminology to be in sync with the open source community:

  • declared license: a license expression derived from statements in the key files of a software project, such as the NOTICE, COPYING, README, and LICENSE files.
  • detected licenses: license expressions derived from clues in the various files of a software project, which are very often third-party software used by the project, or test, sample and documentation files.
  • concluded license: a license expression curated from the declared license, where the curator has performed analysis to clarify or correct the declared license, possibly including one or more detected licenses in the license expression. In DejaCode, this is the license expression assigned to a Package.
  • effective license: a license expression curated in the context of the usage of a Package in a specific Product context, which may assert a license choice when that is an option. In DejaCode this is a Product Item license expression.

DennisClark avatar Mar 15 '24 16:03 DennisClark

We need one more new field to complete this enhancement request. We already have a notes field on Package, but it is a general purpose field. We should create the following:

curation_notes: Text to explain and support the editing of license-expressions and copyright statements on a Package, as well as the usage policy.

DennisClark avatar Mar 19 '24 16:03 DennisClark

@DennisClark It seems that CDX 1.6 will support reporting declared license in addition to concluded license. This is strangely called "acknowledgement" under components/licenses/SPDX License Expression: https://cyclonedx.org/docs/1.6/json/#components_items_licenses_oneOf_i1_items_i0_acknowledgement

mjherzog avatar Apr 19 '24 17:04 mjherzog

Note that this issue focuses on Packages (as our first priority) but the model and process changes should apply in the very same manner to Components. Note however that Component license expressions are not normally applied automatically when creating a Component manually, but only when created from a Package.

DennisClark avatar Apr 19 '24 17:04 DennisClark

It's time to raise the priority on this issue, which is essential to complete the curation process in DejaCode and to document the curation process on a package, component, and Product Inventory item. The new field is actually the "declared license", because the current license expression on those objects is effectively the "concluded license", given that it is editable. Basically we should default both to the same license expression when set automatically, so that the editing of a "concluded license" becomes a very important, but optional, step.

DennisClark avatar May 28 '24 22:05 DennisClark

The current state of the models regarding license-related fields:

DejaCode Package/Component models:

  • license_expression

ScanCode.io DiscoveredPackage and PurlDB Package model:

  • declared_license_expression
  • declared_license_expression_spdx
  • license_detections
  • other_license_expression
  • other_license_expression_spdx
  • other_license_detections
  • extracted_license_statement

Notes:

  • ScanCode.io and PurlDB share the same license-related fields. While adding new fields to DejaCode, let's keep naming consistency to ease the import of data from SCIO and PurlDB.
  • The declared_license_expression value is the one put in the DejaCode.license_expression during import. That field is currently a mix of data that can be "declared" or "concluded".

Data example from PurlDB:

"declared_license_expression": "elastic-license-v2 AND mongodb-sspl-1.0",
"declared_license_expression_spdx": "Elastic-2.0 AND SSPL-1.0",
"license_detections": [
    {
        "matches": [
            {
                "score": 100.0,
                "matcher": "2-aho",
                "end_line": 1,
                "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/elastic-license-v2_3.RULE",
                "start_line": 1,
                "matched_text": "- name: Elastic License 2.0",
                "match_coverage": 100.0,
                "matched_length": 4,
                "rule_relevance": 100,
                "rule_identifier": "elastic-license-v2_3.RULE",
                "license_expression": "elastic-license-v2"
            },
            ...
        ],
        "identifier": "elastic_license_v2_and_mongodb_sspl_1_0-1ef52e23-8928-8379-5e32-b1c571383a6a",
        "license_expression": "elastic-license-v2 AND mongodb-sspl-1.0"
    }
],
"other_license_expression": "(elastic-license-v2 OR mongodb-sspl-1.0) AND apache-2.0 AND (mongodb-sspl-1.0 AND elastic-license-v2)",
"other_license_expression_spdx": "(Elastic-2.0 OR SSPL-1.0) AND Apache-2.0 AND (SSPL-1.0 AND Elastic-2.0)",
"other_license_detections": [],
"extracted_license_statement": "- name: Elastic License 2.0\n  url: https://raw.githubusercontent.com/elastic/elasticsearch/v7.17.9/licenses/ELASTIC-LICENSE-2.0.txt\n- name: Server Side Public License, v 1\n  url: https://www.mongodb.com/licensing/server-side-public-license\n",
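
A rough sketch of how an import could carry these values over while keeping the upstream names, as suggested in the notes above. The field names are the ScanCode.io/PurlDB ones listed earlier; the helper extract_license_data and the download_url key in the sample payload are hypothetical and only illustrate the idea.

```python
# Field names shared by ScanCode.io and PurlDB, as listed above.
LICENSE_FIELDS = (
    "declared_license_expression",
    "declared_license_expression_spdx",
    "license_detections",
    "other_license_expression",
    "other_license_expression_spdx",
    "other_license_detections",
    "extracted_license_statement",
)


def extract_license_data(purldb_package: dict) -> dict:
    """Return the license-related values to store on the DejaCode side.

    Hypothetical helper: keeps the upstream field names unchanged and, as the
    current import does, seeds license_expression from the declared value.
    """
    data = {
        name: purldb_package[name]
        for name in LICENSE_FIELDS
        if name in purldb_package
    }
    data["license_expression"] = purldb_package.get("declared_license_expression", "")
    return data


payload = {
    "declared_license_expression": "elastic-license-v2 AND mongodb-sspl-1.0",
    "declared_license_expression_spdx": "Elastic-2.0 AND SSPL-1.0",
    "other_license_detections": [],
    "download_url": "https://example.com/pkg.tar.gz",  # unrelated key, ignored
}
license_data = extract_license_data(payload)
```

Keeping the names identical on both sides means the import needs no translation table, only a decision about which fields to accept.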

We need to clarify the implementation:

  • Which license fields do we want to add on the DejaCode side and on which models
  • The evolution of the current generic license_expression field on the following models: Product, Package, Component, Subcomponent, ProductPackage, ProductComponent, ProductInventoryItem
  • Define which of the new license fields is displayed in the various UI locations

tdruez avatar May 29 '24 04:05 tdruez

Design document (note: still in progress) available for review, comments, suggestions, questions!

https://docs.google.com/document/d/1Y4bznZNm6gwk-2rS8Oqti7kZd-bc-X7R/edit?usp=sharing&ouid=117241222429542576816&rtpof=true&sd=true

  • Proposed changes to the Package and Component models and UI are ready for review.
  • Proposed changes to the Product Relation models and UI are not yet defined, but will be available soon.
  • The impact on the Subcomponent model and UI is still only in the initial concept stage.

DennisClark avatar Jun 06 '24 15:06 DennisClark

I reviewed the design document and the only changes I made were for diction and reducing the font size for the field references in Roboto. Overall we are covering the "last mile" for data definitions that are already present in DejaCode and other AboutCode modules.

mjherzog avatar Jun 06 '24 16:06 mjherzog

The design document at https://docs.google.com/document/d/1Y4bznZNm6gwk-2rS8Oqti7kZd-bc-X7R/edit is ready for comments, suggestions, and questions.

  • Proposed changes to the Package and Component models and UI are ready for review.
  • Proposed changes to the Product Relation models and UI are ready for review.
  • Potential changes to the Subcomponent model are still in the early stage of concept development.

General comment: Unless additional functional requirements are discovered for the Product Relationship UI, the updates to the Product are relatively light, since most of the impact is on the Package and Component objects.

DennisClark avatar Jun 07 '24 19:06 DennisClark

@DennisClark I've reviewed and commented on the design document.

Implementation of the new fields started at https://github.com/nexB/dejacode/pull/130, you can see the details of what is already implemented there.

Elements that need to be discussed/defined:

  • ~~What's the plan to get any data for those fields for the Component model? Most values for those fields come from Package scanning. https://github.com/nexB/dejacode/issues/63#issuecomment-2159432240~~
  • ~~Discuss the current Package.declared_license and Component.concluded_license fields~~
  • ~~Revisit the ComponentAdmin form layout (should be aligned with Package)~~
  • ~~Which of the new fields should be added to the main UI Package and Component forms? (most of those are generated during a scan and not meant to be manually defined or edited)~~
  • ~~Discuss the lack of support for those fields in the AboutCode Spec and tools https://aboutcode.readthedocs.io/projects/aboutcode-toolkit/en/latest/specification.html https://github.com/nexB/aboutcode-toolkit/issues/563~~

TODO:

  • What should we do with the table on the "License" tab? It currently represents the licenses available in the license_expression field. The layout may need to be refined following the display of the new fields.

tdruez avatar Jun 10 '24 13:06 tdruez

@tdruez I have replied to, and mostly provided suggested resolutions for, your comments in the design document.

DennisClark avatar Jun 10 '24 22:06 DennisClark

Regarding "What's the plan to get any data for those fields for the Component model? Most values for those fields come from Package scanning."

On Component, the new fields will get values from manual editing (except for the SPDX-related ones) or import or API. It would be good to copy these fields from Package to Component when a user creates a new Component from a Package.
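
The copy step could look roughly like the sketch below. It uses plain dataclasses instead of the real Django models, and the list of copied fields is an assumed subset chosen for illustration; the actual list is the one defined in the design document. The helper name component_from_package is hypothetical.

```python
from dataclasses import dataclass

# Assumed subset of license fields to carry over (illustration only).
COPIED_FIELDS = (
    "license_expression",
    "declared_license_expression",
    "other_license_expression",
)


@dataclass
class Package:
    """Simplified stand-in for the Package model."""

    name: str
    license_expression: str = ""
    declared_license_expression: str = ""
    other_license_expression: str = ""


@dataclass
class Component:
    """Simplified stand-in for the Component model."""

    name: str
    license_expression: str = ""
    declared_license_expression: str = ""
    other_license_expression: str = ""


def component_from_package(package: Package, name: str) -> Component:
    """Create a Component pre-filled with the Package license values."""
    values = {field_name: getattr(package, field_name) for field_name in COPIED_FIELDS}
    return Component(name=name, **values)


package = Package(
    name="elasticsearch",
    license_expression="elastic-license-v2 AND mongodb-sspl-1.0",
    declared_license_expression="elastic-license-v2 AND mongodb-sspl-1.0",
    other_license_expression="apache-2.0",
)
component = component_from_package(package, name="Elasticsearch")
```

The copied values are only a starting point: on the Component they remain editable like any manually entered value.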

We might also want to consider some kind of data migration that populates existing Components with assigned packages using the related package fields, but I'm not sure that we are ready to sign up for that right now.

DennisClark avatar Jun 10 '24 22:06 DennisClark

Merged and deployed.

tdruez avatar Jul 03 '24 12:07 tdruez