
feat: Add complete License-Text to cyclonedx bom

Open andife opened this issue 1 year ago • 9 comments

We have a requirement to include the complete license information/text in the CycloneDX JSON.

My idea would be to take this directly from the wheel, which contains this information in its *.dist-info directory: there is a LICENSE file.

This information can also be accessed via "pip-licenses --with-license-file --format=json".

It would be nice if the designated field in the CycloneDX format (https://cyclonedx.org/docs/1.4/json/#components_items_licenses_items_license_text_content) could be filled with the content of the license file.
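A minimal sketch of the wheel-based approach suggested above, using only the standard library: a wheel is a zip archive, so license files can be read straight out of its `*.dist-info/` directory, including a `licenses/` subdirectory like the one in the mkdocs example later in this thread. The function name `license_files_in_wheel` is hypothetical, and the `LICEN[CS]E*`/`NOTICE*` patterns are borrowed from the acceptance criteria further down; none of this is cyclonedx-py API.

```python
import fnmatch
import zipfile
from pathlib import PurePosixPath

# Assumed patterns, borrowed from the sdist acceptance criteria in this thread.
LICENSE_PATTERNS = ("LICEN[CS]E*", "NOTICE*")


def license_files_in_wheel(wheel_path):
    """Return {archive member: text} for license-like files in *.dist-info/."""
    found = {}
    with zipfile.ZipFile(wheel_path) as whl:
        for member in whl.namelist():
            parts = PurePosixPath(member).parts
            # Zip member names always use "/". Skip directory entries and
            # anything outside the top-level "<name>-<version>.dist-info/".
            if member.endswith("/") or len(parts) < 2 or not parts[0].endswith(".dist-info"):
                continue
            if any(fnmatch.fnmatchcase(parts[-1], pat) for pat in LICENSE_PATTERNS):
                # License files are usually UTF-8; replace undecodable bytes.
                found[member] = whl.read(member).decode("utf-8", errors="replace")
    return found
```

Matching is case-sensitive here (`fnmatchcase`), so a file named `License.txt` would be missed; whether to relax that is a design decision left open.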

andife • Aug 28 '23 07:08

I've implemented something like this, although I store the license texts in ComponentEvidence, since they are more a result of analysis than a guaranteed license. Some packages contain multiple license files, for example. I'll try to upstream this here at some point if it's accepted.

I used pip-licenses as well in the past but moved to this approach for exactly the same reason as stated above :)

nejch • Nov 27 '23 14:11

@nejch better hold off on an implementation until the following are properly merged to master:

  • https://github.com/CycloneDX/cyclonedx-python/pull/610
  • https://github.com/CycloneDX/cyclonedx-python/pull/605

jkowalleck • Nov 27 '23 14:11

see #567

jkowalleck • Jan 06 '24 22:01

since v4 has been published and released, feel free to contribute this feature.

as explained, the target of the "detected" license texts shall be component.evidence.licenses[] (https://cyclonedx.org/docs/1.5/json/#metadata_component_evidence_licenses)

example outcome:

{
  // ...
  "evidence": {
    "licenses": [
      {
        "license": {
          "name": "detected license text from file XZY",
          "text": {
            "contentType": "text/markdown",
            "encoding": "base64",
            "content": "IyBNSVQgTm8gQXR0cmlidXRpb24KCkNvcHlyaWdodCAyMDI0IEphbmUgRG9lCgpQZXJtaXNzaW9uIGlzIGhlcmVieSBncmFudGVkLCBbLi4uXQ=="
          }
        }
      }
    ]
  } 
}
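For illustration, a hypothetical Python helper that builds an item of this shape. In the CycloneDX 1.5 JSON schema each item of `evidence.licenses[]` wraps the license object in a `"license"` key (as in nejch's reply below). The helper name `evidence_license_entry` is made up; the sample text is simply what the example's base64 `content` decodes to.

```python
import base64
import json


def evidence_license_entry(name, text, content_type="text/plain"):
    """Build one component.evidence.licenses[] item (CycloneDX 1.5 JSON)
    carrying a named license whose text is base64-encoded UTF-8."""
    return {
        "license": {
            "name": name,
            "text": {
                "contentType": content_type,
                "encoding": "base64",
                "content": base64.b64encode(text.encode("utf-8")).decode("ascii"),
            },
        }
    }


sample_text = (
    "# MIT No Attribution\n\n"
    "Copyright 2024 Jane Doe\n\n"
    "Permission is hereby granted, [...]"
)
component_part = {
    "evidence": {
        "licenses": [
            evidence_license_entry(
                "detected license text from file XZY", sample_text, "text/markdown"
            )
        ]
    }
}
print(json.dumps(component_part, indent=2))
```

Encoding that sample text reproduces exactly the `content` string of the example.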

jkowalleck • Feb 02 '24 11:02

Thanks a lot, that's almost exactly what I have now, although so far I haven't encoded it:

            "evidence": {
                "licenses": [
                    {
                        "license": {
                            "name": "mkdocs-1.5.3.dist-info/licenses/LICENSE",
                            "text": {
                                "contentType": "text/plain",
                                "content": "Copyright \u00a9 2014-present, Tom Christie. All rights reserved.\n\nRedistribution and use in source and binary forms, with or\nwithout modification, are permitted provided that the following\nconditions are met:\n\nRedistributions of source code must retain the above copyright\nnotice, this list of conditions and the following disclaimer.\nRedistributions in binary form must reproduce the above copyright\nnotice, this list of conditions and the following disclaimer in\nthe documentation and/or other materials provided with the\ndistribution.\n\nTHIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND\nCONTRIBUTORS \"AS IS\" AND ANY EXPRESS OR IMPLIED WARRANTIES,\nINCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF\nMERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE\nDISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR\nCONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,\nSPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT\nLIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF\nUSE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED\nAND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT\nLIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN\nANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE\nPOSSIBILITY OF SUCH DAMAGE.\n"
                            }
                        }
                    }
                ]
            }

@jkowalleck just wondering: does the current implementation have a mechanism to try and guess the license (SPDX ID or generic name) from a text? If so, I'd potentially reuse it; it might be useful for filling in the name, even if not really guaranteed.

nejch • Feb 02 '24 12:02

Also note to self: Just noticed this package no longer has a public API. So I guess this functionality should actually go into https://github.com/CycloneDX/cyclonedx-python-lib, if we as users want to use it programmatically.

Edit: maybe not, as that only has the models etc :sweat_smile: hmm.

nejch • Feb 02 '24 12:02

re https://github.com/CycloneDX/cyclonedx-python/issues/570#issuecomment-1923688403 @nejch

[...] although so far I didn't encode it

encoding the license text is not optional, it is mandatory, AFAIK. you ~~need to~~ should get your implementation fixed.

PS: Edit: need to dig into the SBOM guide and check if it is actually required. At least that is what I thought, and I still encourage encoding the content, to prevent issues when embedding the text in the transport medium (XML/JSON/Protobuf).
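One concrete reason to prefer encoding, sketched under stated assumptions: JSON can escape any control character, but XML 1.0 cannot represent most C0 control characters at all, not even as character references, so a license file containing e.g. a form feed (U+000C) cannot be embedded verbatim in an XML BOM. The helpers `needs_base64` and `make_attached_text` are hypothetical names; the character class mirrors the XML 1.0 `Char` production.

```python
import base64
import re

# Characters XML 1.0 forbids outright (the complement of its "Char"
# production); they cannot be written even as &#...; references.
_XML10_FORBIDDEN = re.compile(
    "[^\t\n\r\u0020-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
)


def needs_base64(text: str) -> bool:
    """True if `text` could not be embedded verbatim in an XML 1.0 BOM."""
    return _XML10_FORBIDDEN.search(text) is not None


def make_attached_text(text: str, content_type: str = "text/plain",
                       always_encode: bool = True) -> dict:
    """Build a CycloneDX-style attachment object for a license text.

    Encodes when asked to, or whenever the text would break XML 1.0.
    """
    if always_encode or needs_base64(text):
        encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
        return {"contentType": content_type, "encoding": "base64", "content": encoded}
    return {"contentType": content_type, "content": text}
```

With `always_encode=True` (the default in this sketch) every text is encoded, which matches the recommendation above; the check only matters if verbatim text is ever allowed.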

[...] a mechanism to try and guess the license (spdx or generic name) from a text?

Nope, not exactly. Guessing license identifiers based on text snippets is not planned for any Python implementation. (This would bloat the library or depend on external services. However, there are tools that can do this already, e.g. https://github.com/CycloneDX/license-scanner.) What does exist is detection of license names; see the library's cyclonedx.factory.license.LicenseFactory.make_from_string(): https://cyclonedx-python-library.readthedocs.io/en/latest/autoapi/cyclonedx/factory/license/index.html#cyclonedx.factory.license.LicenseFactory.make_from_string

jkowalleck • Feb 02 '24 13:02

@jkowalleck sure, this was mostly for internal use to display the SBOM Angular-style; I'll make sure it's encoded before going upstream!

Thanks for the hint! Since some license texts start with the license name itself on the first line, I'll see if that could be of some use, though it won't be 100% reliable.
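A rough sketch of that first-line heuristic, assuming plain-text or Markdown license files. `guess_name_from_first_line` is a hypothetical helper: it only yields a candidate name, and it deliberately gives up on lines starting with "Copyright" (like the mkdocs LICENSE above), since those carry no license name.

```python
def guess_name_from_first_line(text: str, max_len: int = 80):
    """Return the first non-empty line as a *candidate* license name, or None.

    Markdown heading markers are stripped. Lines starting with "Copyright"
    or longer than max_len are rejected, since they rarely name a license.
    """
    for line in text.splitlines():
        candidate = line.strip().lstrip("#").strip()
        if not candidate:
            continue  # skip blank lines and bare "#" lines
        if candidate.lower().startswith("copyright") or len(candidate) > max_len:
            return None
        return candidate
    return None
```

The candidate could then be handed to `LicenseFactory.make_from_string()` mentioned above, which, as far as I know, falls back to a named license when the string is neither a known SPDX ID nor a valid expression.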

nejch • Feb 02 '24 13:02

Acceptance criteria

  • the feature to add license texts should be enabled by a CLI switch called --gather-license-evidence (name to be discussed)
  • the feature is disabled per default
  • only if the feature is enabled:
    • license text detection
      • from raw pyproject.toml: done via #692
        • [x] follow project.license.file (string)
        • [x] use project.license.text
      • for wheel
      • for sdist, the license text detection should follow these file patterns:
        • LICEN[CS]E*
        • NOTICE*
    • for all components, meta-components, root-components and nested components: regardless of SPDX license ID, SPDX license expression or named license, the license texts should be added as ...
      • [x] for declared ones in wheel: the element's declared license -> .licenses[] - via #694
      • [ ] for concluded ones in wheel: search for the usual file patterns -> .evidence.licenses[]
      • [ ] for sdist: the concluded ones (per the file patterns above) -> .evidence.licenses[]
        Examples:
        {
          //...
          "evidence": {
            "licenses": [
              {"license": {"id": "Apache-2.0", "text": {
                "contentType": "text/plain",
                "encoding": "base64",
                // base64 of content of file `LICENSE`
                "content": "bG9yZW0gaXBzdW0="
              }}},
              {"license": {"name": "file: NOTICE", "text": {
                "contentType": "text/plain",
                "encoding": "base64",
                // base64 of content of file `NOTICE`
                "content": "bG9yZW0gaXBzdW0="
              }}}
            ]
          },
          // ...
        }
        
    • if a license text is detected with the package, it would be added to the component's @.evidence.licenses
      • @.name would be 'License of <PackageName>: '
      • @.text would hold the text
        • the content type is to be derived from the file extension
        • the content SHOULD be base64 encoded
    • if no license text is shipped with a package, no license text is added as evidence. Nope, no license template is derived from the package's declared SPDX license ID.
      Reason: license templates (like BSD-3-Clause) are designed to be modified (unlike others, like Apache-2.0, which is not a template but a complete text).
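A minimal sketch of the sdist part of these criteria, assuming the sdist has already been unpacked and only its top-level directory is scanned. `gather_license_evidence`, the extension-to-contentType table, and appending the file name after 'License of <PackageName>: ' are all assumptions for illustration; in a real implementation this would only run when the (still to be named) `--gather-license-evidence` switch is set.

```python
import base64
import fnmatch
from pathlib import Path

# File patterns from the acceptance criteria, matched case-sensitively here.
LICENSE_FILE_PATTERNS = ("LICEN[CS]E*", "NOTICE*")

# Hypothetical extension -> contentType table; anything else is text/plain.
_CONTENT_TYPES = {
    ".md": "text/markdown",
    ".markdown": "text/markdown",
    ".html": "text/html",
    ".htm": "text/html",
}


def content_type_for(path: Path) -> str:
    """Derive the contentType from the file extension."""
    return _CONTENT_TYPES.get(path.suffix.lower(), "text/plain")


def gather_license_evidence(src_root: Path, package_name: str) -> list:
    """Build evidence.licenses[] items for license-like files directly in src_root.

    Only the top level of the unpacked sdist is scanned (an assumption).
    """
    items = []
    for path in sorted(p for p in src_root.iterdir() if p.is_file()):
        if not any(fnmatch.fnmatchcase(path.name, pat) for pat in LICENSE_FILE_PATTERNS):
            continue
        items.append({
            "license": {
                # Appending the file name after the colon is an assumption.
                "name": f"License of {package_name}: {path.name}",
                "text": {
                    "contentType": content_type_for(path),
                    "encoding": "base64",
                    # Raw bytes are encoded as-is, so no text decoding is needed.
                    "content": base64.b64encode(path.read_bytes()).decode("ascii"),
                },
            }
        })
    return items
```

Note that nothing here falls back to a license template from a declared SPDX ID: a package without shipped license files simply yields an empty list, as the last criterion requires.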

jkowalleck • Mar 13 '24 10:03