
Enhancement request: Trigger `populate_purldb` pipeline in ScanCode.io on SBOM import

Open rogu-beta opened this issue 7 months ago • 1 comments

Is your enhancement request related to a problem? Please describe.

Currently DejaCode has trouble getting all the information it needs to scan packages that have been imported from SBOMs. Most often, not enough information is present to deduce the download URL. For reference, see issues such as:

  • https://github.com/aboutcode-org/dejacode/issues/121
  • https://github.com/aboutcode-org/dejacode/issues/258
  • https://github.com/aboutcode-org/dejacode/issues/256

The issue stems from the fact that translating a PURL into a download URL is not always directly possible. The translation may therefore not be supported by purl2url, and/or DejaCode does not use purl2url in all situations (it is used when manually adding a package, but seemingly not when importing through an SBOM). For example, Maven needs more elaborate logic to determine which file to download: this may not be clear from the PURL alone, because the file could have one of several file extensions. This issue can be addressed by combining DejaCode, ScanCode.io, and PurlDB: PurlDB pulls the package metadata, which can then be used to enhance the package information in DejaCode.
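To illustrate the Maven ambiguity, here is a minimal sketch that expands a `pkg:maven` PURL into several candidate download URLs. It is not how purl2url or DejaCode work: the function name, the list of extensions, and the choice of Maven Central as the repository are assumptions made for illustration, and qualifiers, subpaths, and percent-encoding are ignored.

```python
# Hypothetical helper, not part of purl2url or DejaCode.
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"
# Assumed set of packaging types to try; the real set may differ.
CANDIDATE_EXTENSIONS = ("jar", "war", "aar", "pom")


def maven_candidate_urls(purl, extensions=CANDIDATE_EXTENSIONS):
    """Return possible Maven Central download URLs for a pkg:maven PURL."""
    prefix = "pkg:maven/"
    if not purl.startswith(prefix):
        raise ValueError(f"not a Maven PURL: {purl}")

    # Drop the subpath (#...) and qualifiers (?...); this sketch ignores them.
    coordinates = purl[len(prefix):].split("#", 1)[0].split("?", 1)[0]
    path, _, version = coordinates.partition("@")
    namespace, _, name = path.rpartition("/")
    if not (namespace and name and version):
        raise ValueError(f"PURL needs namespace, name and version: {purl}")

    # Maven repositories lay out the group id as nested directories.
    group_path = namespace.replace(".", "/")
    base = f"{MAVEN_CENTRAL}/{group_path}/{name}/{version}/{name}-{version}"
    return [f"{base}.{ext}" for ext in extensions]
```

The PURL alone yields one URL per extension, and nothing in it says which of them actually exists. This is the gap that metadata from PurlDB is meant to close.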

Currently, gathering metadata for an SBOM requires manually adding a further pipeline in ScanCode.io to run populate_purldb for the imported SBOM. My suggestion is to offer an additional option for the SBOM import so that DejaCode triggers this pipeline automatically after running load_sbom. In an ideal solution, DejaCode would also run "Improve Packages from PurlDB" automatically once PurlDB has finished gathering the metadata. This step is needed to assign the download URLs to the packages; without it, scanning of the packages cannot be started.

Note: For this to work, ScanCode.io first has to fix the bug in https://github.com/aboutcode-org/scancode.io/issues/1644 where populate_purldb fails if any dependencies are listed in the SBOM. The current workaround is to manually edit the SBOM file and remove the dependencies array from the JSON.
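Until that bug is fixed, the manual edit can be scripted. This is a minimal sketch which assumes the SBOM is a JSON document with a top-level `dependencies` key, as in CycloneDX JSON; the function name is made up for illustration.

```python
import json


def strip_dependencies(sbom_text):
    """Return the SBOM JSON text without its top-level "dependencies" array."""
    sbom = json.loads(sbom_text)
    # Assumption: the array lives under a top-level "dependencies" key.
    # pop() with a default makes this a no-op when the key is absent.
    sbom.pop("dependencies", None)
    return json.dumps(sbom, indent=2)
```

Note that this discards the dependency relationships from the uploaded file, so it is only a stopgap and not a substitute for the fix in ScanCode.io.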

What are the benefits of the requested enhancement?

  • Users can import an SBOM and have its packages analyzed straight from DejaCode, without additional manual steps or access to ScanCode.io
  • Allows fully automated CI/CD integration, where a product is created, the SBOM is uploaded, metadata is gathered, and then all packages are scanned

Describe the solution you would like

  • Extend the project that is being created in ScanCode.io by a populate_purldb pipeline after the load_sbom pipeline
  • Run "Improve Packages from PurlDB" once PurlDB has gathered the metadata (unclear if this is technically possible)
  • Once the metadata has been added to the packages continue with scanning the packages
  • Provide options in the web view and API to configure if these two options should be enabled for an SBOM import
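As a rough sketch of the first and last points, the import option could simply decide which pipelines are requested when the ScanCode.io project is created. The function name and the `populate_purldb` flag are hypothetical, and the payload field names (`name`, `input_urls`, `pipeline`, `execute_now`) are assumptions that would need to be checked against the ScanCode.io REST API. Only the pipeline names `load_sbom` and `populate_purldb` come from this issue.

```python
def build_project_payload(name, sbom_url, populate_purldb=False):
    """Build a hypothetical project-creation payload for ScanCode.io."""
    # load_sbom always runs first; populate_purldb is the opt-in extra step.
    pipelines = ["load_sbom"]
    if populate_purldb:
        pipelines.append("populate_purldb")

    # Field names below are assumed, not confirmed against the real API.
    return {
        "name": name,
        "input_urls": sbom_url,
        "pipeline": pipelines,
        "execute_now": True,
    }
```

The later steps (running "Improve Packages from PurlDB" and then starting the scans) are not covered here, since they depend on knowing when PurlDB has finished.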

Additional notes

This feature, together with the fix for ScanCode.io, would greatly help to integrate this into an automated CI workflow.

rogu-beta avatar May 14 '25 08:05 rogu-beta

For context, see also https://github.com/aboutcode-org/dejacode/discussions/289

rogu-beta avatar May 14 '25 08:05 rogu-beta