
RFC: Define and develop scoring elements for SCA Clarity

DennisClark opened this issue (9 comments).

We need to define the scoring elements (criteria) and their weighting factors for evaluating the quality of scan results (working name: "SCA Clarity"), roughly analogous to our existing scoring elements for license clarity on a specific project. To get things started, I suggest the following major elements:

element: number-of-exact-licenses-detected
description: the number of licenses detected with an exact license key match.

element: number-of-unknown-licenses-detected
description: the number of licenses detected with no exact license key match.

element: percentage-of-exact-licenses-detected
description: the percentage of all license detections that identify a specific license key, as opposed to unknown license references whose text is not matched precisely to a known license.
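A minimal sketch of how these three license elements could be computed from one scan. It assumes a hypothetical `LicenseDetection` record holding only a license key, and it assumes that any key starting with "unknown" (such as ScanCode's `unknown-license-reference`) counts as "no exact license key match". Neither the record shape nor the prefix rule comes from ScanCode.io itself.

```python
from dataclasses import dataclass


@dataclass
class LicenseDetection:
    # Hypothetical minimal record: real ScanCode.io detections carry many
    # more fields; only the detected license key matters here.
    license_key: str


def is_unknown(detection: LicenseDetection) -> bool:
    # Assumption: keys starting with "unknown" (for example
    # "unknown-license-reference") mean no exact license key match.
    return detection.license_key.startswith("unknown")


def license_elements(detections: list[LicenseDetection]) -> dict[str, float]:
    total = len(detections)
    unknown = sum(1 for d in detections if is_unknown(d))
    exact = total - unknown
    # Report 0.0 rather than dividing by zero when nothing was detected.
    percentage = 100.0 * exact / total if total else 0.0
    return {
        "number-of-exact-licenses-detected": exact,
        "number-of-unknown-licenses-detected": unknown,
        "percentage-of-exact-licenses-detected": round(percentage, 1),
    }


detections = [
    LicenseDetection("mit"),
    LicenseDetection("apache-2.0"),
    LicenseDetection("gpl-2.0"),
    LicenseDetection("unknown-license-reference"),
]
print(license_elements(detections))
```

With three exact keys and one unknown reference, the percentage element comes out at 75.0. Whether a scan with no detections at all should score 0 or be reported as "not applicable" is an open choice; the sketch picks 0.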

More ideas and comments are welcome.

DennisClark, Mar 06 '24

Other elements could be:

element: number_of_copyrights_detected
description: the number of copyright statements detected in a scan

element: number_of_authors_detected
description: the number of authors (contributors) detected in a scan

element: number_of_packages_detected
description: the number of packages detected in a scan that can be identified by a valid PURL
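A minimal sketch of these three counts, assuming a hypothetical `ScanSummary` container that already holds the detected copyright statements, author strings, and package PURLs. The PURL check is a deliberately rough structural test ("pkg:" scheme, a type not starting with a digit, then "/" and more text), not a full validator; a real implementation would more likely parse each PURL with a dedicated library such as packageurl-python.

```python
import re
from dataclasses import dataclass, field

# Rough structural check only, not a complete PURL validator.
PURL_PATTERN = re.compile(r"^pkg:[A-Za-z.+-][A-Za-z0-9.+-]*/\S+$")


@dataclass
class ScanSummary:
    # Hypothetical container for one scan's findings.
    copyrights: list[str] = field(default_factory=list)
    authors: list[str] = field(default_factory=list)
    package_purls: list[str] = field(default_factory=list)


def is_valid_purl(purl: str) -> bool:
    return PURL_PATTERN.match(purl) is not None


def detection_counts(scan: ScanSummary) -> dict[str, int]:
    return {
        "number_of_copyrights_detected": len(scan.copyrights),
        "number_of_authors_detected": len(scan.authors),
        # Packages without a (structurally) valid PURL are not counted.
        "number_of_packages_detected": sum(
            1 for purl in scan.package_purls if is_valid_purl(purl)
        ),
    }


scan = ScanSummary(
    copyrights=["Copyright (c) 2020 Example Corp"],
    authors=["Jane Doe", "John Roe"],
    package_purls=[
        "pkg:pypi/requests@2.31.0",
        "pkg:npm/%40angular/core@16.0.0",
        "not-a-purl",
    ],
)
print(detection_counts(scan))
```

The sketch counts every detection as-is; whether duplicate copyright statements or authors should be collapsed before counting is a separate decision for the element definitions.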

DennisClark, Mar 06 '24

We might also add:

element: number_of_dependencies_detected
description: the number of dependencies identified by inspecting the files that declare other software (usually third-party) required by the codebase being scanned

DennisClark, Mar 27 '24

Some comments:

  • The scoring for the detection of packages or other units of software needs to be more sophisticated. This is Task 1 in scanning (and for any SBOM). Software units (programs or source) that are not packages may be somewhat analogous to unknown licenses, but I am not sure that "unknown packages" is a good name.
  • The scoring for dependencies needs more research to incorporate the scope and origin of each dependency (manifest, lock file, or other).
  • We might want to call this SCA Clarity.
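One way to start that research is to record scope and origin next to the plain dependency count, so that any later weighting has the breakdown available. The sketch below assumes a hypothetical `Dependency` record whose `scope` values (such as "runtime" or "test") and `origin` values ("manifest", "lockfile", "other") are illustrative only; it does not propose weights.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    # Hypothetical record; field values are illustrative, not a fixed schema.
    purl: str
    scope: str   # e.g. "runtime", "dev", "test"
    origin: str  # "manifest", "lockfile", or "other"


def dependency_breakdown(deps: list[Dependency]) -> dict[str, object]:
    return {
        "number_of_dependencies_detected": len(deps),
        # Counts per origin and per scope, for use in a future weighting.
        "by_origin": Counter(d.origin for d in deps),
        "by_scope": Counter(d.scope for d in deps),
    }


deps = [
    Dependency("pkg:pypi/django@4.2", "runtime", "lockfile"),
    Dependency("pkg:pypi/pytest", "test", "manifest"),
    Dependency("pkg:pypi/requests", "runtime", "manifest"),
]
print(dependency_breakdown(deps))
```

Lock files usually pin exact versions while manifests often declare only ranges, which is one reason the origin of a dependency could reasonably affect how much it contributes to the score.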

mjherzog, Mar 27 '24

I like "SCA Clarity". Let's use that term for this.

DennisClark, Mar 27 '24

I think we have identified enough elements now to move ahead with some kind of SCA Clarity support in ScanCode.io (SCIO).

Should this be a standard feature that does not require setting a specific option when running a scan? I think yes, but other views are welcome here.

DennisClark, Apr 01 '24

We need to structure this so that the clarity of the SBOM contents (software units) is scored separately from the clarity of the origin and license information for those software units.
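A minimal sketch of keeping the two scores apart, assuming a hypothetical `SoftwareUnit` record that says whether a unit was identified (has a PURL) and whether its origin and license are known. The two ratios used here are illustrative placeholders, not agreed definitions. The second score is computed only over identified units, so an unidentified unit lowers the contents score without also lowering the origin and license score.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SoftwareUnit:
    # Hypothetical: a package or other unit of software found in the scan.
    purl: Optional[str]  # None when the unit could not be identified
    has_origin: bool     # e.g. a download URL or repository is known
    has_license: bool    # at least one license was detected


@dataclass
class ScaClarity:
    sbom_contents: float       # how well software units are identified
    origin_and_license: float  # how well identified units are documented


def score(units: list[SoftwareUnit]) -> ScaClarity:
    identified = [u for u in units if u.purl]
    contents = len(identified) / len(units) if units else 0.0
    documented = [u for u in identified if u.has_origin and u.has_license]
    origin_license = len(documented) / len(identified) if identified else 0.0
    return ScaClarity(round(contents, 2), round(origin_license, 2))


units = [
    SoftwareUnit("pkg:pypi/flask@3.0.0", True, True),
    SoftwareUnit("pkg:npm/lodash@4.17.21", True, False),
    SoftwareUnit(None, False, True),
    SoftwareUnit("pkg:deb/debian/bash@5.2", True, True),
]
print(score(units))
```

Here three of four units are identified (0.75), and two of those three have both origin and license information (0.67).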

mjherzog, Apr 02 '24

A further refinement is probably needed. My original suggestion,

element: number_of_packages_detected
description: the number of packages detected in a scan that can be identified by a valid PURL

should perhaps be broken down into two types to support container analysis:

element: number_of_system_packages_detected
description: the number of packages detected in a scan that can be identified by a valid PURL and that originate from a distro or distro repository

element: number_of_application_packages_detected
description: the number of packages detected in a scan that can be identified by a valid PURL and that do not originate from a distro or distro repository
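A minimal sketch of the split, using the PURL type as a proxy for distro origin. The set of distro types below (`deb`, `rpm`, `apk`, `alpm`) is an illustrative assumption, not an exhaustive list, and a package's format does not always prove where it was installed from. Strings that do not look like a PURL are skipped, since both elements only count packages with a valid PURL.

```python
from typing import Optional

# Assumption: PURL types treated as distro (system) packages; not exhaustive.
DISTRO_PURL_TYPES = {"deb", "rpm", "apk", "alpm"}


def purl_type(purl: str) -> Optional[str]:
    # "pkg:deb/debian/curl@7.50.3-1" -> "deb"; returns None for non-PURLs.
    if not purl.startswith("pkg:"):
        return None
    type_, sep, rest = purl[len("pkg:"):].partition("/")
    if not type_ or not sep or not rest:
        return None
    return type_.lower()  # PURL types are case-insensitive


def split_package_counts(purls: list[str]) -> dict[str, int]:
    types = [t for t in (purl_type(p) for p in purls) if t is not None]
    system = sum(1 for t in types if t in DISTRO_PURL_TYPES)
    return {
        "number_of_system_packages_detected": system,
        "number_of_application_packages_detected": len(types) - system,
    }


purls = [
    "pkg:deb/debian/curl@7.50.3-1?arch=i386",
    "pkg:rpm/fedora/openssl@3.0.9",
    "pkg:pypi/requests@2.31.0",
    "pkg:npm/express@4.18.2",
    "not-a-purl",
]
print(split_package_counts(purls))
```

In a container image, the `deb` and `rpm` entries would count as system packages and the `pypi` and `npm` entries as application packages.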

DennisClark, Apr 02 '24

It is probably best to do the counting in a new pipeline, compute-sca-clarity.

DennisClark, Apr 03 '24

We might also add a negative element:

element: number_of_misleading_matches_reported
description: the number of matches (snippet or whole-file) that are not quite accurate or that add no meaningful value.
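A minimal sketch of how a negative element could sit alongside positive ones in a weighted score, since the issue calls for weighting factors. The element names and weights below are hypothetical placeholders, not values proposed in this thread. Each input is assumed to be a ratio in [0, 1], the misleading-match element carries a negative weight, and the result is clamped to the range 0 to 100.

```python
# Hypothetical weights: positive elements add to the score, the
# misleading-match element subtracts from it. Real weights are still open.
WEIGHTS = {
    "exact_license_ratio": 60.0,
    "identified_package_ratio": 40.0,
    "misleading_match_ratio": -50.0,
}


def misleading_match_ratio(misleading: int, total_matches: int) -> float:
    # Share of reported matches judged misleading; 0.0 if there are none.
    return misleading / total_matches if total_matches else 0.0


def sca_clarity_score(ratios: dict[str, float]) -> float:
    for name, value in ratios.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a ratio in [0, 1], got {value}")
    raw = sum(weight * ratios.get(name, 0.0) for name, weight in WEIGHTS.items())
    # Clamp so many misleading matches bottom out at 0 rather than going negative.
    return max(0.0, min(100.0, raw))


ratios = {
    "exact_license_ratio": 0.75,
    "identified_package_ratio": 0.5,
    "misleading_match_ratio": misleading_match_ratio(2, 10),
}
print(sca_clarity_score(ratios))
```

With these placeholder weights, 2 misleading matches out of 10 take 10 points off a score that would otherwise be 65, giving 55.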

DennisClark, May 13 '24