scancode.io
SCIO does not identify the codebase source (the path) of a license detection
A recent scan of an FFmpeg project in SCIO returned a composite license expression that included AND proprietary-license among the various licenses. That was incorrect, as no object in the codebase is under any proprietary license. Refer to https://github.com/nexB/scancode-toolkit/issues/3504 for a related problem.
The big issue here is that I could not find any way, either in the SCIO UI, or in the exported scan results, to identify the actual file (complete path name) that triggered the erroneous detections. The exported scan results only include the following:
{
"score": 100.0,
"matcher": "2-aho",
"end_line": 4182,
"rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/proprietary-license_489.RULE",
"start_line": 4182,
"matched_text": " license=\"nonfree and unredistributable\"",
"match_coverage": 100.0,
"matched_length": 4,
"rule_relevance": 100,
"rule_identifier": "proprietary-license_489.RULE",
"license_expression": "proprietary-license"
},
{
"score": 100.0,
"matcher": "2-aho",
"end_line": 101,
"rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/proprietary-license_490.RULE",
"start_line": 101,
"matched_text": " --enable-nonfree allow use of nonfree code, the resulting libs",
"match_coverage": 100.0,
"matched_length": 2,
"rule_relevance": 100,
"rule_identifier": "proprietary-license_490.RULE",
"license_expression": "proprietary-license"
}
There are problems with those rules that are addressed in the SCTK issue, but the only way I could investigate the problem was to download the actual FFmpeg project and search for the files that contained the matched_text myself. That information should have been included in the scan results and presented in some logical way in the SCIO UI. Consider the simple use case of an analyst seeing a generated license expression in SCIO and wondering where in the code the associated licenses were actually detected.
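For anyone stuck on the same manual search in the meantime, here is a minimal sketch of that workaround. It assumes a local checkout of the scanned codebase, a single-line match, and that matched_text appears verbatim on the reported line; the function name find_match_sources is made up for illustration and is not part of SCIO or SCTK.

```python
import os


def find_match_sources(root, match):
    """Return relative paths under `root` where the matched text appears
    within the reported start_line..end_line range.

    `match` is one entry from the exported results shown above; only the
    matched_text, start_line and end_line keys are used.
    """
    needle = match["matched_text"].strip()
    start, end = match["start_line"], match["end_line"]
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" lets us read binary-ish files without crashing
                with open(path, encoding="utf-8", errors="replace") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if lineno > end:
                            break  # past the reported range, stop reading
                        if lineno >= start and needle in line:
                            hits.append(os.path.relpath(path, root))
                            break
            except OSError:
                continue  # unreadable file, skip it
    return sorted(hits)
```

Calling find_match_sources("path/to/ffmpeg", match) with one of the match objects above returns the candidate files. Checking the line number as well as the text keeps the result list short when the same phrase occurs in several files.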
I am assuming that SCTK actually has the path name but it is not being captured by SCIO; if that is not the case, then this issue needs to be raised upstream in SCTK as well.
Initially assigning this to @AyanSinhaMahapatra but feel free to re-assign if appropriate.
Ack @DennisClark, note that this would be implemented as a part of https://github.com/nexB/scancode.io/issues/733
Initially assigning this to @AyanSinhaMahapatra
Yup, this is high on the priority list. I'll create the models and update the views. We can improve the UI for the license detections view later, possibly with https://github.com/nexB/scancode.io/pull/450
@AyanSinhaMahapatra if you do not have time for this, perhaps this issue is a candidate for assigning to a student or volunteer.
@DennisClark
Update: we have added a new attribute from_file in SCTK matches, which was needed to implement this feature correctly wrt. referenced matches: https://github.com/nexB/scancode-toolkit/pull/3620/
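To illustrate how the new attribute could answer the original question, here is a rough sketch that groups match objects by license expression and lists the files they came from. It assumes each match is a dict shaped like the excerpt at the top of this issue, plus a from_file key; the helper name paths_by_license and the sample paths are hypothetical, and the real SCIO models may end up looking quite different.

```python
from collections import defaultdict


def paths_by_license(matches):
    """Map each license_expression to the sorted list of files it was matched in."""
    index = defaultdict(set)
    for match in matches:
        # from_file is the attribute mentioned above; older scan results
        # will not have it, so fall back to a placeholder.
        path = match.get("from_file") or "<unknown>"
        index[match["license_expression"]].add(path)
    return {expr: sorted(paths) for expr, paths in index.items()}


# Made-up sample data: the paths are placeholders, not the actual FFmpeg files.
sample = [
    {"license_expression": "proprietary-license", "from_file": "project/file_a"},
    {"license_expression": "proprietary-license", "from_file": "project/file_b"},
    {"license_expression": "mit", "from_file": "project/file_a"},
    {"license_expression": "mit"},  # a match from before from_file existed
]
result = paths_by_license(sample)
```

With an index like this, an analyst looking at a surprising proprietary-license in the composite expression could jump straight to the files behind it.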
I'll take a shot at this soon enough :+1: