scancode.io
Scan package files and extract for packages
In all the following pipelines:
- rootfs
- docker
- docker-windows
- scan_codebase
when we scan files for licenses, copyrights and other data, we skip codebase resources that already have a status set before this step, so anything tagged as `application-package` or `system-package` is not scanned.
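The skipping behavior described above can be sketched roughly as follows. This is a minimal illustration and not the actual scancode.io code: the `Resource` class and the `select_resources_to_scan` helper are hypothetical names, and an empty status is assumed to mean "not analyzed yet".

```python
from dataclasses import dataclass


@dataclass
class Resource:
    # Hypothetical stand-in for a codebase resource.
    path: str
    status: str = ""  # assumption: empty string means "no status yet"


def select_resources_to_scan(resources):
    """Return only the resources that do not have a status yet."""
    return [resource for resource in resources if not resource.status]


resources = [
    Resource("usr/bin/tool"),
    Resource("usr/lib/libfoo.so", status="system-package"),
    Resource("app/package.json", status="application-package"),
]

# Resources already tagged as package files are filtered out of the scan.
to_scan = select_resources_to_scan(resources)
```

With this kind of filter, any file tagged in an earlier step never reaches the license and copyright scan.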
In the `match_not_analyzed_to_system_packages` pipe of the rootfs pipeline, we match all codebase resources that are part of a package to the discovered package object and also update their status to `system-package`. (It seems that we used to do the same for application packages with the `match_not_analyzed_to_application_packages` function, but that function is no longer used anywhere.)
Similarly, in the docker pipelines, the `create_system_package` function of the `collect_and_create_system_packages` step updates the status of package files to `system-package`.
We can either:
- stop tagging the status of files which are part of a system-package
- or re-scan all package files tagged as system/application package
In this PR I've tried out the second approach, as this is what we also do in SCTK. Here, however, we have to add a new argument, `update_status`, and pass it to the function that saves data to resources after the scan, so that the `system-package` or `application-package` status of codebase resources is not overwritten with `scanned`, which was a side effect of the file scans.
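The idea behind the `update_status` argument can be sketched as below. This is an illustration under assumptions, not the code in this PR: the `Resource` class, the `save_scan_results` function and the `scan_data` field are hypothetical; only the `update_status` name and the status values come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    # Hypothetical stand-in for a codebase resource.
    path: str
    status: str = ""
    scan_data: dict = field(default_factory=dict)


def save_scan_results(resource, scan_results, update_status=True):
    """Store scan results on the resource, optionally marking it as scanned."""
    resource.scan_data.update(scan_results)
    if update_status:
        # Default behavior: the file scan overwrites the status.
        resource.status = "scanned"


plain = Resource("src/main.py")
packaged = Resource("usr/lib/libfoo.so", status="system-package")
results = {"licenses": ["mit"]}

save_scan_results(plain, results)
# Re-scanned package file: keep the existing package status.
save_scan_results(packaged, results, update_status=False)
```

Package files get their scan data saved like any other file, while their `system-package` or `application-package` tag is preserved.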
Since all these pipelines already scanned application package files (those that were not metadata files or lockfiles), I'm assuming we also want to scan the metadata files that were being skipped. Otherwise #762 does not make sense. Note that the license scan that is part of a package scan (parsing the manifest and then running license detection only on the extracted part) can differ, for some complex files, from a plain license scan of the file, and we might need to improve how we handle this in SCTK to avoid confusion. See https://github.com/nexB/scancode-toolkit/issues/3024 for details.
Reference: https://github.com/nexB/scancode.io/issues/762
Reference: https://github.com/nexB/scancode.io/issues/1194
Reference: https://github.com/nexB/scancode.io/issues/83