scancode.io
Review how and which archive to extract in pipelines
There are several ways we deal with archives, both at the pipeline input level and inside pipelines, and there are many styles of archives. We should review these and ensure we use a consistent approach across pipes and pipelines for:
- docker images
- package archives
- rootfs/VM images
Also, we currently do little beyond a shallow extraction of a project input, which means we may miss some metadata in certain cases (for example, when an input archive contains nested archives).
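For discussion, here is a minimal sketch of what going beyond a shallow extraction could look like. It is not the extractcode or scancode.io API: the names `is_archive`, `extract_one`, `extract_recursively`, the `max_depth` parameter and the `-extract` directory suffix are all hypothetical. It assumes only the zip and tar-family formats that the Python standard library can read; rootfs/VM images and many package formats would need other tools. Each extracted tree is walked, and any nested archive found is queued for extraction next to itself, up to a depth limit.

```python
import os
import tarfile
import zipfile
from pathlib import Path


def is_archive(path: Path) -> bool:
    """Hypothetical helper: True for tar (any stdlib compression) or zip files."""
    if not path.is_file():
        return False
    # Check tar first: a small uncompressed tar that contains a zip near its
    # end can be mistaken for a zip by zipfile.is_zipfile.
    return tarfile.is_tarfile(path) or zipfile.is_zipfile(path)


def extract_one(archive: Path, target: Path) -> None:
    """Hypothetical helper: shallow extraction of a single archive into target."""
    target.mkdir(parents=True, exist_ok=True)
    if tarfile.is_tarfile(archive):
        with tarfile.open(archive) as tf:
            # Use the safe "data" extraction filter when this Python provides it.
            if hasattr(tarfile, "data_filter"):
                tf.extractall(target, filter="data")
            else:
                tf.extractall(target)
    else:
        with zipfile.ZipFile(archive) as zf:
            # ZipFile strips absolute paths and ".." components from member names.
            zf.extractall(target)


def extract_recursively(input_path: Path, work_dir: Path, max_depth: int = 5) -> list:
    """Extract input_path, then any archives found inside it, up to max_depth levels.

    Returns the list of directories that were extracted into.
    """
    extracted = []
    pending = [(input_path, work_dir / (input_path.name + "-extract"), 1)]
    while pending:
        archive, target, depth = pending.pop()
        if depth > max_depth:
            continue  # stop descending; guards against archive bombs
        extract_one(archive, target)
        extracted.append(target)
        # Look for nested archives in what was just extracted.
        for root, _dirs, files in os.walk(target):
            for name in files:
                child = Path(root) / name
                if is_archive(child):
                    pending.append((child, Path(root) / (name + "-extract"), depth + 1))
    return extracted
```

With a shallow extraction only the first level is unpacked; here a zip inside a tarball would also be unpacked into an `inner.zip-extract` directory beside it. Whether nested extracts should live beside the archive or in a separate codebase location is one of the design choices this issue would need to settle.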
This is in the context of these issues and PRs:
- https://github.com/nexB/scancode-toolkit/issues/14
- https://github.com/nexB/extractcode/issues/27
- https://github.com/nexB/extractcode/issues/6
(This is a follow-up from https://github.com/nexB/scancode.io/pull/181/files#diff-56006e2ba488ba623c840bef5b9f94e07e9a37381af5befa665add2f2f6e13faR64)