Scanning individual layer produces partial results when extracted
What happened:
I ran Syft against a directory containing the extracted contents of a layer tar (i.e. one layer.tar from a docker save) and noticed that the results are missing the global node_modules, i.e. /usr/local/lib/node_modules/npm/node_modules/**/package.json. The layer comes from the node:16.13.1-alpine base image.
What you expected to happen:
The global node_modules are reported.
How to reproduce it (as minimally and precisely as possible):
docker save node:16.13.1-alpine > image.tar
tar xf image.tar
syft 794de7166e7179b9742b9dc200462acd00c6072e1926319a1f89b7c50665596c/layer.tar   # reports the global node_modules
cd 794de7166e7179b9742b9dc200462acd00c6072e1926319a1f89b7c50665596c; tar xf layer.tar; syft .   # does not report the global node_modules
Anything else we need to know?:
Environment:
- Output of syft version: syft 0.54.0
- OS (e.g: cat /etc/os-release or similar): reproduces on multiple OSes; one of them is alpine:3.15.
bump -- a workaround would be nice too :)
Hi @leoncider, sorry for the delay here. Syft currently operates in one of two modes, container scan or directory scan, and each mode runs a different set of catalogers. In a directory scan Syft looks for source files, which generally include package-lock.json or yarn.lock files, but it skips package.json files, since these would be redundant when found in a node_modules directory, for example.
The two commands in your example produce different findings because the first (scanning the archive) is treated as a container scan, whereas scanning the extracted directory is treated as a directory scan.
I'd like to provide some sort of workaround, but I would like to better understand your use case. Why do you need to do a directory scan to get this information?
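In the meantime, here is a minimal sketch, written outside of Syft, of how the package.json manifests that a directory scan skips could be listed from an extracted layer. It is not a Syft feature and does not use any Syft API; the function name iter_node_packages is hypothetical, and the sketch assumes that every package.json located under a node_modules directory describes an installed package (nested manifests such as test fixtures would be listed as well).

```python
import json
import os
from typing import Iterator, Tuple


def iter_node_packages(root: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, version, manifest_path) for package.json files under node_modules.

    Hypothetical helper for illustration only; it is not part of Syft.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        if "package.json" not in filenames:
            continue
        # Only consider manifests that live somewhere inside a node_modules tree.
        if "node_modules" not in dirpath.split(os.sep):
            continue
        manifest = os.path.join(dirpath, "package.json")
        try:
            with open(manifest, encoding="utf-8") as fh:
                data = json.load(fh)
        except (OSError, ValueError):
            # Unreadable or malformed manifest: skip it rather than abort the walk.
            continue
        if not isinstance(data, dict):
            continue
        name, version = data.get("name"), data.get("version")
        # Some package.json files carry no name or version; those are skipped.
        if isinstance(name, str) and isinstance(version, str):
            yield name, version, manifest
```

Calling iter_node_packages(".") from inside the extracted layer directory and printing each tuple would give a rough list of the globally installed npm packages, which can then be compared against the output of scanning layer.tar directly.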
We're going to close this issue as there hasn't been a response in a few months, but please reopen if there is more to be done here!