pip-audit
Evaluate tools for introspecting container images
The syft tool can generate an SBOM for a container image and supports Python packages. We should check whether we can leverage this to support container images in pip-audit.
cc: @di
I think syft could be really useful for us. It has quite a bit of functionality for both container images and filesystems and supports a bunch of different language ecosystems. The relevant bits for us are:
- We can give `syft` a container image.
- It will traverse the layers in the image and look for files that look like egg or wheel metadata.
- If it finds egg or wheel metadata for a package, that package goes into the package list.
Interesting files are:
- https://github.com/anchore/syft/blob/main/syft/source/all_layers_resolver.go
- https://github.com/anchore/syft/blob/main/syft/pkg/cataloger/python/package_cataloger.go
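To make the cataloging step above concrete, here is a rough Python sketch of the same idea: walk a directory tree (for example, an unpacked layer) and collect anything that looks like wheel or egg metadata. This is not how syft implements it; the function name `find_python_packages` and the exact glob patterns are assumptions for illustration.

```python
from email.parser import HeaderParser
from pathlib import Path


def find_python_packages(root):
    """Return sorted (name, version) pairs for package metadata found under root."""
    found = set()
    # Wheels leave <name>-<version>.dist-info/METADATA behind; eggs use PKG-INFO,
    # either inside an .egg-info directory or as a single .egg-info file.
    patterns = ("*.dist-info/METADATA", "*.egg-info/PKG-INFO", "*.egg-info")
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            if not path.is_file():
                continue
            text = path.read_text(encoding="utf-8", errors="replace")
            # Metadata files use RFC 822-style headers, so the email parser works.
            headers = HeaderParser().parsestr(text)
            name, version = headers.get("Name"), headers.get("Version")
            if name and version:
                found.add((name, version))
    return sorted(found)
```

Like syft, this reports any metadata it finds on disk, whether or not the package is installed into a Python environment.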
Some potential issues:
- `syft` just looks for package metadata on the file system. So if I have a container that has a wheel on the filesystem that hasn't been installed to any Python, it's still going to end up in the package list. I initially thought this was weird, but after thinking about it more, auditing a container is a bit of a fuzzy idea since it can have multiple Python environments in it. So just auditing anything on the file system that looks like a package isn't that unreasonable.
- Calling the Go functionality via C FFI doesn't seem realistic so I imagine we'll have to invoke `syft` via `subprocess`. We should probably talk to the devs and figure out whether we can rely on any of the output formats to remain stable since we'll have to parse it in `pip-audit` and get a list of dependencies out of it.
- Distribution could be an issue. There aren't builds for all common platforms (for example, there's no build that I can use on my M1 Mac). We might have to say something like: "if you want to audit docker images, make sure `syft` is in your `PATH`" and just leave it to the user.
- Seems unlikely that this functionality could ever make its way back into `pip`.
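For the subprocess route, something like the following is roughly what I have in mind. It's only a sketch: it assumes `syft <image> -o json` emits a document with a top-level `artifacts` list whose entries have `name`, `version`, and `type` keys, which is exactly the kind of format detail we'd need the devs to confirm is stable. The helper names are made up.

```python
import json
import shutil
import subprocess


def parse_syft_json(document):
    """Extract sorted (name, version) pairs for Python packages from a syft JSON report."""
    report = json.loads(document)
    packages = set()
    # Assumption: entries live under "artifacts" and carry "name", "version", "type".
    for artifact in report.get("artifacts", []):
        if artifact.get("type") == "python":
            packages.add((artifact["name"], artifact["version"]))
    return sorted(packages)


def list_image_packages(image):
    """Run syft against an image reference and return its Python packages."""
    syft = shutil.which("syft")
    if syft is None:
        # We don't ship syft ourselves, so this is left to the user.
        raise RuntimeError("syft not found; make sure it is in your PATH")
    result = subprocess.run(
        [syft, image, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_syft_json(result.stdout)
```

Keeping the parsing separate from the invocation means a format change only touches `parse_syft_json`, and it can be tested without `syft` installed.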
Thinking about how this compares to the alternatives:
Other options
I'll keep an eye out but I wasn't able to find anything that fits the bill. Tern is interesting but it seems more focused on packages installed via the distro package manager.
Reimplementing in pip-audit
The Python-specific code in syft looks ok but I think the most painful thing about reimplementing this functionality in pip-audit will be parsing the Docker image, traversing over each layer, etc. syft does this by using stereoscope.
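To get a feel for how much work the image-parsing side is, here's a minimal sketch that walks the layers of an image archive using only the standard library. It assumes the archive is in the `docker save` layout (a tarball with a `manifest.json` whose entries list layer tarballs under `Layers`), and it ignores things a real implementation would need, such as whiteout files and pulling images from a registry. `iter_layer_metadata` is a hypothetical name.

```python
import io
import json
import tarfile

METADATA_SUFFIXES = (".dist-info/METADATA", ".egg-info/PKG-INFO")


def iter_layer_metadata(image_tar_path):
    """Yield (layer_name, member_path, text) for Python metadata files in each layer."""
    with tarfile.open(image_tar_path) as image:
        # Assumption: `docker save` layout, where manifest.json is a JSON list
        # and each entry has a "Layers" list of paths inside the archive.
        manifest = json.load(image.extractfile("manifest.json"))
        for entry in manifest:
            for layer_name in entry["Layers"]:
                # Reads the whole layer into memory; fine for a sketch,
                # but large layers would want streaming instead.
                layer_bytes = image.extractfile(layer_name).read()
                with tarfile.open(fileobj=io.BytesIO(layer_bytes)) as layer:
                    for member in layer:
                        if member.isfile() and member.name.endswith(METADATA_SUFFIXES):
                            raw = layer.extractfile(member).read()
                            yield layer_name, member.name, raw.decode("utf-8", "replace")
```

Each yielded `text` is an RFC 822-style metadata file, so the `Name` and `Version` headers can be pulled out with `email.parser`. Note that this needs the image as a file on disk, which sidesteps the daemon question rather than answering it.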
I had an idea that it might be possible to leverage some of Tern's image parsing modules (also in Python) for this purpose and write the Python-specific parts on top of them. I'm not sure whether Tern was designed to be used as a library in the way that I'm thinking, and there seem to be some platform support issues which might affect us.
A further note in terms of reimplementing: Docker's Python SDK is pretty well-featured, and includes a low-level API that might be able to do the kind of image introspection we need.
Edit: It looks like Tern uses the Docker Python SDK:
- https://github.com/tern-tools/tern/blob/87e7cdd154bc3cad98db1174b192ab9592adcffb/tern/analyze/default/container/multi_layer.py#L85-L120
- https://github.com/tern-tools/tern/blob/87e7cdd154bc3cad98db1174b192ab9592adcffb/tern/analyze/default/container/image.py
I think the ideal tool would:
- not require the Docker daemon to be present
- be importable from Python, or maybe callable via C FFI
AFAICT the Docker daemon seems to be a requirement for Tern but not for stereoscope. I think we want something like stereoscope, but written in Python.
Yeah, I believe Docker's Python SDK can't really do anything without connecting to a Docker daemon. So if we don't want to assume the presence of Docker, we probably can't directly dupe or reuse their approach.
I'll do some additional searching for something that looks like stereoscope, but in Python. It might also be possible to write a native Python extension that adapts stereoscope directly, although I'm not familiar with what that looks like with Go (I've done it for Rust and C/C++ and I've used Go extensions, but never written the latter).