Evaluate tools for introspecting container images

Open tetsuo-cpp opened this issue 4 years ago • 5 comments

The syft tool can generate an SBOM for a container image and has support for Python packages. We should check whether we can leverage this to support container images in pip-audit.

cc: @di

tetsuo-cpp avatar Oct 19 '21 05:10 tetsuo-cpp

I think syft could be really useful for us. It has quite a bit of functionality for both container images and filesystems and supports a bunch of different language ecosystems. The relevant bits for us are:

  • We can give syft a container image.
  • It will traverse the layers in the image and look for files that look like egg or wheel metadata.
  • If it finds egg or wheel metadata for a package, that package goes into the package list.
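As a rough illustration of the matching step described above, here is a minimal Python sketch. It is not syft's actual logic: the helper names `is_package_metadata` and `parse_name_version` are hypothetical, and the sketch only assumes the standard layouts (`*.dist-info/METADATA` for wheels, `*.egg-info/PKG-INFO` for eggs). Single-file `.egg-info` metadata is not handled.

```python
from email.parser import HeaderParser
from pathlib import PurePosixPath
from typing import Optional, Tuple

# Assumption: metadata directory suffix -> metadata file expected inside it.
_METADATA_FILES = {
    ".dist-info": "METADATA",  # installed wheels
    ".egg-info": "PKG-INFO",   # installed eggs (directory form only)
}


def is_package_metadata(path: str) -> bool:
    """Return True if `path` looks like wheel or egg metadata."""
    p = PurePosixPath(path)
    expected = _METADATA_FILES.get(p.parent.suffix)
    return expected is not None and p.name == expected


def parse_name_version(metadata_text: str) -> Optional[Tuple[str, str]]:
    """Extract (name, version) from the RFC 822-style metadata headers."""
    headers = HeaderParser().parsestr(metadata_text)
    name, version = headers.get("Name"), headers.get("Version")
    if name is None or version is None:
        return None
    return name, version
```

Any file in any layer that passes `is_package_metadata` would contribute one `(name, version)` pair to the package list, whether or not it belongs to an active Python environment.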

Interesting files are:

  • https://github.com/anchore/syft/blob/main/syft/source/all_layers_resolver.go
  • https://github.com/anchore/syft/blob/main/syft/pkg/cataloger/python/package_cataloger.go

Some potential issues:

  • syft just looks for package metadata on the file system. So if I have a container that has a wheel on the filesystem that hasn't been installed to any Python, it's still going to end up in the package list. I initially thought this was weird, but after thinking about it more, auditing a container is a bit of a fuzzy idea since it can have multiple Python environments in it. So just auditing anything on the file system that looks like a package isn't that unreasonable.
  • Calling the Go functionality via C FFI doesn't seem realistic so I imagine we'll have to invoke syft via subprocess. We should probably talk to the devs and figure out whether we can rely on any of the output formats to remain stable since we'll have to parse it in pip-audit and get a list of dependencies out of it.
  • Distribution could be an issue. There aren't builds for all common platforms (for example, there's no build that I can use on my M1 Mac). We might have to say something like: "if you want to audit docker images, make sure syft is in your PATH" and just leave it to the user.
  • Seems unlikely that this functionality could ever make its way back into pip.
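To make the subprocess option concrete, here is a hedged sketch of what the pip-audit side might look like. The function names are hypothetical, and the JSON field names (`artifacts`, `name`, `version`, `type`) and the `-o json` invocation are assumptions about syft's output; whether they stay stable is exactly the open question above.

```python
import json
import shutil
import subprocess
from typing import List, Tuple


def parse_syft_packages(raw_json: str) -> List[Tuple[str, str]]:
    """Pull (name, version) pairs for Python packages out of syft's JSON.

    Assumption: packages live under a top-level "artifacts" list and
    Python packages carry type == "python".
    """
    doc = json.loads(raw_json)
    return [
        (artifact["name"], artifact["version"])
        for artifact in doc.get("artifacts", [])
        if artifact.get("type") == "python"
    ]


def list_image_packages(image: str) -> List[Tuple[str, str]]:
    """Run syft against `image` and return its Python packages."""
    syft = shutil.which("syft")
    if syft is None:
        # Leave installation to the user, as suggested above.
        raise RuntimeError("syft not found on PATH; install it to audit container images")
    result = subprocess.run(
        [syft, image, "-o", "json"],  # assumed CLI form
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_syft_packages(result.stdout)
```

Keeping the parsing separate from the subprocess call means a format change in syft would only touch `parse_syft_packages`, and that function can be tested without syft installed.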

tetsuo-cpp avatar Oct 19 '21 06:10 tetsuo-cpp

Thinking about how this compares to the alternatives:

Other options

I'll keep an eye out but I wasn't able to find anything that fits the bill. Tern is interesting but it seems more focused on packages installed via the distro package manager.

Reimplementing in pip-audit

The Python-specific code in syft looks ok but I think the most painful thing about reimplementing this functionality in pip-audit will be parsing the Docker image, traversing over each layer, etc. syft does this by using stereoscope.
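For a sense of scale, here is a minimal sketch of layer traversal using only the standard library. It assumes the image has already been exported as a `docker save`-style archive whose `manifest.json` lists each layer tarball under a `Layers` key; the function name `iter_layer_files` is hypothetical. It does not pull images from a registry and does not interpret whiteout (`.wh.`) entries, so files deleted in later layers are still reported.

```python
import json
import tarfile
from typing import BinaryIO, Iterator, Tuple


def iter_layer_files(image_archive: BinaryIO) -> Iterator[Tuple[str, str]]:
    """Yield (layer, path) for each regular file in each image layer.

    Assumption: `image_archive` is a seekable binary file holding a
    `docker save`-style tarball with a top-level manifest.json.
    """
    with tarfile.open(fileobj=image_archive) as image:
        manifest = json.load(image.extractfile("manifest.json"))
        for entry in manifest:
            for layer_name in entry["Layers"]:
                # Each layer is itself a tar archive nested in the image.
                layer_file = image.extractfile(layer_name)
                with tarfile.open(fileobj=layer_file) as layer:
                    for member in layer:
                        if member.isfile():
                            yield layer_name, member.name
```

Filtering the yielded paths for `.dist-info` and `.egg-info` metadata would then be the Python-specific part on top. The hard parts that stereoscope handles (registries, other image formats, whiteouts) are all left out here.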

I had an idea that it might be possible to leverage some of Tern's image parsing modules (also in Python) for this purpose and write the Python-specific parts on top of them. I'm not sure whether Tern was designed to be used as a library in the way that I'm thinking, and there seem to be some platform support issues which might affect us.

tetsuo-cpp avatar Oct 19 '21 07:10 tetsuo-cpp

A further note in terms of reimplementing: Docker's Python SDK is pretty well-featured, and includes a low-level API that might be able to do the kind of image introspection we need.

Edit: It looks like Tern uses the Docker Python SDK:

  • https://github.com/tern-tools/tern/blob/87e7cdd154bc3cad98db1174b192ab9592adcffb/tern/analyze/default/container/multi_layer.py#L85-L120

  • https://github.com/tern-tools/tern/blob/87e7cdd154bc3cad98db1174b192ab9592adcffb/tern/analyze/default/container/image.py

woodruffw avatar Oct 19 '21 14:10 woodruffw

I think the ideal tool would:

  • not require the Docker daemon to be present
  • be Python-importable or maybe callable via the C FFI

AFAICT the Docker daemon seems to be a requirement for Tern but not for stereoscope. I think we want something like stereoscope, but written in Python.

di avatar Oct 19 '21 15:10 di

> AFAICT the Docker daemon seems to be a requirement for Tern but not for stereoscope. I think we want something like stereoscope, but written in Python.

Yeah, I believe Docker's Python SDK can't really do anything without connecting to a Docker daemon. So if we don't want to assume the presence of Docker, we probably can't directly dupe or reuse their approach.

I'll do some additional searching for something that looks like stereoscope, but in Python. It might also be possible to write a native Python extension that adapts stereoscope directly, although I'm not familiar with what that looks like with Go (I've done it for Rust and C/C++ and I've used Go extensions, but never written the latter).

woodruffw avatar Oct 19 '21 15:10 woodruffw