container-diff icon indicating copy to clipboard operation
container-diff copied to clipboard

Differ/prepper refactor proposal

Open dlorenc opened this issue 8 years ago • 1 comments

This is meant to serve as a discussion point for refactoring differs and preppers. The current design is roughly:

  1. Preppers unpack images from a variety of sources (registry, tarball, daemon) onto disk in temporary directories
  2. Differs consume these temporary directories to generate diff results

This leads to a few problems:

  • Images must be completely unpacked onto disk before any diffing can begin.
  • Many differs do not need to inspect the entire contents of an image, and some (the apt/dpkg differ in particular) only needs to look at a single file.
  • Permissions and file paths are tricky to represent in a cross-platform manner.

I propose that we refactor the interfaces a bit. One possibility would be to allow differs to "register" the set of filepaths (as regexes? lists?) they need to consume, then allow the preppers to provide these filepaths from an image to the differs. Something like:

type Differ interface {
  Filepaths(metadata container.Metadata) []string
  Diff(files map[string]io.Reader
}

container-diff would be responsible for collecting the list of files needed by all differs, then calling 'Diff' with a way to access these files.

Some differs (pip, etc.) might need to inspect the container metadata to determine the set of file paths they need to examine, so that can be provided at the time the filepaths are requested.

This interface would have several benefits:

  • When running differs that only need a small set of files, the preppers can quickly exit without even needing to download the entire image.
  • We can avoid writing images to disk, which should make windows support easier

dlorenc avatar Nov 29 '17 03:11 dlorenc

Nice, yeah I definitely think this is the direction we want to head. We should be able to get significant performance gains just by eliminating the need to process the majority of the image in most scenarios.

+1 to the idea of each differ providing a set of filepaths it needs from the image, I'm a big fan of only taking what you need here.

I think the easiest way to implement this on top of what we have now would be to use a tmpfs mounted in RAM, then switch the root directory to point to that tmpfs and write to "disk" like we do now. The only thing I'm unsure of here is file permissions, but hopefully they just work :)

nkubala avatar Nov 29 '17 17:11 nkubala