File removal in ACI
This issue should perhaps be put on appc/spec, but I'm temporarily putting it here to reduce noise.
As discussed OOB, we were planning to use pathWhitelist to represent file removal. However, a whitelist is much clumsier than a blacklist for removing files. For instance, if I simply want to remove a single file hello.txt, I would essentially have to add to the whitelist all files and directories that are not hello.txt. This process is not only error-prone, but could also result in a large manifest.
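To make the asymmetry concrete, here is a minimal sketch in Python. It assumes a whitelist keeps only the listed paths (with an empty list meaning "keep everything", which is my reading of the spec), and that a blacklist drops only the listed paths. The function names and the sample rootfs are hypothetical, and pathBlacklist is only the field proposed in this issue.

```python
def render_whitelist(paths, whitelist):
    """Keep only whitelisted paths; an empty whitelist is assumed to keep everything."""
    if not whitelist:
        return sorted(paths)
    allowed = set(whitelist)
    return sorted(p for p in paths if p in allowed)


def render_blacklist(paths, blacklist):
    """Drop every blacklisted path and keep the rest."""
    denied = set(blacklist)
    return sorted(p for p in paths if p not in denied)


# Hypothetical rootfs contents, for illustration only.
rootfs = ["/bin/app", "/etc/app.conf", "/hello.txt", "/usr/lib/libfoo.so"]

# Removing one file with a whitelist means naming every other path.
whitelist = [p for p in rootfs if p != "/hello.txt"]

# With a blacklist, the manifest names only the file to drop.
blacklist = ["/hello.txt"]
```

Both calls render the same image, but the whitelist grows with the size of the rootfs while the blacklist grows only with the number of removed files.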
I propose two solutions:
- The simplest solution is simply to have a pathBlacklist. Intuitively, the blacklist would specify all files that we want to exclude from the final rendered image.
- We could also borrow ideas from overlayfs:

  whiteouts and opaque directories
  --------------------------------

  In order to support rm and rmdir without changing the lower filesystem, an overlay filesystem needs to record in the upper filesystem that files have been removed. This is done using whiteouts and opaque directories (non-directories are always opaque).

  A whiteout is created as a character device with 0/0 device number. When a whiteout is found in the upper level of a merged directory, any matching name in the lower level is ignored, and the whiteout itself is also hidden.

  A directory is made opaque by setting the xattr "trusted.overlay.opaque" to "y". Where the upper filesystem contains an opaque directory, any directory in the lower filesystem with the same name is ignored.

  So in this approach, an ACI that removes files would have the "whiteouts" and "opaque directories" described above in its rootfs. When rendered, it would remove the corresponding files in the lower layer.
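A rough sketch of how a renderer might apply the second approach, in Python. This models each layer as an in-memory mapping from path to entry rather than reading a real tarball, and the `Entry` type, the `kind` values, and the `render` function are all hypothetical names for illustration; it is not how rkt or overlayfs is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    kind: str             # "file", "dir", or "whiteout"
    opaque: bool = False  # only meaningful when kind == "dir"


def render(lower, upper):
    """Merge `upper` onto `lower`, honoring whiteouts and opaque directories."""
    merged = dict(lower)

    # Pass 1: hide lower-layer entries that the upper layer removes.
    for path, entry in upper.items():
        if entry.kind == "whiteout":
            # A whiteout hides the matching name and anything beneath it.
            hidden = [p for p in merged if p == path or p.startswith(path + "/")]
        elif entry.kind == "dir" and entry.opaque:
            # An opaque directory hides the lower directory's contents.
            hidden = [p for p in merged if p.startswith(path + "/")]
        else:
            hidden = []
        for p in hidden:
            del merged[p]

    # Pass 2: add the upper layer's real entries; whiteouts stay hidden.
    for path, entry in upper.items():
        if entry.kind != "whiteout":
            merged[path] = entry
    return merged


lower = {
    "bin": Entry("dir"),
    "bin/app": Entry("file"),
    "etc": Entry("dir"),
    "etc/old.conf": Entry("file"),
    "hello.txt": Entry("file"),
}
upper = {
    "hello.txt": Entry("whiteout"),
    "etc": Entry("dir", opaque=True),
    "etc/new.conf": Entry("file"),
}
rendered = render(lower, upper)
```

Removals are applied in a separate first pass so that an opaque directory in the upper layer never hides files that the upper layer itself contributes.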
Personally, I like the second approach, as it keeps all the "state" in rootfs, reducing the manifest to the simple metadata file that it was supposed to be. It's also worth noting that the two approaches can coexist.
What are your thoughts? @klizhentas @jonboulle @jzelinskie
I guess the downside to the second approach is that it is coupled pretty tightly to overlayfs.
Are humans going to be reading these manifests by hand?
> Are humans going to be reading these manifests by hand?
I think that's something we already do :-) Can't speak for everyone, but I tend to read/write manifests by hand all the time.
The whiteout solution's advantage is that it's more scalable. Imagine you've deleted lots of files: in that case the manifest would be huge and unreadable. In addition, tar will probably give a list of files anyway, so putting them in the manifest is redundant.
On the other hand, 'character device with 0/0' may be too OS-specific; there should probably be some additional metadata specifying the sentinel.
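For reference, a sketch of what checking for the overlayfs-style sentinel could look like in Python. The helper name `is_overlay_whiteout` is hypothetical. Note that it leans on POSIX-only notions (`st_rdev`, `os.major`, `os.minor`), which illustrates the portability concern.

```python
import os
import stat


def is_overlay_whiteout(path):
    """Return True if `path` is a character device with device number 0/0."""
    st = os.lstat(path)  # lstat so a symlink is inspected itself, not its target
    if not stat.S_ISCHR(st.st_mode):
        return False
    # os.major/os.minor are only available on Unix-like platforms.
    return os.major(st.st_rdev) == 0 and os.minor(st.st_rdev) == 0
```

On a platform without character devices or device numbers, a check like this has no direct equivalent, so the sentinel would have to be described some other way.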
@jzelinskie I think the second approach is just conceptually related to overlayfs; the actual implementation would not have to depend on overlayfs at all.
@klizhentas Agreed. But it's not just when you delete lots of files. Even if you delete only a single file foo, with only a whitelist, you'd have to specify all files that are not foo, which is just very counter-intuitive.
Glad that @klizhentas brought up the issue with the second approach being OS-specific. Now that I think about it, the second approach is also not backward-compatible, because some ACIs might already contain whiteouts and opaque directories and maybe the original authors actually want them to be there. The blacklist approach, on the other hand, is totally backward-compatible. I guess at this point I'm leaning more towards the first approach.
Would love to hear @jonboulle's opinion on this. Once we've settled on a solution, I'd be happy to move this issue to appc/spec and provide an implementation in rkt.
> As discussed OOB, we were planning to use pathWhitelist to represent file removal.
Hmm, I'm not sure I remember this discussion the same way - weren't we discussing squashing resultant images by default?
We have actually discussed this a lot in the past. tl;dr:
- there's no real satisfactory sentinel approach, almost everything is either too OS or FS specific
- whitelist vs. blacklist was essentially a coin flip since there are decent arguments and use cases both ways
We ended up deciding that we would be happy to make it an either/or for blacklist vs. whitelist. There's an issue in appc/spec discussing this: https://github.com/appc/spec/issues/323
Feel free to file a PR :-)
Great, it seems that choosing between blacklists and whitelists will work.