containerregistry
containerregistry copied to clipboard
Properly handle opaque whiteout entries in layer flattener
This PR fixes a bug in the flattener's handling of layers with opaque whiteout entries (.wh..wh..opq
files). This bug could lead to deleted files re-appearing in flattened output.
Bug symptoms
According to the OCI spec on whiteouts:
An opaque whiteout entry is a file with the name
.wh..wh..opq
indicating that all siblings are hidden in the lower layer.
The existing flattening logic does not take these files into account, leading to weird anomalies when flattening images where directories are deleted in lower layers and re-created in higher layers. As an example, imagine that I have a Dockerfile which looks something like this:
FROM somewhere
RUN mkdir /directory && touch /directory/old1 /directory/old2
RUN rm -r directory && mkdir /directory && touch /directory/new
Per the spec, these filesystem changes can be represented via the following tar layers:
directory/
directory/old1
directory/old2
and
directory/
directory/.wh..wh..opq
directory/new
The expected flattened layer output is
directory/new
The current flattening implementation does not respect the .wh..wh..opq
file, so the flattened output incorrectly contains both the old and new files:
directory/new
directory/old1
directory/old2
Handling of opaque whiteout files seems to be a problem for other projects, too; see https://github.com/sylabs/singularity/issues/1962 and https://github.com/moby/moby/issues/34300, for example.
Solution
This PR addresses this problem by modifying the client v2.2's flattening function to properly account for opaque whiteout files. The flattener now tracks the set of directories in higher layers which have opaque whiteouts.
Testing
I added a new unit test suite client_v2_2
and wrote tests for the extract()
flattening function.
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).
:memo: Please visit https://cla.developers.google.com/ to sign.
Once you've signed (or fixed any issues), please reply here (e.g. I signed it!
) and we'll verify it.
What to do if you already signed the CLA
Individual signers
- It's possible we don't have your GitHub username or you're using a different email address on your commit. Check your existing CLA data and verify that your email is set on your git commits.
Corporate signers
- Your company has a Point of Contact who decides which employees are authorized to participate. Ask your POC to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the Google project maintainer to go/cla#troubleshoot (Public version).
- The email used to register you as an authorized contributor must be the email used for the Git commit. Check your existing CLA data and verify that your email is set on your git commits.
- The email used to register you as an authorized contributor must also be attached to your GitHub account.
I consider this a work-in-progress until the following TODOs and questions are resolved:
- [ ] Does a similar change need to be made in the older client versions, or is it okay to change only in V2.2?
- [ ] Does this need to be feature-flagged? Is it possible that anyone was relying on the old, incorrect behavior?
- [x] Add comments to explain the new code in
extract()
. - [x] Add a simple test of file contents (to make sure that the flattener picks the topmost version of a given file).
- [x] Is it possible for a tar to contain an opaque whiteout file for a directory but to add no new files under that directory? In this case, should the directory be present-but-empty or should it be removed? I need to see how other projects handle this corner-case. I may be able to test this behavior by simulating this case with my Docker environment and then ensuring that the flattener output agrees with my Docker output.
- Update: yes. It appears that the right behavior is to preserve the now-empty parent directory. This is consistent with the
bin/
example given at https://github.com/opencontainers/image-spec/blob/7b1e489870acb042978a3935d2fb76f8a79aff81/layer.md#opaque-whiteout. - I added a test for this.
- Update: yes. It appears that the right behavior is to preserve the now-empty parent directory. This is consistent with the
I signed it!
CLAs look good, thanks!
@KaylaNguyen, do you have any time to take a look at this? We ran into this bug again the other day.
Bump on this PR