buildkit icon indicating copy to clipboard operation
buildkit copied to clipboard

contenthash: implement proper Linux symlink semantics

Open cyphar opened this issue 1 year ago • 6 comments

This series fixes the symlink following implementation of cache/contenthash. The primary issue with the previous implementation is that it assumed that path.Join is a reasonable way of implementing trailing symlink following -- this is not correct on Linux. path.Clean is a Plan9-ism, and while it is useful, on Linux symlinks are resolved left-to-right and thus lexically transforming a/b/../c to a/c is incorrect if b is a symlink. The paper describing Plan9's path_clean makes reference to this Unix-ism they fixed by removing symlinks from Plan9.

As the trailing symlink logic was written in quite a few places, all of the implementations needed to be fixed. The headline changes are:

  • All of the symlink following implementations are now iterative, based on https://github.com/cyphar/filepath-securejoin. Because of the requirement to apply each component in a symlink individually, recursive implementations end up with a lot of wasted work.
  • The follow flag now means followTrailing (O_NOFOLLOW) everywhere. Previously, getFollowLinks and cc.checksum used it to mean "don't follow any links" (a-la RESOLVE_NO_SYMLINKS) while cc.Checksum and cc.checksumFollow used it to mean O_FOLLOW. There was only one user of the RESOLVE_NO_SYMLINKS implementation (cc.includedPaths) and it appears it was only used as an optimisation (when in fact, since we are iterating over the keys in the cache, the key definitely exists, cc.checksum will just do a simple lookup regardless of RESOLVE_NO_SYMLINKS).
  • cc.checksumFollow is gone (now that checksum supports followTrailing directly), and cc.checksumNoFollow and cc.lazyChecksum have been renamed and reworked slightly. Removing the loop in checksumFollow also means we will detect symlink loops 255 times faster :smile_cat:, not to mention that we now don't do a bunch of unnecessary scanning and checksumming when dealing with trailing symlinks.
  • needsScan now just uses getFollowLinksCallback to track the status of a lookup to see if there is a non-symlink path component that has already been scanned in the cache. This removes one extra implementation of symlink lookups.
  • scanPath now does a scan of path.Dir of the resolved path, rather than resolving the lexical directory of the requested path (this means scanPath, needsScan, and thus rootPath now also take followTrailing). I suspect that always fully-resolving the path would be okay (because we fill in symlinks during resolution) but in the case of followTrailing=false it seems unclear to me which directory "should" be scanned. The old implementation would scan all of them (because checksumFollow would re-apply the trailing link and re-checksum everything).

Signed-off-by: Aleksa Sarai [email protected]

cyphar avatar May 02 '24 07:05 cyphar

For reference, here is a Dockerfile which reliably triggered this issue:

FROM alpine:3.19 AS base
USER root
RUN mkdir -p /target/dir /links1 /links2 ; \
	touch /target/dir/foo ; \
	ln -s ../target/dir  /links1/dir_rel ; \
	ln -s /target/dir    /links1/dir_abs ; \
	ln -s ../target  /links1/target_rel ; \
	ln -s /target    /links1/target_abs ; \
	ln -s ../links1/ /links2/links1_rel ; \
	ln -s /links1/ /links2/links1_abs

FROM scratch as test_0
COPY --from=base /target/dir /data/0
COPY --from=base /links1/dir_rel /data/1
COPY --from=base /links1/dir_abs /data/2
COPY --from=base /links1/target_rel/dir /data/3
COPY --from=base /links1/target_abs/dir /data/4
COPY --from=base /links2/links1_rel/dir_abs /data/5
COPY --from=base /links2/links1_abs/dir_abs /data/6
COPY --from=base /links2/links1_rel/target_abs/dir /data/7
COPY --from=base /links2/links1_abs/target_abs/dir /data/8

# The following COPY commands fail.
FROM scratch as test_1
COPY --from=base /links2/links1_rel/dir_rel /data/9
COPY --from=base /links2/links1_abs/dir_rel /data/a
COPY --from=base /links2/links1_rel/target_rel/dir /data/b
COPY --from=base /links2/links1_abs/target_rel/dir /data/c

cyphar avatar May 02 '24 12:05 cyphar

The getFollowParentLinks thing that exists in one patch and is removed in another is a bit unfortunate but I don't see a nice way of splitting the fixes to getFollowLinks and the followTrailing logic without making one mega patch. If you'd prefer that, let me know and I'll squash things down.

cyphar avatar May 02 '24 15:05 cyphar

Maybe I should get a dummy PR merged so you don't need to re-trigger the CI :sweat_smile:.

cyphar avatar May 06 '24 19:05 cyphar

Not gotten a chance to dive into this yet, but just a quick comment - I wonder if we need any changes in https://github.com/tonistiigi/fsutil as well? There's some logic there for following symlinks as well, and other path resolution-y things.

jedevc avatar May 07 '24 11:05 jedevc

Yeah, I looked at fsutil earlier and it does have bugs with symlink resolution (in short, anything that does filepath.Join(foo, bar.Linkname) or filepath.Clean(bar.Linkname) probably has bugs).

However, it wasn't clear to me how it's used by buildkit/docker/containerd. I can write some patches, but if they have subtle semantics (like getFollowLinks does in buildkit) it might be difficult to figure out whether or not I'm breaking something.

(It's a little frustrating that I spent so much time getting RESOLVE_IN_ROOT merged but we can't use it. I really need to pick up the leftover stuff in https://github.com/openSUSE/libpathrs...)

cyphar avatar May 07 '24 13:05 cyphar

@tonistiigi wdyt? Also, did you want me to work on a PR for fsutil?

cyphar avatar May 20 '24 17:05 cyphar

Windows CI failure in new test

=== Failed
=== FAIL: cache/contenthash TestRootPathSymlinks/resolve("link3/target_abs",followTrailing=true) (0.00s)
    path_test.go:115: 
        	Error Trace:	D:/a/buildkit/buildkit/cache/contenthash/path_test.go:115
        	Error:      	Not equal: 
        	            	expected: "C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\TestRootPathSymlinks412891216\\001\\target"
        	            	actual  : "C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\TestRootPathSymlinks412891216\\001\\link2\\final"
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1 +1 @@
        	            	-C:\Users\RUNNER~1\AppData\Local\Temp\TestRootPathSymlinks412891216\001\target
        	            	+C:\Users\RUNNER~1\AppData\Local\Temp\TestRootPathSymlinks412891216\001\link2\final
        	Test:       	TestRootPathSymlinks/resolve("link3/target_abs",followTrailing=true)
    --- FAIL: TestRootPathSymlinks/resolve("link3/target_abs",followTrailing=true) (0.00s)

=== FAIL: cache/contenthash TestRootPathSymlinks/resolve("link2/target_abs",followTrailing=true) (0.00s)
    path_test.go:115: 
        	Error Trace:	D:/a/buildkit/buildkit/cache/contenthash/path_test.go:115
        	Error:      	Not equal: 
        	            	expected: "C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\TestRootPathSymlinks412891216\\001\\target"
        	            	actual  : "C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\TestRootPathSymlinks412891216\\001\\link2\\final"
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1 +1 @@
        	            	-C:\Users\RUNNER~1\AppData\Local\Temp\TestRootPathSymlinks412891216\001\target
        	            	+C:\Users\RUNNER~1\AppData\Local\Temp\TestRootPathSymlinks412891216\001\link2\final
        	Test:       	TestRootPathSymlinks/resolve("link2/target_abs",followTrailing=true)
    --- FAIL: TestRootPathSymlinks/resolve("link2/target_abs",followTrailing=true) (0.00s)

tonistiigi avatar Jun 05 '24 00:06 tonistiigi

Okay, it seems that Windows's CreateSymlink syscall changes the path we provide for the link (Z:\link2\link1sub\..\final becomes Z:\link2\final when we create the symlink), making the tests fail. Since we can't even create these kinds of symlinks on Windows, I've added a check to skip the test if the symlink we get is not the one we expected on Windows.

cyphar avatar Jun 05 '24 02:06 cyphar