buildah
buildah copied to clipboard
copier/copier.go: switch from Walk to WalkDir breaks COPY . /context from CIFS-mounted build context with symlinks
Description Upgrading an RHEL8.6 box from podman 4.0.2 -> podman 4.1.1 and buildah 1.24.2 -> buildah 1.26.2 broke a previously working container build. The problem seems to stem from using symlinks and CIFS, but buildah's changed behaviour is now causing an error that was previously not a problem.
Buildah is running inside a vagrant-provisioned RHEL8.6 VM, with the build context in a CIFS mount (of a shared NTFS folder from the host running Windows 10 21H2).
There is a symlink in the build context, stored as an NTFS reparse point (created using ln -s in WSL2 on the host) and showing as a symlink when running ls -l on the mounted folder in the RHEL8.6 VM. The problem happens whether the 'mfsymlinks' mount option is enabled or not (vagrant defaults to enabling it).
With the 'mfsymlinks' mount option enabled, the VM can run ln -s and create a 'symlink' that is stored on the host NTFS as a 'magic' file. The problem still happens with these symlinks. With the 'mfsymlinks' option disabled, running ln -s from the VM fails (I don't think Samba supports the creation of reparse-point style symlinks).
A git bisect identified commit 985eec53912a5afaabe3a4abdfa054231f03ce38:
Switch most calls to filepath.Walk to filepath.WalkDir Should speed up most walks escpecially if they don't need to stat every directory entry.
Debugging with strace shows:
- readdir returns d_type=DT_REG for the symlink "b":
getdents64(10, [{d_ino=844424930617637, d_off=4, d_reclen=24, d_type=DT_DIR, d_name=".."}, {d_ino=2251799813736914, d_off=5, d_reclen=40, d_type=DT_REG, d_name=".dockerignore"}, {d_ino=10977524091760767, d_off=6, d_reclen=24, d_type=DT_REG, d_name="a"}, {d_ino=11258999068604971, d_off=7, d_reclen=24, d_type=DT_REG, d_name="b"}, {d_ino=2251799813737072, d_off=8, d_reclen=32, d_type=DT_REG, d_name="Dockerfile"}], 8192) = 144
- stat returns st_mode=S_IFLNK for the symlink "/b":
newfstatat(AT_FDCWD, "/b", {st_dev=makedev(0, 0x65), st_ino=11258999068604971, st_mode=S_IFLNK|000, st_nlink=1, st_uid=65534, st_gid=0, st_blksize=16384, st_blocks=0, st_size=0, st_atime=1663495436 /* 2022-09-18T10:03:56.734495800+0000 */, st_atime_nsec=734495800, st_mtime=1663495436 /* 2022-09-18T10:03:56.734495800+0000 */, st_mtime_nsec=734495800, st_ctime=1663495436 /* 2022-09-18T10:03:56.736447900+0000 */, st_ctime_nsec=736447900}, AT_SYMLINK_NOFOLLOW) = 0
- The WalkDir code eliminated the stat() call made by the Walk code
- As a result the symlink is not identified as a symlink, and its target is not initialised
- The call to symlinkat is made with an empty target and fails:
symlinkat("", AT_FDCWD, "/b") = -1 ENOENT (No such file or directory)
Looking at the cifs module source briefly, it seems the decision to return DT_REG to readdir(), but S_IFLNK to stat() was deliberate, and motivated by performance:
https://github.com/torvalds/linux/blob/e3f259d33c0ebae1b6e4922c7cdb50e864c81928/fs/cifs/readdir.c#L195 https://github.com/torvalds/linux/blob/e3f259d33c0ebae1b6e4922c7cdb50e864c81928/fs/cifs/readdir.c#L212 https://github.com/torvalds/linux/blob/e3f259d33c0ebae1b6e4922c7cdb50e864c81928/fs/cifs/readdir.c#L278 https://github.com/torvalds/linux/blob/e3f259d33c0ebae1b6e4922c7cdb50e864c81928/fs/cifs/readdir.c#L1029
Finally, readdir(3) notes that d_type is not supported by all filesystem types:
unsigned char d_type; /* Type of file; not supported
by all filesystem types */
Steps to reproduce the issue:
- Set up environment
- Mount /vagrant using CIFS from Windows 10 NTFS share. (Optionally add 'mfsymlinks' mount option.)
mount -o rw,relatime,context=system_u:object_r:container_file_t:s0,vers=3.1.1,cache=strict,username=...,domain=...,uid=100081,noforceuid,gid=1000,noforcegid,addr=10.0.0.1,file_mode=0775,dir_mode=0775,soft,nounix,serverino,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1,_netdev //10.0.0.1/share /vagrant
- Construct test case:
mkdir /vagrant/t
cd /vagrant/t
echo 'Dockerfile' > .dockerignore
echo 'FROM busybox' > Dockerfile
echo 'COPY . /data' >> Dockerfile
echo 'Hello, World!' > a
ln -s a b # Must run under WSL on host, unless 'mfsymlinks' option is enabled
- Attempt Build:
buildah build --log-level debug /vagrant/t
Describe the results you received:
...
STEP 2/2: COPY . /data
DEBU[0002] COPY []string(nil), imagebuilder.Copy{FromFS:false, From:"", Src:[]string{"."}, Dest:"/data", Download:false, Chown:"",
Chmod:""}
DEBU[0003] time="2022-09-19T09:41:46Z" level=debug msg="Skipping excluded path: Dockerfile"
DEBU[0003] Error building at step {Env:[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] Command:copy Args:[. /data] Flags:[] Attrs:map[] Message:COPY . /data Original:COPY . /data}: error storing "/vagrant/t": error during bulk transfer for copier.request{Request:"PUT", Root:"/", preservedRoot:"/home/vagrant/.local/share/containers/storage/overlay/af63181625faa5a58a039728676468b91616a67e0666ed66d1ba44b3668cde2d/merged/data", rootPrefix:"/home/vagrant/.local/share/containers/storage/overlay/af63181625faa5a58a039728676468b91616a67e0666ed66d1ba44b3668cde2d/merged/data", Directory:"/", preservedDirectory:"/home/vagrant/.local/share/containers/storage/overlay/af63181625faa5a58a039728676468b91616a67e0666ed66d1ba44b3668cde2d/merged/data", Globs:[]string{}, preservedGlobs:[]string{}, StatOptions:copier.StatOptions{CheckForArchives:false, Excludes:[]string(nil)}, GetOptions:copier.GetOptions{UIDMap:[]idtools.IDMap(nil), GIDMap:[]idtools.IDMap(nil), Excludes:[]string(nil), ExpandArchives:false, ChownDirs:(*idtools.IDPair)(nil), ChmodDirs:(*fs.FileMode)(nil), ChownFiles:(*idtools.IDPair)(nil), ChmodFiles:(*fs.FileMode)(nil), StripSetuidBit:false, StripSetgidBit:false, StripStickyBit:false, StripXattrs:false, KeepDirectoryNames:false, Rename:map[string]string(nil), NoDerefSymlinks:false, IgnoreUnreadable:false, NoCrossDevice:false}, PutOptions:copier.PutOptions{UIDMap:[]idtools.IDMap{}, GIDMap:[]idtools.IDMap{}, DefaultDirOwner:(*idtools.IDPair)(0xc0002d9d60), DefaultDirMode:(*fs.FileMode)(nil), ChownDirs:(*idtools.IDPair)(nil), ChmodDirs:(*fs.FileMode)(nil), ChownFiles:(*idtools.IDPair)(nil), ChmodFiles:(*fs.FileMode)(nil), StripSetuidBit:false, StripSetgidBit:false, StripStickyBit:false, StripXattrs:false, IgnoreXattrErrors:false, IgnoreDevices:true, NoOverwriteDirNonDir:false, NoOverwriteNonDirDir:false, Rename:map[string]string(nil)}, MkdirOptions:copier.MkdirOptions{UIDMap:[]idtools.IDMap(nil), GIDMap:[]idtools.IDMap(nil), ChownNew:(*idtools.IDPair)(nil), ChmodNew:(*fs.FileMode)(nil)}, RemoveOptions:copier.RemoveOptions{All:false}}: copier: put: error creating "/b": symlink /b: no such file or directory
error building at STEP "COPY . /data": error storing "/vagrant/t": error during bulk transfer for copier.request{Request:"PUT", Root:"/", preservedRoot:"/home/vagrant/.local/share/containers/storage/overlay/af63181625faa5a58a039728676468b91616a67e0666ed66d1ba44b3668cde2d/merged/data", rootPrefix:"/home/vagrant/.local/share/containers/storage/overlay/af63181625faa5a58a039728676468b91616a67e0666ed66d1ba44b3668cde2d/merged/data", Directory:"/", preservedDirectory:"/home/vagrant/.local/share/containers/storage/overlay/af63181625faa5a58a039728676468b91616a67e0666ed66d1ba44b3668cde2d/merged/data", Globs:[]string{}, preservedGlobs:[]string{}, StatOptions:copier.StatOptions{CheckForArchives:false, Excludes:[]string(nil)}, GetOptions:copier.GetOptions{UIDMap:[]idtools.IDMap(nil), GIDMap:[]idtools.IDMap(nil), Excludes:[]string(nil), ExpandArchives:false, ChownDirs:(*idtools.IDPair)(nil), ChmodDirs:(*fs.FileMode)(nil), ChownFiles:(*idtools.IDPair)(nil), ChmodFiles:(*fs.FileMode)(nil), StripSetuidBit:false, StripSetgidBit:false, StripStickyBit:false, StripXattrs:false, KeepDirectoryNames:false, Rename:map[string]string(nil), NoDerefSymlinks:false, IgnoreUnreadable:false, NoCrossDevice:false}, PutOptions:copier.PutOptions{UIDMap:[]idtools.IDMap{}, GIDMap:[]idtools.IDMap{}, DefaultDirOwner:(*idtools.IDPair)(0xc0002d9d60), DefaultDirMode:(*fs.FileMode)(nil), ChownDirs:(*idtools.IDPair)(nil), ChmodDirs:(*fs.FileMode)(nil), ChownFiles:(*idtools.IDPair)(nil), ChmodFiles:(*fs.FileMode)(nil), StripSetuidBit:false, StripSetgidBit:false, StripStickyBit:false, StripXattrs:false, IgnoreXattrErrors:false, IgnoreDevices:true, NoOverwriteDirNonDir:false, NoOverwriteNonDirDir:false, Rename:map[string]string(nil)}, MkdirOptions:copier.MkdirOptions{UIDMap:[]idtools.IDMap(nil), GIDMap:[]idtools.IDMap(nil), ChownNew:(*idtools.IDPair)(nil), ChmodNew:(*fs.FileMode)(nil)}, RemoveOptions:copier.RemoveOptions{All:false}}: copier: put: error creating "/b": symlink /b: no such file or directory
Describe the results you expected:
With problematic commit reverted:
STEP 2/2: COPY . /data
DEBU[0006] COPY []string(nil), imagebuilder.Copy{FromFS:false, From:"", Src:[]string{"."}, Dest:"/data", Download:false, Chown:"", Chmod:""}
DEBU[0007] time="2022-09-19T10:13:58Z" level=debug msg="Skipping excluded path: Dockerfile"
DEBU[0007] added content dir:affec538a007634efb5bb2ac81453e4f88904df3cf2653fe14bef0025b78f728
DEBU[0007] COMMIT
COMMIT
Output of rpm -q buildah or apt list buildah:
buildah-1.26.2-1.module+el8.6.0+15917+093ca6f8.x86_64
Output of buildah version:
buildah version 1.26.2 (image-spec 1.0.2-dev, runtime-spec 1.0.2-dev)
Output of cat /etc/*release:
$ cat /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="8.6 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.6"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.6 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/red_hat_enterprise_linux/8/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.6
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.6"
Output of uname -a:
$ uname -a
Linux podman-vagrant 4.18.0-372.26.1.el8_6.x86_64 #1 SMP Sat Aug 27 02:44:20 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux
Output of cat /etc/containers/storage.conf:
$ cat /etc/containers/storage.conf | grep -E -v '^(#|$)'
[storage]
driver = "overlay"
runroot = "/run/containers/storage"
graphroot = "/var/lib/containers/storage"
[storage.options]
additionalimagestores = [
]
[storage.options.overlay]
mountopt = "nodev,metacopy=on"
[storage.options.thinpool]
@flouthoc PTAL
A friendly reminder that this issue had no activity for 30 days.
@nalind @vrothberg @flouthoc @mtrmac @giuseppe any idea on this one?
Going purely by the description:
- I don’t have a bit of sympathy for a kernel behavior returning
DT_REG“for performance”, if that’s indeed what’s happening (I’m unfamiliar with the CIFS kernel code).DT_UNKNOWNexists just for such purposes. That’s a bug that must be fixed in the kernel, IMHO. - That said, we are calling
d.Info()==lstat(2)immediately after the symlink check. So we could do that before the check, and revert to usinginfo.Mode()instead ofd.Type(), to work around this. If anyone does that work, please write a detailed comment about the need for this so that it doesn’t get optimized away again.
A friendly reminder that this issue had no activity for 30 days.
@nalind WDYT?
A friendly reminder that this issue had no activity for 30 days.