COPY command changes the directory permissions to change.
Description
In the latest v23.0.0 update, I found that using COPY --link FILE.txt /tmp/FILE.txt caused the permissions of the /tmp directory to change from 777 to 755. This caused problems downstream because apt no longer had write permissions to the /tmp directory. Though this can be fixed by changing the directory that the file was copied to, this is obviously a bug.
Reproduce
git clone https://github.com/cresset-template/cresset.git
cd cresset
git checkout 2fc0889
make env
make build
Expected behavior
I tried adding RUN ls -alh /tmp to the Dockerfile before and after the COPY command in line 418 of the Cresset Dockerfile and the file permissions that I found were different. The COPY command should not change the permissions of the directory.
docker version
Client: Docker Engine - Community
Version: 23.0.0
API version: 1.42
Go version: go1.19.5
Git commit: e92dd87
Built: Wed Feb 1 17:49:08 2023
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 23.0.0
API version: 1.42 (minimum version 1.12)
Go version: go1.19.5
Git commit: d7573ab
Built: Wed Feb 1 17:49:08 2023
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.16
GitCommit: 31aa4358a36870b21a992d3ad2bef29e1d693bec
runc:
Version: 1.1.4
GitCommit: v1.1.4-0-g5fd4c4d
docker-init:
Version: 0.19.0
GitCommit: de40ad0
docker info
Client:
Context: default
Debug Mode: false
Plugins:
compose: Docker Compose (Docker Inc.)
Version: v2.15.1
Path: /home/veritas/.docker/cli-plugins/docker-compose
scan: Docker Scan (Docker Inc.)
Version: v0.17.0
Path: /usr/libexec/docker/cli-plugins/docker-scan
Server:
Containers: 4
Running: 3
Paused: 0
Stopped: 1
Images: 19
Server Version: 23.0.0
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: nvidia runc io.containerd.runc.v2
Default Runtime: runc
Init Binary: docker-init
containerd version: 31aa4358a36870b21a992d3ad2bef29e1d693bec
runc version: v1.1.4-0-g5fd4c4d
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: builtin
Kernel Version: 5.4.0-137-generic
Operating System: Ubuntu 20.04.4 LTS
OSType: linux
Architecture: x86_64
CPUs: 80
Total Memory: 251.6GiB
Name: blu3-001
ID: 5633:5QID:26UF:2LQP:2JJE:5EQO:KXYX:KNGY:5FLX:DEFQ:NWUK:LUWJ
Docker Root Dir: /data1/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
Additional Info
Before COPY:
=> [train 2/12] RUN ls -alh /tmp && sleep 20 8.0s
=> => # total 8.0K
=> => # drwxrwxrwt 2 root root 4.0K Jan 26 02:06 .
=> => # drwxr-xr-x 1 root root 4.0K Feb 8 06:56 ..
After COPY:
=> [train 4/12] RUN ls -alh /tmp && sleep 20 9.3s
=> => # total 12K
=> => # drwxr-xr-x 2 root root 4.0K Feb 8 05:28 .
=> => # drwxr-xr-x 1 root root 4.0K Feb 8 06:57 ..
Thanks for reporting. I think this may be a combination of things, and I think some of this was discussed on https://github.com/moby/buildkit/issues/2414#issuecomment-943325055;
-
COPY --linkimplies starting the layerFROM scratch(seeCOPY --link) -
Using that option allows the content in the layer to be separate from its parent layer (so only information about the content in the layer is preserved)
-
In your example, the content that's copied is
FILE.txt, but (taking the above into account), the/tmpdirectory is not;COPY --link FILE.txt /tmp/FILE.txt -
As the
/tmpdirectory is implicitly created, this may now be using the defaults (some discussion on that in https://github.com/moby/moby/issues/44106, and https://github.com/opencontainers/image-spec/pull/970)
While that may explain what's happening, I agree this is somewhat confusing, and it definitely feels unexpected that the permissions change.
So, if the above is the reason, I wonder what the options are for this 🤔;
One option would be to copy the directory permissions of the parent layer
However, this could potentially defeat the purpose of --link, as it now becomes dependent on the parent layer?
Effectively, this would be making --parents option (not yet available) the default if --link is used. Possibly the COPY --parents option (once available) would take away that ambiguity (i.e., it would allow for the user to opt-in to have the parent directory permissions be "baked" into the layer.
@jedevc @tonistiigi @crazy-max PTAL (should this be moved to the BuildKit issue tracker?)
BuildKit 0.11.2
FROM alpine AS base
RUN apk add --no-cache tree
RUN tree -apufi /tmp
COPY foo.txt /tmp/foo.txt
RUN tree -apufi /tmp
COPY --link foo.txt /tmp/foo.txt
RUN tree -apufi /tmp
...
moby/moby#7 [3/7] RUN tree -apufi /tmp
moby/moby#7 0.136 [drwxrwxrwt root ] /tmp
moby/moby#7 0.136
moby/moby#7 0.136 0 directories, 0 files
moby/moby#7 DONE 0.2s
moby/moby#8 [4/7] COPY foo.txt /tmp/foo.txt
moby/moby#8 DONE 0.1s
moby/moby#9 [5/7] RUN tree -apufi /tmp
#0 0.134 [drwxrwxrwt root ] /tmp
#0 0.134 [-rwxrwxrwx root ] /tmp/foo.txt
#0 0.134
#0 0.134 0 directories, 1 file
moby/moby#9 DONE 0.2s
moby/moby#10 [6/7] COPY --link foo.txt /tmp/foo.txt
moby/moby#10 merging 0.0s done
moby/moby#10 DONE 0.1s
moby/moby#11 [7/7] RUN tree -apufi /tmp
#0 0.125 [drwxr-xr-x root ] /tmp
#0 0.125 [-rwxrwxrwx root ] /tmp/foo.txt
#0 0.125
#0 0.125 0 directories, 1 file
moby/moby#11 DONE 0.1s
should this be moved to the BuildKit issue tracker?
Yes please
I transferred the ticket to the BuildKit issue tracker 👍
As @thaJeztah wrote, --link implies that the full path is linked over the existing source, it is not merged with the previous one. This is one of the reasons --link is a new opt-in flag. You need to make sure that your source is prepared so that it doesn't contain invalid permissions.
You can read the background of this at https://github.com/moby/buildkit/issues/2414 . It is possible that in some future release, we can improve this a bit by removing the parent directory record from the layer tarball when we create the image, but there will always be specific cases (symlinks, chown etc) where doing a linked copy requires these extra precaucions.
This code used to function properly in previous Docker engine versions, is there any reason why it suddenly failed in the latest update?
I wonder (but @tonistiigi may be able to confirm) if that's where these come into play;
- https://github.com/moby/moby/pull/44196
- https://github.com/opencontainers/image-spec/pull/970
If the COPY --link step in this case does not add a directory entry for /tmp, and the "parent" layer is FROM scratch then the permissions for that directory are "undefined"; versions before 23.0.0 of the daemon (see https://github.com/moby/moby/issues/44106) used 0777 and depended on umask, which resulted in undefined behavior (depending on the umask and storage driver used, this could either result in 0755 permissions or 0600).
This code used to function properly in previous Docker engine versions, is there any reason why it suddenly failed in the latest update?
COPY --link was first implemented in BuildKit v0.10 that was added to Moby with v23.0 release. If you used --link in an older release, that didn't do any linking behavior then you get the previous semantics exactly as without any --link flag.
Dockerfiles that opt-in to --link are compatible with previous copy semantics. But if you just take old Dockerfile and add --link to all COPY commands you can get slightly different semantics depending on what source parameters and flags are used.
Oh! Good one, I assumed it depended on the Dockerfile frontend, but didn't consider the option would be previously silently ignored.
I guess it no longer helps for this specific case (now that 23.0 is out), but do you know if warnings are still planned for cases like this? (Dockerfile uses option that's not supported by a specific version of BuildKit) also see https://github.com/moby/buildkit/issues/1952#issuecomment-764816376
I also encounter some unexpected directory permission changes using 23.0.0 with moby worker in buildkit. Instead of using COPY op, what i did is
File(llb.Mkdir("/var/log/xxx", 0777, llb.WithParents(true)), llb.WithCustomNamef("[internal] mkdir for xxx log" ))
But after enter the container, i find the permission of the directory to be 0755, which prevent me for adding log files under the directory. Is there any possible reason or undefined behaviors that can caused this?
@Kaiyang-Chen perhaps default umask (022) being applied to 0777 ?
@Kaiyang-Chen perhaps default umask (
022) being applied to0777?
@thaJeztah In your previous reply, version before 23.0.0 used 0777 for undefined directory, and after it there will be a default umask apply to it? It seems for version before 23.0.0, the directory permission is indeed 0777 using container-based buildkit in above implementation, but when using moby worker, the permission are changed to 0755. Are there any difference between built-in moby worker from buildx and container-based buildkit in such behavious? What's more, in my case, when creating the directory, i explicitly set the permission to 0777, why there is still umask apply to it? Thanks for your help!
Is there any possible reason or undefined behaviors that can caused this?
Are you sure the directory didn't exist before? Mkdir does not overwrite perms for existing directories.
The umask should not apply. If it does, then it is a bug.
The last issue is completely different from the initial report. If you have some reproducers for it, then create a new issue with all info.
The /tmp directory existed before the COPY --link command was executed, I am certain because I ran ls -alh on the /tmp directory to get the outputs. It therefore seems to be a bug to me.
Someone reported a similar issue but without --link: https://github.com/docker/build-push-action/issues/1130#issuecomment-2163185050
The /tmp directory existed before the COPY --link command was executed
This does not matter. --link does not have a dependency on destination directory (and therefore does not invalidate cache if destination changes).
Link is equivalent to:
FROM scratch AS layerX
COPY files /some/dest/
One option would be to copy the directory permissions of the parent layer
However, this could potentially defeat the purpose of
--link, as it now becomes dependent on the parent layer?
Has the findings below with --chmod made --link dependent in some way? Or by parent layer were you referring to the ownership/permissions of the source path segments?
I notice you mentioned permissions specifically, was that also applicable to ownership expectations with --chown?
Link is equivalent to
FROM scratch AS layerX COPY files /some/dest/
What about when --chown / --chmod are used?
From this reference we have this minimal example:
FROM dest AS with-chmod-fix
COPY --link --chown=300 --chmod=null --from=src /foo/bar/baz /foo/bar/baz
Ownership and permissions of the dest stage remains intact:
Octal User Group Name
0770 200 200 /foo
0770 200 200 └── bar
0770 200 200 └── baz
0777 300 300 ├── a
0777 300 300 ├── b
0777 300 300 └── c
If the --chmod=null was removed however.. --chown=300 now also modifies parent directories ownership and the their permissions are modified to 0755:
Octal User Group Name
0755 300 300 /foo
0755 300 300 └── bar
0755 300 300 └── baz
0777 300 300 ├── a
0777 300 300 ├── b
0777 300 300 └── c
Observations:
--chmodwith an actual octal value will only modify the source permissions when copied to the destination. It never affects the parent directories like--chownabove.--chmod=nulltrick does not have any influence at negating the--chownquirk if the destination does not yet exist.
Seems like this inconsistency is a bug? (UPDATE: Full overview documented here)
there will always be specific cases (symlinks, chown etc) where doing a linked copy requires these extra precautions.
Does the --chown or --chmod flags apply here? How can I confirm if the --link functionality is actually working still vs being treated as --link=false?