Add support for O_TMPFILE
Description
The following fails with runsc but succeeds with crun/runc as well as on the host:
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>

int main(void)
{
    int fd = open("/tmp", O_DIRECTORY);
    printf("%i\n", openat(fd, ".", O_TMPFILE|O_WRONLY, 0600));
    return 0;
}
This is a heavily reduced testcase for a failure I am seeing with the latest pam_oath on openSUSE Tumbleweed in a container. openSUSE applied its own patch (different from upstream's) to fix a CVE, and that patch introduces this behaviour.
Steps to reproduce
- Compile the testcase from the Description section.
- Run it. It should print a valid file descriptor, not -1.
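To make the failure easier to triage, the testcase can be extended to report errno instead of a bare -1. The following is a minimal sketch, not part of the original report; the helper names `try_tmpfile` and `report_tmpfile` are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: try to create an unnamed temporary file in `dir`.
 * Returns the new fd, or -1 on failure. *err receives the errno of the
 * failing call, or 0 on success. */
int try_tmpfile(const char *dir, int *err)
{
    *err = 0;

    int dirfd = open(dir, O_DIRECTORY | O_RDONLY);
    if (dirfd < 0) {
        *err = errno;
        return -1;
    }

    /* Same call as in the reduced testcase. */
    int fd = openat(dirfd, ".", O_TMPFILE | O_WRONLY, 0600);
    if (fd < 0)
        *err = errno; /* saved before close() can overwrite errno */

    close(dirfd);
    return fd;
}

/* Hypothetical helper: print the outcome in a form suitable for a bug report.
 * Returns 0 if the temporary file could be created, 1 otherwise. */
int report_tmpfile(const char *dir)
{
    int err;
    int fd = try_tmpfile(dir, &err);

    if (fd < 0) {
        printf("%s: O_TMPFILE failed: %s (errno %d)\n", dir, strerror(err), err);
        return 1;
    }
    printf("%s: O_TMPFILE ok, fd=%d\n", dir, fd);
    close(fd);
    return 0;
}
```

Calling `report_tmpfile("/tmp")` from `main` under each runtime shows whether the call fails and, if so, with which error code.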
runsc version
runsc version release-20241028.0-23-gbcbb6a01e13b-dirty spec: 1.1.0-rc.1
docker version (if using docker)
host:
  arch: amd64
  buildahVersion: 1.37.5
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: app-containers/conmon-2.1.11
    path: /usr/libexec/podman/conmon
    version: 'conmon version 2.1.11, commit: unknown'
  cpuUtilization:
    idlePercent: 95.46
    systemPercent: 3.36
    userPercent: 1.18
  cpus: 8
  databaseBackend: sqlite
  distribution:
    distribution: gentoo
    version: "2.17"
  eventLogger: journald
  freeLocks: 2043
  hostname: TARDIS
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 6.11.5-gentoo-241023-r1
  linkmode: dynamic
  logDriver: journald
  memFree: 20015349760
  memTotal: 33574137856
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: app-containers/aardvark-dns-1.12.2
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.12.2
    package: app-containers/netavark-1.12.2
    path: /usr/libexec/podman/netavark
    version: netavark 1.12.2
  ociRuntime:
    name: crun
    package: app-containers/crun-1.17
    path: /usr/bin/crun
    version: |-
      crun version 1.17
      commit: 000fa0d4eeed8938301f3bcf8206405315bc1017
      rundir: /run/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: net-misc/passt-2024.09.06
    version: |
      pasta 2024.09.06
      Copyright Red Hat
      GNU General Public License, version 2 or later
      https://www.gnu.org/licenses/old-licenses/gpl-2.0.html
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: false
    path: /run/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: app-containers/slirp4netns-1.2.0
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.5
  swapFree: 0
  swapTotal: 0
  uptime: 7h 13m 10.00s (Approximately 0.29 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries: {}
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 3
    paused: 0
    running: 1
    stopped: 2
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 1978033311744
  graphRootUsed: 1113918722048
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "true"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 108
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 5.2.5
  Built: 1731052209
  BuiltTime: Fri Nov 8 08:50:09 2024
  GitCommit: ""
  GoVersion: go1.23.2
  Os: linux
  OsArch: linux/amd64
  Version: 5.2.5
uname
Linux TARDIS 6.11.5-gentoo-241023-r1 #1 SMP PREEMPT_DYNAMIC Wed Oct 23 17:53:43 CEST 2024 x86_64 Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz GenuineIntel GNU/Linux
kubectl (if using Kubernetes)
No response
repo state (if built from source)
No response
runsc debug logs (if available)
No response
gVisor doesn't support O_TMPFILE yet:
- https://github.com/google/gvisor/blob/3c4b2ad07ce1b830bbfa00b3fdcb7dca0d95782a/pkg/sentry/fsimpl/gofer/filesystem.go#L928-L934
- https://github.com/google/gvisor/blob/3c4b2ad07ce1b830bbfa00b3fdcb7dca0d95782a/pkg/sentry/fsimpl/tmpfs/filesystem.go#L340-L343
- https://github.com/google/gvisor/blob/3c4b2ad07ce1b830bbfa00b3fdcb7dca0d95782a/pkg/sentry/fsimpl/erofs/filesystem.go#L220-L222
@BinaryKhaos Just curious: in your "pam_oath on openSUSE Tumbleweed" use case, does the application use O_TMPFILE on the /tmp directory (or one of its subdirectories)? I am wondering if we could just add O_TMPFILE support to gVisor's tmpfs and unblock you. Support for it in the gofer filesystem is a bit more involved, but tmpfs should be doable.
@ayushr2 Unfortunately not. It's used in the same (configurable) directory where a specific configuration file is located, usually somewhere in /etc or /home. You can find the patch (and the rest of the code) here, if that helps.
@ayushr2 What about directfs mode? That’s what I expect most non-Google uses of gVisor to be using.
@DemiMarie Directfs is an access mode of the gofer filesystem in which the sentry is able to access the host filesystem directly. Adding support for O_TMPFILE in the gofer client (the sentry-side component, which includes the directfs bits) is a bit involved, but doable. Contributions are appreciated.
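Until the gofer client gains O_TMPFILE support, an application that only needs an anonymous scratch file could fall back to mkstemp() followed by unlink(). The sketch below is an illustration, not what pam_oath or the openSUSE patch does. The helper name `open_unnamed_tmpfile` is hypothetical, and falling back on EOPNOTSUPP and EISDIR is an assumption based on the errors open(2) documents for filesystems and kernels without O_TMPFILE; the errno that runsc actually returns should be checked first.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: return a writable fd for an unnamed file in `dir`.
 * Tries O_TMPFILE first; if that is unsupported, creates a uniquely named
 * file with mkstemp() and unlinks it immediately. Returns -1 on failure. */
int open_unnamed_tmpfile(const char *dir)
{
    int fd = open(dir, O_TMPFILE | O_WRONLY, 0600);
    if (fd >= 0)
        return fd;

    /* Assumption: only these errors mean "O_TMPFILE is not available". */
    if (errno != EOPNOTSUPP && errno != EISDIR)
        return -1;

    char path[PATH_MAX];
    int n = snprintf(path, sizeof(path), "%s/.tmpXXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof(path)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    /* mkstemp() creates the file with mode 0600 and opens it read-write. */
    fd = mkstemp(path);
    if (fd >= 0)
        unlink(path); /* drop the name; the fd keeps the file alive */
    return fd;
}
```

The fallback is not equivalent to O_TMPFILE: the file is briefly visible under a name, and once unlinked it cannot be given a name again with linkat(). Code that writes to a temporary file and then links it into place would have to keep the mkstemp() name and rename() it instead.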