gvisor runsc: boot: use a more reliable method to check /proc

Currently we check whether /proc is umounted by 'unix.Access'. If we run runsc as setuid binary using unprivileged user(in our use cases) we may get EPERM even the /proc is umounted as Linux uses uid to perform the check. In this case we get the error '/proc is still accessible'.

Notice this error can only occur combined with following two conditions:

we uses --network host. This is because in non-host mode, the runsc will create a new userns and do uid identity mapping, the child process will be run as 0:0(uid:gid) so the check will be ok
the umask setting should be set to mask the all perm of others such as 0027. This is because if we don't mask the others's read perm, the check will also be ok as the unprivileged uid can also read /proc/ file.

This patch uses a more reliable method to do check /proc check.

Apr 06 '24 03:04 terenceli

@ayushr2 Could you please review this? I see you add this code. Thanks

Apr 10 '24 02:04 terenceli

I am a bit confused, if access(2) fails with EPERM due to lack of read permission for /proc directory, then won't ioutil.ReadDir() (which should use getdents(2) on an open FD) also fail with EPERM when opening a O_RDONLY FD to /proc?

Apr 10 '24 03:04 ayushr2

I am a bit confused, if access(2) fails with EPERM due to lack of read permission for /proc directory, then won't ioutil.ReadDir() (which should use getdents(2) on an open FD) also fail with EPERM when opening a O_RDONLY FD to /proc?

No in the setuid root binary case, the access uses the real uid(which is not 0) to do check, and the open syscall uses the euid(which is 0) to do the check.

Apr 10 '24 10:04 terenceli

No in the setuid root binary case, the access uses the real uid(which is not 0) to do check, and the open syscall uses the euid(which is 0) to do the check.

I see, could you update the description to reflect this. I know you already mention "Linux uses uid to perform the check", but explicitly calling out that open(2) uses euid so this is more friendly for setuid use cases will make it more clear.

Apr 10 '24 14:04 ayushr2

No in the setuid root binary case, the access uses the real uid(which is not 0) to do check, and the open syscall uses the euid(which is 0) to do the check.

I see, could you update the description to reflect this. I know you already mention "Linux uses uid to perform the check", but explicitly calling out that open(2) uses euid so this is more friendly for setuid use cases will make it more clear.

Ok, will make a revision for the commit msg and code.

Apr 10 '24 23:04 terenceli

we run runsc as setuid binary using unprivileged user

FWIW, this doesn't seem very safe. This means an unprivileged user on the system can run runsc as root. In turn, that means such a user can also access anything on the host filesystem (since runsc allows you to create sandboxes that can mount and modify any host file). In turn, that means the user effectively has root on the host. This is functionally the same as if one were to configure sudo to be available to all unprivileged users with no authentication to access the root user. I suggest reconsidering the approach.

Apr 11 '24 20:04 EtiennePerot

we run runsc as setuid binary using unprivileged user

FWIW, this doesn't seem very safe. This means an unprivileged user on the system can run runsc as root. In turn, that means such a user can also access anything on the host filesystem (since runsc allows you to create sandboxes that can mount and modify any host file). In turn, that means the user effectively has root on the host. This is functionally the same as if one were to configure sudo to be available to all unprivileged users with no authentication to access the root user. I suggest reconsidering the approach.

@EtiennePerot Yes you're right, we have also consider this risk. We uses it in very restricted scenario and we also investigated some other methods.

Once the this issue https://github.com/google/gvisor/issues/9918 is resovled totaly we will uses this rootless mode.

Thanks

Apr 14 '24 01:04 terenceli

A friendly reminder that this PR had no activity for 120 days.

Sep 01 '24 00:09 github-actions[bot]

gvisor gvisor copied to clipboard

runsc: boot: use a more reliable method to check /proc

gvisor
gvisor copied to clipboard