gvisor
gvisor copied to clipboard
runsc: boot: use a more reliable method to check /proc
Currently we check whether /proc is umounted by 'unix.Access'. If we run runsc as setuid binary using unprivileged user(in our use cases) we may get EPERM even the /proc is umounted as Linux uses uid to perform the check. In this case we get the error '/proc is still accessible'.
Notice this error can only occur combined with following two conditions:
- we uses --network host. This is because in non-host mode, the runsc will create a new userns and do uid identity mapping, the child process will be run as 0:0(uid:gid) so the check will be ok
- the umask setting should be set to mask the all perm of others such as 0027. This is because if we don't mask the others's read perm, the check will also be ok as the unprivileged uid can also read /proc/ file.
This patch uses a more reliable method to do check /proc check.
@ayushr2 Could you please review this? I see you add this code. Thanks
I am a bit confused, if access(2) fails with EPERM due to lack of read permission for /proc
directory, then won't ioutil.ReadDir() (which should use getdents(2) on an open FD) also fail with EPERM when opening a O_RDONLY FD to /proc
?
I am a bit confused, if access(2) fails with EPERM due to lack of read permission for
/proc
directory, then won't ioutil.ReadDir() (which should use getdents(2) on an open FD) also fail with EPERM when opening a O_RDONLY FD to/proc
?
No in the setuid root binary case, the access uses the real uid(which is not 0) to do check, and the open syscall uses the euid(which is 0) to do the check.
No in the setuid root binary case, the access uses the real uid(which is not 0) to do check, and the open syscall uses the euid(which is 0) to do the check.
I see, could you update the description to reflect this. I know you already mention "Linux uses uid to perform the check", but explicitly calling out that open(2) uses euid so this is more friendly for setuid use cases will make it more clear.
No in the setuid root binary case, the access uses the real uid(which is not 0) to do check, and the open syscall uses the euid(which is 0) to do the check.
I see, could you update the description to reflect this. I know you already mention "Linux uses uid to perform the check", but explicitly calling out that open(2) uses euid so this is more friendly for setuid use cases will make it more clear.
Ok, will make a revision for the commit msg and code.
we run runsc as setuid binary using unprivileged user
FWIW, this doesn't seem very safe. This means an unprivileged user on the system can run runsc
as root. In turn, that means such a user can also access anything on the host filesystem (since runsc
allows you to create sandboxes that can mount and modify any host file). In turn, that means the user effectively has root on the host. This is functionally the same as if one were to configure sudo
to be available to all unprivileged users with no authentication to access the root
user. I suggest reconsidering the approach.
we run runsc as setuid binary using unprivileged user
FWIW, this doesn't seem very safe. This means an unprivileged user on the system can run
runsc
as root. In turn, that means such a user can also access anything on the host filesystem (sincerunsc
allows you to create sandboxes that can mount and modify any host file). In turn, that means the user effectively has root on the host. This is functionally the same as if one were to configuresudo
to be available to all unprivileged users with no authentication to access theroot
user. I suggest reconsidering the approach.
@EtiennePerot Yes you're right, we have also consider this risk. We uses it in very restricted scenario and we also investigated some other methods.
Once the this issue https://github.com/google/gvisor/issues/9918 is resovled totaly we will uses this rootless mode.
Thanks
A friendly reminder that this PR had no activity for 120 days.