solo5

hvt_drop_privileges() behaviour

Open mato opened this issue 6 years ago • 7 comments

#276 adds scaffolding for hvt_drop_privileges() and an OpenBSD pledge implementation.

hvt_drop_privileges() is called by the tender just before entering the VCPU loop, i.e.:

  • after all host resources have been acquired (all modules set up)
  • guest ELF has been loaded and memory set up.

We should decide what the behaviour should be on Linux and FreeBSD, to get "secure by default" behaviour. At this time I'm not prepared to add actual seccomp() filtering or Capsicum here, but I think that the behaviour should include at least:

  1. Dropping privileges to an unprivileged user if the tender is running as root.
  2. chroot() to a suitable directory.
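As a rough illustration of the two steps above, here is a minimal sketch of classic privilege dropping. The helper name `drop_to_user` and its parameters are hypothetical and not part of Solo5; the user name and directory (for example `nobody` and `/var/empty`) are assumptions that depend on the host system.

```c
#define _DEFAULT_SOURCE /* exposes chroot() and setgroups() on glibc */
#include <grp.h>
#include <pwd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not Solo5 code: if running as root, chroot() into
 * `dir` and switch to the unprivileged `user`.
 * Returns 0 on success (or if there was nothing to drop), -1 on failure.
 */
int drop_to_user(const char *user, const char *dir)
{
    /* Look the user up first: /etc/passwd is unreachable after chroot(). */
    struct passwd *pw = getpwnam(user);
    if (pw == NULL)
        return -1;

    if (getuid() != 0)
        return 0; /* Not root: nothing to drop. */

    if (chroot(dir) != 0 || chdir("/") != 0)
        return -1;

    /* Order matters: supplementary groups, then GID, then UID. */
    if (setgroups(1, &pw->pw_gid) != 0 ||
        setgid(pw->pw_gid) != 0 ||
        setuid(pw->pw_uid) != 0)
        return -1;

    /* Regaining root must now be impossible. */
    if (setuid(0) == 0)
        return -1;

    return 0;
}
```

A caller would invoke something like `drop_to_user("nobody", "/var/empty")` after all host resources have been acquired and treat a non-zero return as fatal.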

@hannesm: Presumably the FreeBSD case is easier, since there will be a suitable non-privileged UID always available and /var/empty could be used for the chroot?

For the Linux case, I'm not sure what the best default behaviour that is guaranteed to work cross-distribution is -- could nobody be used as the unprivileged user? I don't know of any equivalent of /var/empty, but there might be something in the LSB/FHS.

A second case is when this is run out of a container in Linux -- can we guarantee that a nobody will always be available there?

/cc @adamsteen

mato, Oct 09 '18 10:10

@mato sounds sensible for FreeBSD /cc @sg2342 who may have an opinion here

hannesm, Oct 09 '18 12:10

I think for FreeBSD a call to cap_enter(2) would be the way to go.

sg2342, Oct 15 '18 11:10

It turns out that cap_enter(2) will not work, because ppoll(2) is not permitted in capability mode. However, the poll(2) code in the FreeBSD kernel is Capsicum-enabled and shares the relevant parts with ppoll(2). So the only thing missing in the FreeBSD source tree is an entry for ppoll(2) in sys/kern/capabilities.conf (and make -C sys/kern/ sysent to regenerate the syscall configuration).

FreeBSD bug 232495

sg2342, Oct 21 '18 01:10

I've been thinking about what to do here for the Linux case. There are two cases where the behaviour should be quite different:

  1. In a non-containerized setup: unlike the BSDs there should be no need to run the hvt tender as root. The only thing the tender needs access to is /dev/kvm, which is easily granted through normal permission bits on the device file.
  2. If the tender is running in a container (irrespective of container runtime): It is legitimate to run as root as we can assume it's not "real root", and in a minimal, fully deprivileged container where the tender is the only process running there's not much point in doing anything else (e.g. chroot).

Therefore, I think that for the first case (classic system) we should require running as non-root and make it the user's responsibility to ensure that access to /dev/kvm is available (via being a member of the kvm group -- I believe some distros may even enable this by default for everyone). No chroot()-ing should be done, as there is no sensible default.
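A minimal sketch of what such a check for the classic (non-containerized) case might look like. The function names are hypothetical and not part of Solo5; the device path is passed in as a parameter so that nothing is assumed beyond the /dev/kvm mentioned above.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the calling (real) user may open
 * `path` for reading and writing, 0 otherwise.
 */
int device_accessible(const char *path)
{
    return access(path, R_OK | W_OK) == 0;
}

/*
 * Hypothetical policy check for a classic system: refuse to run as root,
 * and require that the KVM device is already accessible to the user.
 * Returns 0 if the tender may proceed, -1 otherwise.
 */
int check_classic_setup(uid_t uid, const char *kvm_path)
{
    if (uid == 0)
        return -1; /* Running as root is not permitted here. */
    if (!device_accessible(kvm_path))
        return -1; /* User must fix permissions, e.g. join the kvm group. */
    return 0;
}
```

A tender could call `check_classic_setup(getuid(), "/dev/kvm")` at startup and exit with an explanatory message on failure.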

In the second case, this requires defining what "running in a container" means and then being able to detect it. I'd prefer to avoid various heuristics (see e.g. here: https://github.com/genuinetools/bpfd/blob/master/proc/proc.go) and instead explicitly support only the case where the tender is running as the sole PID 1, which implies it's inside a PID namespace, which to me seems a "good enough" way to detect if it is running in a "container". So, if and only if:

if (getpid() == 1 && getppid() == 0)

evaluates to true, we allow running as root (probably easiest to test just with getuid() for now rather than CAP_SYS_ADMIN) and do not do anything else, as in this setup there's likely no point in (or meaningful default for) a chroot().
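The proposed rule can be sketched as a pair of small functions. The names `is_sole_pid1` and `root_check` are hypothetical and not part of Solo5; the IDs are passed in as parameters so the policy can be exercised without actually being PID 1.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: the "good enough" container test from the
 * discussion. PID 1 with parent PID 0 implies a PID namespace of our own.
 */
int is_sole_pid1(pid_t pid, pid_t ppid)
{
    return pid == 1 && ppid == 0;
}

/*
 * Hypothetical policy check: root is acceptable if and only if we are the
 * sole PID 1. Non-root callers always pass.
 * Returns 0 if the tender may proceed, -1 otherwise.
 */
int root_check(uid_t uid, pid_t pid, pid_t ppid)
{
    if (uid != 0)
        return 0;
    return is_sole_pid1(pid, ppid) ? 0 : -1;
}
```

In the tender this would be called as `root_check(getuid(), getpid(), getppid())`. Testing getuid() rather than CAP_SYS_ADMIN follows the simplification suggested above.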

Will try to pass this by some Linux container experts for opinions...

mato, Nov 06 '18 12:11

On the FreeBSD side, it looks like we might have to revert this entirely (see #312).

Update on the Linux side -- I'm no longer convinced we should do anything along the lines of classic privilege dropping there. Rather, with the introduction of "spt" (#310) we should look into applying a seccomp sandbox to the hvt tender also.

mato, Jan 21 '19 15:01

Ok, so, in the light of:

  1. Discussion in #316, the real fix here is as described in https://github.com/Solo5/solo5/pull/316#issuecomment-464757104 (fixing the FreeBSD vmm APIs).
  2. The build system refactoring in #326 changes the way hvt modules work, and the HVT_DROP_PRIVILEGES=0 case will effectively be non-functional until I redesign that to not use compile-time flags. However, in the meantime, I need the tests to pass!

I'm going to make an executive decision here, which is: The FreeBSD privilege dropping introduced in #286 will be reverted on master shortly.

Also, I'm going to close all issues/PRs related to this except #282, since the multiple discussions are confusing. We can discuss how to proceed forward there.

Note to self: TODO: The existing VM cleanup code for FreeBSD and OpenBSD should be audited, and warn() should be used in the atexit handler(s) if any syscalls fail. /cc @sg2342 @hannesm

mato, Mar 27 '19 11:03

#366 implements a capsicum(4) sandbox for Solo5/hvt on FreeBSD 12+. Removing the FreeBSD labels from this, as I consider this done there.

mato, Apr 29 '20 10:04