sbd Fix: do not rely on DAC_OVERRIDE capability on Linux despite being root

It turns out, for example, that SELinux will prevent this capability for confined processes running as root. It then means that sbd cannot access, as a client, files used for joining two local communication sides within libqb-arranged IPC mechanism in case those files do not have permissions to explicitly allow file-based access with credentials of this client -- which is exactly what happens when the IPC servers are pacemaker daemons not run as root on their own.

Solution is two-phased:

have sbd add respective non-privileged group corresponding to the server side of the IPC -- this patch
ensure this server side (pacemaker) does allow group-derived access (i.e., the access permissions for group are as relaxed as needed, umask notwithstanding) -- outside of the sbd's scope

Feb 14 '19 21:02 jnpkrn

In the light of ongoing effort to make the pacemaker-API more accessible (moving code to libraries, rearrange libraries, API-server, ...) wouldn't it make sense to have most of that code in a central place where potential users of the API would have it handy (libqb?). Not saying it would necessarily have to happen behind the scenes without code-changes in sbd though.

Feb 15 '19 07:02 wenningerk

CAP_SETGID is required to use initgroups(). Given that this is a trivial bypass of CAP_DAC_OVERRIDE, I think it's reasonable to assume both capabilities could be taken away, and we shouldn't rely on it.

I think a comprehensive solution would be to figure out all the capabilities currently needed by each cluster component, and make sure they are explicitly listed in the relevant SELinux policies. We can set a goal to avoid the need for some of those, but I think that should be more carefully and holistically considered than reactive.

Feb 15 '19 17:02 kgaillot

For the particular issue at hand, I'm thinking the simplest solution is for pacemaker's spec file to add root to the haclient group. I haven't tested it but I think it would work (except for any sbd process started before the pacemaker package was installed, but that shouldn't be any trouble to deal with). It does give access to all root processes rather than selected ones that need it, but that's also the effect of the typical example solution given for similar problems (i.e. make the file owned by root group).

Feb 15 '19 21:02 kgaillot

On 14/02/19 23:37 -0800, wenningerk wrote:

In the light of ongoing effort to make the pacemaker-API more accessible (moving code to libraries, rearrange libraries, API-server, ...) wouldn't it make sense to have most of that code in a central place where potential users of the API would have it handy (libqb?).

I agree with the reusability sentiment, however there are some catches:

being dependent on particularly new version of the dependencies, which is bound to cause pains until some years later (OK if you have the right version of the dependency, or sorry, out of luck, you cannot solve your immediate problem easily otherwise)
universality implies heavy adapting/boilerplaiting to the specific conventions of the client side (mapping of the exit codes, etc.) -- here, the code saved by having a dedicated universal implementation is of about the same size as said adaptor layer would have
sbd, IIUIC, tries to only use a minimal set of pacemaker API, and there can still be use-cases for sbd to be used outside of pacemaker?

hence I don't see any particular conflict here.

Not saying it would necessarily have to happen behind the scenes without code-changes in sbd though.

Libqb is now passed a very minimal set of intention predestinations. This could be systemically improved upon with making libqb externally configurable (/etc/dbus-1/system.d/ drop-in configuration setting a precedence, for instance), alleviating any new "hardwired" (recompilation requiring, hence undesired) burden.

-- Jan (Poki)

Feb 18 '19 14:02 jnpkrn

On 15/02/19 09:30 -0800, Ken Gaillot wrote:

CAP_SETGID is required to use initgroups(). Given that this is a trivial bypass of CAP_DAC_OVERRIDE, I think it's reasonable to assume both capabilities could be taken away, and we shouldn't rely on it.

By that logic, shall we assume that root is totally helpless?

I hope not, and unlike with initiative to drop dac_override that goes for years already, I am not aware of anything similar regarding setgid, which is enabled for 6+ years: https://github.com/fedora-selinux/selinux-policy-contrib/blob/55adef764d74757501c61d33898c526e44c245b3/rhcs.te#L149 and would not be very wise either, since it would possibly affect every and each daemon process run directly with pacemaker-execd.

I think a comprehensive solution would be to figure out all the capabilities currently needed by each cluster component,

including unpredictable scope of services to be run from within pacemaker?

and make sure they are explicitly listed in the relevant SELinux policies. We can set a goal to avoid the need for some of those, but I think that should be more carefully and holistically considered than reactive.

Unless there's some way for one-touch solution for users to add ad-hoc policies as detected when running their services, this would work only in theory. Even if that solution existed, who is then to validate that all the policies were distributed everywhere that they are required (after failovers, etc.).

Sounds rather scary and anti-HA.

-- Jan (Poki)

Feb 18 '19 14:02 jnpkrn

On 15/02/19 09:30 -0800, Ken Gaillot wrote: CAP_SETGID is required to use initgroups(). Given that this is a trivial bypass of CAP_DAC_OVERRIDE, I think it's reasonable to assume both capabilities could be taken away, and we shouldn't rely on it. By that logic, shall we assume that root is totally helpless? I hope not, and unlike with initiative to drop dac_override that goes for years already, I am not aware of anything similar regarding setgid, which is enabled for 6+ years: https://github.com/fedora-selinux/selinux-policy-contrib/blob/55adef764d74757501c61d33898c526e44c245b3/rhcs.te#L149 and would not be very wise either, since it would possibly affect every and each daemon process run directly with pacemaker-execd.

No, we should explicitly list the capabilities we need in the cluster SELinux policy, both because it is correct and because we can't predict what the default set will be in the future.

I think a comprehensive solution would be to figure out all the capabilities currently needed by each cluster component, including unpredictable scope of services to be run from within pacemaker?

The cluster SELinux policy (which is not limited to sbd) should include all capabilities needed by all cluster components throughout the stack. These are no more or less predictable than everything that's already in the policy.

and make sure they are explicitly listed in the relevant SELinux policies. We can set a goal to avoid the need for some of those, but I think that should be more carefully and holistically considered than reactive. Unless there's some way for one-touch solution for users to add ad-hoc policies as detected when running their services, this would work only in theory. Even if that solution existed, who is then to validate that all the policies were distributed everywhere that they are required (after failovers, etc.). Sounds rather scary and anti-HA.

I don't follow what you're trying to say here. I'm suggesting that the SELinux policy for cluster components should explicitly grant the capabilities required by those components, which I believe is the purpose of having the policy. Any user running a cluster with SELinux enabled already has to verify that their SELinux policies on every node allow their services to run, that doesn't change.

Feb 18 '19 17:02 kgaillot

Klaus mentioned that the original problem description says that some of the relevant IPC files have mode 0600, which makes both group-based approaches (adding root to the haclient group, or the process taking on haclient privileges) insufficient. It looks like adding the capability to the SELinux policy will be the only solution (technically, one of the group approaches could be combined with modifying the IPC permissions, but I suspect that will be impractical).

Feb 18 '19 17:02 kgaillot

Btw. are there any systematic approaches how to expose capabilities required in a repo - preferably easy to parse to automatized handling?

Feb 18 '19 17:02 wenningerk

Btw. are there any systematic approaches how to expose capabilities required in a repo - preferably easy to parse to automatized handling?

It should be reasonably simple to get a baseline based on what system calls the code uses. Dreaming a little bit, that could be applied to common libraries to generate a knowledge database for their calls as well. A quick search doesn't show anything existing, but these are of interest anyway:

https://github.com/iovisor/bcc/blob/master/tools/capable.py -- logs what capabilities are actually used by running processes
pscap from libcap-ng-utils package -- shows the effective capabilities that running processes have (i.e. all, unless it dropped or was denied some)

Feb 18 '19 17:02 kgaillot

It should be reasonably simple to get a baseline based on what system calls the code uses. Dreaming a little bit, that could be applied to common libraries to generate a knowledge database for their calls as well. A quick search doesn't show anything existing, but these are of interest anyway:

Hmm - guess tooling - especially static analysis - would have to be quite smart to spit out a list that would be sufficiently restrictive and comprehensible. But giving hints which capabilities in a manually administered list aren't used respectively which capabilities might be required in addition seems reasonable. Maybe with the possibility to add suppressions ...

Feb 18 '19 17:02 wenningerk

Hmm - guess tooling - especially static analysis - would have to be quite smart to spit out a list that would be sufficiently restrictive and comprehensible. But giving hints which capabilities in a manually administered list aren't used respectively which capabilities might be required in addition seems reasonable. Maybe with the possibility to add suppressions ...

Right, that's why I was thinking just a baseline to start from (e.g. anything that uses setgid(), initgroups() or setgroups() requires CAP_SETGID). Maybe one could list those as "definite" and others as "possibly required under certain circumstances", which CAP_DAC_OVERRIDE would probably always qualify for.

Feb 18 '19 19:02 kgaillot

On 18/02/19 17:04 +0000, Ken Gaillot wrote:

On 18/02/19 15:27 +0100, Jan Pokorný wrote:

On 15/02/19 09:30 -0800, Ken Gaillot wrote:

CAP_SETGID is required to use initgroups(). Given that this is a trivial bypass of CAP_DAC_OVERRIDE, I think it's reasonable to assume both capabilities could be taken away, and we shouldn't rely on it.

By that logic, shall we assume that root is totally helpless?

I hope not, and unlike with initiative to drop dac_override that goes for years already, I am not aware of anything similar regarding setgid, which is enabled for 6+ years: https://github.com/fedora-selinux/selinux-policy-contrib/blob/55adef764d74757501c61d33898c526e44c245b3/rhcs.te#L149 and would not be very wise either, since it would possibly affect every and each daemon process run directly with pacemaker-execd.

No, we should explicitly list the capabilities we need in the cluster SELinux policy, both because it is correct and because we can't predict what the default set will be in the future.

Which cluster SELinux policy do you have in mind?

The one I linked above? Do you want to re-review it? Or write a new one from scratch?

I think a comprehensive solution would be to figure out all the capabilities currently needed by each cluster component,

including unpredictable scope of services to be run from within pacemaker?

The cluster SELinux policy (which is not limited to sbd)

sbd currently has a dettached policy in the referenced project (sbd_t vs. cluster_t), which likely means you want to write a new one from scratch

should include all capabilities needed by all cluster components throughout the stack. These are no more or less predictable than everything that's already in the policy.

Logical error likely stemming from missing the "since it would possibly affect every and each daemon process run directly with pacemaker-execd" part above. That needs to be absorbed first.

Then you'll see that there's indeed a predictable and non-predictable part of that, and that some (unpredictable!) daemons shall rather make do without DAC_OVERRIDE capability when run as a cluster resource already (otherwise cluster_t in Fedora's upstream SELinux policy will need to get that back as well, meaning weakening the containment).

Cluster proper components alone are, on the contrary, indeed predictable, but we can just as well assume that the currently assigned set in the said upstream policy "just works unless disproved".

and make sure they are explicitly listed in the relevant SELinux policies. We can set a goal to avoid the need for some of those, but I think that should be more carefully and holistically considered than reactive.

Unless there's some way for one-touch solution for users to add ad-hoc policies as detected when running their services, this would work only in theory. Even if that solution existed, who is then to validate that all the policies were distributed everywhere that they are required (after failovers, etc.).

Sounds rather scary and anti-HA.

I don't follow what you're trying to say here.

likely because the already restated point didn't make it across

I'm suggesting that the SELinux policy for cluster components should explicitly grant the capabilities required by those components, which I believe is the purpose of having the policy.

True, but the applicability of the policy is highly limited with unability to enumerate all the cluster stack use cases, as opposed to self-hosted boundaries like httpd_t -- observe: adding new httpd modules is rather an exception (easy to follow the suite and give it a proper label), whereas involving custom daemons is one of typical use cases for the cluster stack (good luck with tutoring these users to come up with brand new policy modules and respective transitions from cluster_t just because you want as tight SELinux boundaries left for the inner cluster stack as possible).

Any user running a cluster with SELinux enabled already has to verify that their SELinux policies on every node allow their services to run, that doesn't change.

Again, this is the response to something unthought of. From a brief test, cluster_t is derived with whatever spawned process, regardless of whether or how the respective binary is labelled. With your envisioned pedantic use of the security boundary, it could start to be a real problem, then. At least, systemd resources are isolated from this :-)

Partial solutions would be:

make pacemaker-execd able to mangle with SELinux context on its own so as to somehow neutralize the narrow scope you envision (note that cluster_t is shared also with corosync, amongst others; also you would need to have the power to do that, which sort of defeats the separation), possibly doable with transition rules directly in SELinux policy

... or even better ...

drop capabilities programmatically directly in the daemons themselves (between fork and execve in pacemakerd?, except for pacemaker-execd, indeed) and seal that possibly also with NO_NEW_PRIVS: https://www.kernel.org/doc/html/latest/userspace-api/no_new_privs.html

The latter seems most reasonable to me -- and it's both a documentation and the effectively actionable mark in the code.

Sadly, due to the multi-daemon arrangement, we cannot use systemd as the catch-all solution for that currently (which would also conveniently open BPF-based syscall filtering to us if you want to micromanage the security to the tiniest bit).

But without major redesigning relevant SELinux policies or writing one or more replacement modules from scratch, your original plan won't fly, I am afraid.

-- Poki

Feb 18 '19 22:02 jnpkrn

Btw. thanks for pscap tip, but it doesn't look that SELinux and the respective logic (like pretend as if dac_override doesn't exist unless permitted with SELinux explictly for a root-level process) gets reflected there. And the other tool needs significant run-time coverage for it to be useful at all. And somewhat similar to that is latrace tool, although it didn't work well for me on Fedora last time I tried: https://bugzilla.redhat.com/show_bug.cgi?id=1628206

Feb 18 '19 22:02 jnpkrn

Logical error likely stemming from missing the "since it would possibly affect every and each daemon process run directly with pacemaker-execd" part above. That needs to be absorbed first.

I'm still not following you. Subsidiary processes are run as their own type when relevant, e.g.:

# pcs resource show web
 Resource: web (class=ocf provider=heartbeat type=apache)
  Operations: monitor interval=10s timeout=20s (web-monitor-interval-10s)
              start interval=0s timeout=40s (web-start-interval-0s)
              stop interval=0s timeout=60s (web-stop-interval-0s)
# ps axZ | grep 'http[d]'
system_u:system_r:httpd_t:s0      9100 ?        Ss     0:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
system_u:system_r:httpd_t:s0      9101 ?        S      0:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
system_u:system_r:httpd_t:s0      9102 ?        S      0:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
system_u:system_r:httpd_t:s0      9103 ?        S      0:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
system_u:system_r:httpd_t:s0      9104 ?        S      0:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid
system_u:system_r:httpd_t:s0      9105 ?        S      0:00 /sbin/httpd -DSTATUS -f /etc/httpd/conf/httpd.conf -c PidFile /var/run/httpd.pid

Perhaps you mean custom daemons that don't have any SELinux labelling? But that's no different than running such a daemon without a cluster on a SELinux-enabled host. The user is responsible for that, with or without a cluster.

Am I misunderstanding your point?

Feb 19 '19 16:02 kgaillot

Perhaps you mean custom daemons that don't have any SELinux labelling? But that's no different than running such a daemon without a cluster on a SELinux-enabled host. The user is responsible for that, with or without a cluster.

Yes, "every and each daemon process run directly with pacemaker-execd" could be totally custom (we have no interest to limit what the cluster stack is used for, quite the contrary as long as declared axioms of use hold) and lacking their own policies, and the very type that pacemaker daemons run as (cluster_t at least in Fedora/RHEL derived systems, which we talk about all the time now, anyway) is derived for these.

And, per https://bugzilla.redhat.com/show_bug.cgi?id=1677325#c9, relaxed unconfined_domain_type attribute correlates on the allowed operations -- likely to make it all work in said circumstances. That's why I am trying to show hardening on this SELinux front for the whole cluster stack together with proactively retrofitting the tightest boundaries possible is hence rather pointless unless you want a complete rewrite of the policy. Contrary to the other, orthogonal, very targeted (SELinux policy's cluster_t is rather catch-all low effort) measures I proposed.

Now back to free-standing sbd_t type, I owe an apology here that I missed that this one indeed lacks setgid and that would indeed need to be added for this patch (only first part of the overall solution) to work even with the lack of dac_override capability.

But I don't agree setgid is as strong as dac_override except for the minority cases -- not even here unless the second part of the solution is applied. Since group permissions for file can still prevent any sort of an access, and lacking fowner (as is the case with sbd), such guarded root-privileged process cannot change the permissions/ownership of the file. So, allowing setgid in exchange for dac_override would be a welcome step towards more security, which is what we surely want.

Feb 20 '19 15:02 jnpkrn

Yes, "every and each daemon process run directly with pacemaker-execd" could be totally custom

Users are responsible for all aspects of custom services, e.g. providing an LSB or OCF script, and configuring firewall rules on all nodes -- SELinux policies for custom services fall into this same category.

So, allowing setgid in exchange for dac_override would be a welcome step towards more security, which is what we surely want.

Except that some IPC sockets do not have any group permissions, so changing groups via any means won't matter. We'd also have to find every place that those permissions are set, verify that group access is not a security risk, and modify them to add group read/write. I think dac_override is a simpler and more appropriate solution.

Feb 20 '19 15:02 kgaillot

Except that some IPC sockets do not have any group permissions.

Looks like that's a consequence of a shoddy usage of umask in pacemaker where opening files with particular permissions would do instead :-/

Feb 20 '19 15:02 jnpkrn

Should we still consider any changes here or can we close this being resolved for fedora via selinux-policy. Don't know for other distributions but we had no complaints so far ...

Jan 30 '20 09:01 wenningerk

Can one of the admins verify this patch?

Jan 30 '20 09:01 knet-ci-bot

sbd sbd copied to clipboard

Fix: do not rely on DAC_OVERRIDE capability on Linux despite being root

sbd
sbd copied to clipboard