[RFC] Follow idea of immutable /usr vs. mutable overrides in /etc
There are many practical reasons to adopt this increasingly popular scheme while still enabling users to modify the agents per their needs, for instance:
- having solely static data in /usr allows one to share it as a read-only (or sparsely utilized copy-on-write) mount point with one's VMs and containers so as to save space
- no conflict-on-update issue
Hence my expectation is that the OCF standard will address this, presumably in resource-agent-api.md, by replacing
The Resource Agents are located in subdirectories under
/usr/ocf/resource.d.
with something like
The Resource Agents are located in subdirectories under
/usr/ocf/resource.d. An OCF X.Y compliant RM shall first consult the /etc/ocf/resource.d path for existence of the requested agent, which, when present, takes precedence in the agent lookup. This makes for convenient customization of existing agents without altering them at the stated standard location, in turn simplifying reverts to the stock configuration, coexistence with package updates, and possibly a locked-down use of the /usr mount point. The agent lookup based on file presence is definitive; any further issue, like the file not being executable, notwithstanding.
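On the resource-manager side, the proposed precedence rule could boil down to something like this sketch ("find_agent" is a hypothetical helper, and OCF_ROOTS is a made-up knob introduced only so the precedence can be exercised outside a real system):

```shell
#!/bin/sh
# Sketch of the proposed lookup order: /etc/ocf/resource.d first, then the
# standard location. Not an existing interface of any resource manager.
find_agent() {
    provider=$1 agent=$2
    for root in ${OCF_ROOTS:-/etc/ocf/resource.d /usr/ocf/resource.d}; do
        # per the proposed wording, lookup is decided by file presence
        # alone; executability is a separate, later concern
        if [ -e "$root/$provider/$agent" ]; then
            printf '%s\n' "$root/$provider/$agent"
            return 0
        fi
    done
    return 1
}
```

Note that an /etc entry wins even over an existing /usr one, which is exactly the override semantics being proposed.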
Sounds good to me. :+1:
I'm uncomfortable with putting executables in /etc, and I strongly think users shouldn't reuse the same provider+agent name when modifying an agent, as it greatly complicates troubleshooting.
The currently recommended approach for modifying resource agents is to create a new, custom provider under /usr/lib/ocf/resource.d. I could see extending the standard to allow providers in an alternate location, such as /usr/local, /opt, or /srv (followed by ocf/resource.d), or even allowing an OCF_RA_PATH environment variable. I'm not convinced it's a good idea though, as custom OCF scripts are not any more mutable than the commonly distributed ones. In production, few users are going to modify custom scripts directly; they are going to have a development environment, and then push changes to all production nodes (comparable to updating the resource-agents package).
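The OCF_RA_PATH idea floated above could work like an ordinary PATH search; a minimal sketch (the variable and the "resolve_ra" helper are hypothetical -- no resource manager implements them today):

```shell
#!/bin/sh
# Sketch of a hypothetical OCF_RA_PATH environment variable: a
# colon-separated list of provider roots, searched in order.
resolve_ra() (
    # parentheses run the body in a subshell, keeping the IFS change local
    IFS=:
    for dir in ${OCF_RA_PATH:-/usr/lib/ocf/resource.d}; do
        if [ -f "$dir/$1/$2" ]; then
            printf '%s\n' "$dir/$1/$2"
            exit 0
        fi
    done
    exit 1
)
```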
On 21/11/17 22:00 +0000, Ken Gaillot wrote:
I'm uncomfortable with putting executables in /etc,
There's a bunch of executable glue scripts already (including /etc/rc.d/init.d ones for non-systemd systems), which is exactly what resource agents are meant to be. I see no conflict here.
and I strongly think users shouldn't reuse the same provider+agent name when modifying an agent, as it greatly complicates troubleshooting.
Resource managers would need to identify the particular file clearly, true. If the idea is deemed good enough, there should perhaps also be a provision in the specification for explicit remapping of an agent specification to a particular path, e.g. upon a SIGHUP signal sent to the resource manager. Prior to that, it would keep the initially discovered path.
The currently recommended approach for modifying resource agents is to create a new, custom provider under /usr/lib/ocf/resource.d. I could see extending the standard to allow providers in an alternate location, such as /usr/local, /opt, or /srv (followed by ocf/resource.d), or even allowing an OCF_RA_PATH environment variable. I'm not convinced it's a good idea though, as custom OCF scripts are not any more mutable than the commonly distributed ones. In production, few users are going to modify custom scripts directly; they are going to have a development environment, and then push changes to all production nodes (comparable to updating the resource-agents package).
That's not the use case I had in mind; I wasn't going as deep as to also change the actual configuration.
Rather something like: http://oss.clusterlabs.org/pipermail/users/2017-August/006303.html
Anyway, having bulk synchronization of /etc across the nodes can be appealing (also for systemd unit files, which can likewise be employed with a resource manager if there's support).
-- Jan (Poki)
I'd rather not. Doesn't the provider concept offer enough flexibility? As Ken said, it would also be quite difficult to figure out which RA is being run if the resource manager is allowed to look at more than one place for the same resource configuration.
And you can already make custom or similarly named directories in /usr/lib/ocf/resource.d/heartbeat to avoid clashing with the agents provided by the distro.
On 22/11/17 14:27 +0000, Dejan Muhamedagic wrote:
I'd rather not. Doesn't the provider concept offer enough flexibility? As Ken said, it would also be quite difficult to figure out which RA is being run if the resource manager is allowed to look at more than one place for the same resource configuration.
Additional idea sketched above would make the flip only at defined moments (initial start, being told to rescan the agent mapping). Not at arbitrary points, which would indeed make the situation hard to follow.
On 22/11/17 14:36 +0000, Oyvind Albrigtsen wrote:
And you can already make custom or similarly named directories in /usr/lib/ocf/resource.d/heartbeat to avoid clashing with the agents provided by the distro.
Naturally, but I thought it was clear by now that I anticipated it for somewhat different use cases.
-- Jan (Poki)
On 22/11/17 17:11 +0100, Jan Pokorný wrote:
Additional idea sketched above would make the flip only at defined moments (initial start, being told to rescan the agent mapping). Not at arbitrary points, which would indeed make the situation hard to follow.
On the other hand, let's not fall into the fallacy that current situation is a breeze in the "which agent variant was run, exactly" matter, at least with pacemaker in particular:
- respective agent files are not locked for the pacemaker's lifespan, so can be edited anytime
- ditto agents are not copied to a private temporary location first (or even copied into the memory to be executed from, which would be doable for hashbang/non-binary executables)
- checksums of the agents are not logged/remembered+rechecked (or ditto on timestamp comparison basis)
Which already makes it rather difficult to tell which variant of the agent was run at any particular moment in the past unless you can testify nothing has intervened (and even then it's not 100%). So I don't see any remarkable regression; summing the pros and cons together IMHO yields a positive result here once the mentioned additional idea of explicit rescans is mixed in.
-- Jan (Poki)
The main benefit as I see it would be enabling the sysadmin to add their own agents on top of a read-only /usr file system delivered by a transactional update mechanism.
On 23/11/17 09:01 +0000, Kristoffer Grönlund wrote:
The main benefit as I see it would be enabling the sysadmin to add their own agents on top of a read-only /usr file system delivered by a transactional update mechanism.
The other practical value is that the administrator would (one wants to say, finally) gain the power to defuse OCF-based resources that are not, by any means, desired in the projected cluster, out of the set of agents that get installed unconditionally through the common distribution channels, sometimes including ocf:heartbeat:anything, which may be unsettling on its own: http://lists.clusterlabs.org/pipermail/users/2016-January/002178.html

This is very similar to, and directly inspired by, systemd's masking approach.
So when the cluster should only ever serve a minimalistic httpd + virtual IP combo, the solution would be to run this upon each install/update of resource-agents in an RPM-based distro:
# mkdir -p /etc/lib/ocf/resource.d/heartbeat
# rpm -ql resource-agents \
  | grep '/usr/lib/ocf/resource.d/heartbeat/[^.].*' \
  | grep -vE 'apache|IPaddr2' \
  | sed 's|/usr|/etc|' | xargs -I{} ln -s /dev/null {}
For this to work harmonically, resource managers should further recognize the zero size of agents discovered like this and exclude them from "try running" attempts (incl. at the system location, indeed).
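A sketch of what that zero-size rule could look like on the resource-manager side ("agent_runnable" is a hypothetical name, not an existing RM interface; a symlink to /dev/null shows up as a present-but-empty file, mirroring systemd's masking):

```shell
#!/bin/sh
# Sketch: treat a present-but-empty agent file (e.g. a symlink to
# /dev/null, as suggested above) as masked and never attempt to run it.
agent_runnable() {
    [ -e "$1" ] || return 1   # not installed at all
    [ -s "$1" ] || return 2   # present but zero-size => masked
    return 0
}
```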
For pacemaker in particular, and putting fence-agents aside (preferably there would be a convergence towards OCF in some aspects; plus those agents are separated into discrete subpackages in el7, giving the administrator at least some say over what's available), the only way to run an unrestricted command from cluster configuration would then be "lsb:
-- Jan (Poki)
Thinking about that, /etc/ocf should indeed be subdir-namespaced per resource manager, possibly reserving a chosen name (ANY?) to apply to all.
The other practical value is that administrator would (one wants to say, finally) gain power to defuse OCF-based resources
I'm sorry, but I don't understand this argument at all. Why is the administrator trying to prevent the administrator from configuring resources?
I also don't recall any actual argument for why the anything agent is problematic...
On 27/11/17 11:23 +0000, Kristoffer Grönlund wrote:
The other practical value is that administrator would (one wants to say, finally) gain power to defuse OCF-based resources
I'm sorry, but I don't understand this argument at all. Why is the administrator trying to prevent the administrator from configuring resources?
Why administratively prevent selected (this needs stressing) OCF resources (out of all those delivered en masse with the resource-agents project, for instance)?
Mostly to allow one (on an opt-in basis) to follow the least-privilege principle, with the intention of keeping any kind of intrusion as limited as possible, especially for static cluster deployments where additional resource agents are irrelevant at any rate.
Needless to mention one such intrusion enabler from the recent time: http://oss.clusterlabs.org/pipermail/users/2016-November/004432.html
Sure, one might emulate something like that using acls:
<acls>
  <acl_target id="bob">
    <role id="admin"/>
  </acl_target>
  <acl_role id="admin">
    <acl_permission id="admin-deny-1" kind="deny"
        xpath="//primitive[@class='ocf' and @provider='heartbeat'
               and @type!='apache' and @type!='IPaddr2']"/>
    <acl_permission id="admin-write-1" kind="write" xpath="/cib/resources"/>
    <acl_permission id="admin-read-1" kind="read" xpath="/cib"/>
  </acl_role>
</acls>
But besides being rather clumsy, it doesn't cover fully privileged users (e.g. the hacluster user) -- it cannot, by design.
Does it answer your question?
I also don't recall any actual argument for why the anything agent is problematic...
This one, together with Stateful, is perfectly fine for code-less experimenting with how pacemaker works and for kicking off custom agents, but quite an antipattern for production use, where you want the launcher fitting as tightly as possible, getting the monitoring right, covering the corner cases, etc. O:-)
Plus, add the above security aspect into the equation. It's not a security measure per se, but an onion-like approach to security hardening (just as SELinux is, for instance) makes sense when it doesn't impose new pains.
Voluntary constraining of the agents' repertoire is IMHO one of the easy wins on this front.
-- Jan (Poki)
Yeah, I think I follow what you're saying. Of course the apache agent might not be the best example to allow when trying to avoid privilege escalation, since it can be trivially configured to execute arbitrary executables. Though that might be an argument for fixing apache. ;)
There's a bunch of executable glue scripts already (including /etc/rc.d/init.d ones for non-systemd systems), which is exactly what resource agents are meant to be. I see no conflict here.
While there are common existing cases of executables under /etc, they are exceptions, not the rule. System administrators expect /etc to contain configuration, and executables to be located elsewhere, except in unusual cases.
I believe this is recommended in the LSB, with good reason. An example is that resource agents do not necessarily need to be scripts, they can be compiled, but /etc is architecture-independent.
On the other hand, let's not fall into the fallacy that current situation is a breeze in the "which agent variant was run, exactly" matter, at least with pacemaker in particular:
The main goal is whether enterprise support personnel can reasonably determine whether a particular agent is supported, not the exact agent code used. If the user can override an OS-provided agent, extra steps must be taken with every support case to check whether that has happened. The current recommendation of using a different provider name makes it immediately clear.
Also, the provider name is intended to indicate who provided the agent. If a custom script reuses a provider name, it obscures that indicator. The current recommendation of using a different provider name when modifying a script makes it clear where the agent came from.
The main benefit as I see it would be enabling the sysadmin to add their own agents on top of a read-only /usr file system delivered by a transactional update mechanism.
I don't believe this accomplishes that. When users modify or create resource agents, they typically get them working, then rarely or never touch them again. They tend to change less frequently than OS-supplied resource agents. Custom agents don't prevent /usr from being read-only any more than OS-supplied ones do. In either case, there has to be a mechanism to temporarily make /usr writeable during updates.
Even if a non-/usr location is perceived to be desirable, I would argue for using a custom provider name, and have the non-/usr location be where to look for additional providers.
I'm sorry, but I don't understand this argument at all. Why is the administrator trying to prevent the administrator from configuring resources?
I also don't recall any actual argument for why the anything agent is problematic...
I agree. Disabling particular resource agents is no different than disabling particular binaries provided with any other package. If someone wishes to disable unused resource agents, likely they want to disable unused binaries from other packages as well, and already have a generic mechanism for doing so.
Also, this is a security risk, not a mitigation. Being able to write a script into /etc that is automatically run as root without having to touch the pacemaker configuration destroys any security gained by mounting /usr read-only. And I can't imagine any scenario where a security compromise that allows an unused OCF agent as a vector doesn't have an easier vector elsewhere. Pacemaker runs as root and can run arbitrary executables. They don't have to be in the OCF agent directory.
Regarding non-production agents such as Dummy, anything, etc., it is up to each distribution to decide which agents are installed by which packages. For example, RHEL already removes some agents distributed upstream. Any distribution could move such agents to a resource-agents-testing package, for example, or create a separate package for each resource agent, allowing users to install only the ones they need. Similarly users who compile their own can build packages as they like.
Bottom line, I could see some value in having alternate locations for providers, but I think users should be shepherded into using a unique provider name if they modify or create an agent.
On 27/11/17 18:18 +0000, Kristoffer Grönlund wrote:
Yeah, I think I follow what you're saying. Of course the apache agent might not be the best example to allow when trying to avoid privilege escalation, since it can be trivially configured to execute arbitrary executables. Though that might be an argument for fixing apache. ;)
To be honest, I didn't even start considering these trivial bypasses. I vaguely remember observing a nasty injection (https://github.com/ClusterLabs/resource-agents/pull/878#issuecomment-264518726 could be related), which is an inherent risk with execute-based-on-parameter unless there is targeted scrutiny.
Back to your point: when a daemon executable is deemed an absolutely necessary parameter, there can always be a (preferably infloop-free) check that all elements of the traversal path down to the binary are owned by root, and that at least the binary is non-writable by others. That would be a good start.
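Such a check could be sketched as follows; "safe_exec_path" is a hypothetical name, and resolving symlinks up front is the easy way to stay infloop-free:

```shell
#!/bin/sh
# Sketch: require root ownership and no group/other write permission on
# every path component down to the target binary. Illustrative only; a
# production version would want more careful error reporting.
safe_exec_path() {
    # resolve symlinks first, which also avoids infinite traversal loops
    target=$(realpath -e "$1" 2>/dev/null) || return 1
    path=$target
    while :; do
        # fields of `ls -ldn`: mode, links, uid, gid, ...
        set -- $(ls -ldn "$path")
        mode=$1 uid=$3
        [ "$uid" = 0 ] || return 1          # must be owned by root
        case $mode in
            ?????w*|????????w*) return 1 ;; # group- or other-writable
        esac
        [ "$path" = / ] && break
        path=$(dirname "$path")
    done
    return 0
}
```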
-- Jan (Poki)
On 27/11/17 18:52 +0000, Ken Gaillot wrote:
There's a bunch of executable glue scripts already (including /etc/rc.d/init.d ones for non-systemd systems), which is exactly what resource agents are meant to be. I see no conflict here.
While there are common existing cases of executables under /etc, they are exceptions, not the rule.
This is then a subjectively inferred rule, not a given fact. And I am not cheered when that's used as a basis to naysay what I believe is a good, versatile mechanism.
System administrators expect /etc to contain configuration, and executables to be located elsewhere, except in unusual cases.
Ditto, plus resource agents are mostly configuration-dealing glue to semi-supervise the actual heavy lifters. And initscripts were no different in this aspect, while also present in /etc.
I believe this is recommended in the LSB, with good reason.
Ditto, plus a brief look at https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/etc.html shows, e.g.:

/etc/cron.daily -- A directory containing shell scripts to be executed once a day
An example is that resource agents do not necessarily need to be scripts, they can be compiled, but /etc is architecture-independent.
That's up for careful consideration; symlinks from /etc are always an option.
On the other hand, let's not fall into the fallacy that current situation is a breeze in the "which agent variant was run, exactly" matter, at least with pacemaker in particular:
The main goal is whether enterprise support personnel can reasonably determine whether a particular agent is supported, not the exact agent code used. If the user can override an OS-provided agent, extra steps must be taken with every support case to check whether that has happened. The current recommendation of using a different provider name makes it immediately clear.
You are talking about the happy cases (users caring about what was written on the topic, etc.) while I am talking about the pessimistic scenarios. And for these, there's next to no difference, except that with proper tooling support it may be immediately clear that an /etc override is in use (cf. hidden in-situ changes to the agents). And a different provider can be used with the /etc location as well.
Also, the provider name is intended to indicate who provided the agent. If a custom script reuses a provider name, it obscures that indicator. The current recommendation of using a different provider name when modifying a script makes it clear where the agent came from.
See above.
The main benefit as I see it would be enabling the sysadmin to add their own agents on top of a read-only /usr file system delivered by a transactional update mechanism.
I don't believe this accomplishes that. When users modify or create resource agents, they typically get them working, then rarely or never touch them again. They tend to change less frequently than OS-supplied resource agents. Custom agents don't prevent /usr from being read-only any more than OS-supplied ones do.
What I had in mind: say, a bunch of VMs could share the same /usr so as to save space. Or the systemd stateless approach could be used (it's on topic for HA quite a lot, actually: after fencing, reboot to a perfectly known state of the system, as an extension of rebooting the machine to a merely known state -- perhaps not so perfectly known, e.g., that one corrupted file blocking further progress may still be present).

In these scenarios, one could treat /usr as the distro-updates-driven, verifiable, reproducible part of the machine data, and conversely /etc as the dynamic, localized, customized factor, sharable only selectively (csync2 seems like a really neat tool for that), if at all. In my view, there's really no intuitive alternative to an overriding location other than under /etc.
And even in regular use, you don't want to bet that resource-agents or other packages will happen to acquire the same provider label you chose for your custom agents, do you? If you placed the provider in /etc, you'd have the benefits of:
- no potential loss of the code for the agents (packages are really not expected to ship anything under /etc/ocf) -> good
- your agents will take priority even if the stated provider clash happens -> good
In either case, there has to be a mechanism to temporarily make /usr writeable during updates.
That's out of scope here.
Even if a non-/usr location is perceived to be desirable, I would argue for using a custom provider name, and have the non-/usr location be where to look for additional providers.
Sure, the custom providers would be applicable in /etc as well.
I'm sorry, but I don't understand this argument at all. Why is the administrator trying to prevent the administrator from configuring resources?
I also don't recall any actual argument for why the anything agent is problematic...
I agree. Disabling particular resource agents is no different than disabling particular binaries provided with any other package.
Except that with heartbeat's anything and the like, you can run arbitrary executables with arbitrary arguments, i.e., whatever you want? Whereas the surface is substantially limited with proper agents, especially when hardenings like those suggested in the previous comment get applied.
If someone wishes to disable unused resource agents, likely they want to disable unused binaries from other packages as well, and already have a generic mechanism for doing so.
There must really be a misunderstanding; this part of the proposal is about agents that pose an unnecessary attack surface for cases where (possibly cluster-wide) execution of the agents becomes possible as a result of a breach or compromise of some kind (like the referred CVE). It has nothing to do with unused binaries across the system (which, furthermore, regular users cannot run as root unless there's some other exploit or security weakness), even if some agents allow that (could be tightened further...).
This might even go a full circle:
- distribution-provided resource-agents would only support distro-delivered binaries where applicable (apache -> the system's version of httpd), with no possibility to override the executable through cluster configuration; similarly, custom-built resource-agents would either figure out the correct paths at configure time or provide respective toggles to preset the paths to the executables accordingly
- if a different httpd is required, one is free to either override the same agent using the /etc location and/or use a custom provider, but is then clearly on her own
And when agents are defaults-injection ready (like when using ": ${OCF_RESKEY_foo_default=bar}"), the customization could easily be as short as three lines: shebang, export of the customized OCF_RESKEY_foo_default, source of the original.
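A minimal sketch of such an override, assuming the stock agent is defaults-injection ready; the "httpd" parameter name, the paths, and the stand-in stock agent below are all illustrative assumptions:

```shell
#!/bin/sh
# Stand-in for a defaults-injection-ready stock agent; in reality this
# would already ship at /usr/lib/ocf/resource.d/heartbeat/apache.
stock=$(mktemp)
cat >"$stock" <<'EOF'
: ${OCF_RESKEY_httpd_default=/usr/sbin/httpd}
httpd_bin=$OCF_RESKEY_httpd_default
EOF

# The entire /etc override then boils down to the three advertised lines:
# shebang, export of the customized default, source of the original.
# Because ": ${var=...}" assigns only when the variable is unset, the
# exported value wins over the stock default.
OCF_RESKEY_httpd_default=/opt/httpd/bin/httpd
export OCF_RESKEY_httpd_default
. "$stock"
```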
The main benefit is that, by default, you'll get a tailored setup without the possibility of overriding and running an arbitrary binary at the cluster-configuration level in said accident or similar scenarios.
Also, this is a security risk, not a mitigation. Being able to write a script into /etc that is automatically run as root without having to touch the pacemaker configuration destroys any security gained by mounting /usr read-only.
I don't follow. How is a normal user privileged to write to /etc?
And I can't imagine any scenario where a security compromise that allows an unused OCF agent as a vector doesn't have an easier vector elsewhere. Pacemaker runs as root and can run arbitrary executables. They don't have to be in the OCF agent directory.
Yes, in pacemaker we should be looking at least at restricting lsb agents so as not to allow trivial "parent directory" escapes, which is what I actually used to translate rgmanager's "script" resources to their CIB equivalents in "clufter ccs2pcs*" :-)
(Information about what needs to be symlinked where could be enough on the clufter side.)
Regarding non-production agents such as Dummy, anything, etc., it is up to each distribution to decide which agents are installed by which packages. For example, RHEL already removes some agents distributed upstream. Any distribution could move such agents to a resource-agents-testing package, for example, or create a separate package for each resource agent, allowing users to install only the ones they need.
Wow, case of telepathy, just discussed this idea today with pcs folks :)
Similarly users who compile their own can build packages as they like.
Bottom line, I could see some value in having alternate locations for providers, but I think users should be shepherded into using a unique provider name if they modify or create an agent.
Unless they want to defuse particular agents, or keep a unified CIB in cases prompting the "adapting" overrides (perhaps along with /usr being downright locked down).
-- Jan (Poki)
While there are common existing cases of executables under /etc, they are exceptions, not the rule. This is then a subjectively inferred rule, not a given fact. And I am not cheered when that's used as a base to naysay what I believe is a good, versatile mechanism.
From http://refspecs.linuxbase.org/FHS_2.3/fhs-2.3.html#PURPOSE6 :
"The /etc hierarchy contains configuration files. A 'configuration file' is a local file used to control the operation of a program; it must be static and cannot be an executable binary."
The existence of exceptions to this is simply a result of decades of organic growth, before any standards existed (even POSIX). The subjectiveness of system administrators' expectation that /etc does not normally contain executables does not reduce the legitimacy of the expectation. Following common expectations, even loosely subjective ones, helps system administrators do their jobs.
You are talking about the happy cases (users caring about what was written on the topic, etc.) while I am about the pessimistic scenarios. And for these, there's next to no difference, except that with proper tooling support, it may be immediately clear that /etc override is what's in use (cf. hidden in-situ changes in the agents).
If a user directly modifies a script deployed by an OS package, the next OS update of that package will overwrite it. That's an effective enforcement mechanism that quickly educates anyone who didn't pay attention to the documentation.
If an administrator or someone else troubleshooting a cluster problem wants to look at the resource agent code, they're going to go to the standard location first. If the behavior doesn't fit the code they see, they'll just get confused. There won't be any obvious indication that there's an override.
What I had in mind, say, a bunch of VMs could share the same /usr so as to save space.
That's feasible regardless of where custom agents are, and regardless of whether users can override an existing provider or require a unique provider name.
Also, this is a security risk, not a mitigation. Being able to write a script into /etc that is automatically run as root without having to touch the pacemaker configuration destroys any security gained by mounting /usr read-only. I don't follow. How is a normal user privileged to write to /etc?
The point of mounting /usr read-only is to disallow root from writing to it. The vulnerability is to exploits that allow only writing files as root, as opposed to full shell access. If the attacker can replace a common command with a trojan, it will end up being executed. Allowing that same attacker to write an OCF override to /etc, and having pacemaker automatically run it without any configuration change required, provides a way around a read-only /usr.
Existing scripts under /etc could be attacked in the same way, which is a good reason why they shouldn't be there, and are there only for historical reasons. From a security standpoint, mounting /usr read-only is stronger when paired with all other filesystems being mounted ro and/or noexec. As an example, Gentoo recommends mounting /etc read-only as well, with symlinks for files that need to be updated:
https://wiki.gentoo.org/wiki/Filesystem/Security#Mount_options
The bottom line from a security standpoint is that all executables should be on read-only partitions, otherwise the protection is only partial. (This is one reason this is not a common setup.)
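An illustrative /etc/fstab excerpt in the spirit of that advice; the device names and filesystem types are assumptions for the sake of the example, not a recommendation for any particular system:

```shell
# /etc rides on a read-only root; mutable files are symlinked elsewhere
# /dev/sda1  /     ext4  ro                      0 1
# /dev/sda2  /usr  ext4  ro,nodev                0 2
# /dev/sda3  /var  ext4  rw,noexec,nodev,nosuid  0 2
# /dev/sda4  /tmp  ext4  rw,noexec,nodev,nosuid  0 2
```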
On 28/11/17 00:17 +0000, Ken Gaillot wrote:
Also, this is a security risk, not a mitigation. Being able to write a script into /etc that is automatically run as root without having to touch the pacemaker configuration destroys any security gained by mounting /usr read-only. I don't follow. How is a normal user privileged to write to /etc?
The point of mounting /usr read-only is to disallow root from writing to it
One of them, not all!
The vulnerability is to exploits that allow only writing files as root, as opposed to full shell access. If the attacker can replace a common command with a trojan, it will end up being executed. Allowing that same attacker to write an OCF override to /etc,
But when you can write /etc/{passwd,shadow}, amongst others, it's a lost cause already!!!
and having pacemaker automatically run it without any configuration change required, provides a way around a read-only /usr.
Where did I say I was doing that to /usr primarily for security against intruders? The main idea is to separate the domains of distributor-provided files and admin-delivered, prioritized ones, and this applies regardless of whether /usr is set immutable or not. But the scheme comes in useful there just as well. And your linked, dated FHS (as opposed to the originally mentioned LSB) also makes it clear that /etc is the unshareable (localized) companion of the shareable /usr (for that sharing to happen, /usr needs to be "locked down" in some way), i.e., one of the other use cases I have in mind.
Existing scripts under /etc could be attacked in the same way, which is a good reason why they shouldn't be there
Viz /etc/{passwd,shadow}...
and are there only for historical reasons.
Speculation.
From a security standpoint, mounting /usr read-only is stronger when paired with all other filesystems being mounted ro and/or noexec. As an example, Gentoo recommends mounting /etc read-only as well, with symlinks for files that need to be updated:
https://wiki.gentoo.org/wiki/Filesystem/Security#Mount_options
But it doesn't say to mount /etc noexec, likely for a reason, and I don't see that coming. And the same solution -- symlinks -- would apply here as well; it really depends on how paranoid the administrators want to go. But then they would likely avoid resource-agents altogether, because in case of some enabling vulnerability in the resource manager, the agents will currently rather assist with arbitrary execution. This is in part covered by the proposed dualism, which allows one to narrow that surface to only the agents really employed.
The bottom line from a security standpoint is that all executables should be on read-only partitions, otherwise the protection is only partial. (This is one reason this is not a common setup.)
Ok, if that's your opinion, let's also add an explicit provision that OCF scripts are not necessarily executable, in which case the resource manager is responsible for parsing and interpreting the shebang on its own.
(Just teasing your fascination with "being executable" while it is more or less just syntactic sugar, easily paralleled in user space, that the kernel provides in the case of non-binaries. It's just a formalist's game once you realize that.)
-- Jan (Poki)