community.general

zfs: fix multi-line value in user-defined property

Open batrla opened this issue 2 years ago • 33 comments

SUMMARY

Fixes the ZFS module, which breaks if a ZFS dataset property contains a multi-line value. The root cause is a bug in zfs.py that causes a parsing error on 'zfs get' output. The module runs 'zfs get -H -p -o property,value,source all dataset' to retrieve the dataset's property names, values, and sources (whether a property was set locally or inherited, for example). The problem is that the value field can contain control characters such as \n or \t. This creates ambiguity, since the same characters are used to separate properties and fields on a line. The output of 'zfs get all' is fundamentally unparseable because it mixes the values of multiple properties; one must run 'zfs get individual_property' for a single property to avoid the ambiguity.
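The ambiguity can be sketched in a few lines of Python (a simplified illustration, not the actual zfs.py code; the sample output and property name are made up):

```python
def parse_zfs_get(output):
    """Naive parser mirroring the tab/newline splitting that breaks."""
    props = {}
    for line in output.splitlines():
        # A value containing '\n' splits one record across two lines,
        # so this unpack fails exactly like the traceback below.
        prop, value, source = line.split("\t")
        props[prop] = value
    return props

# Simulated `zfs get -H -o property,value,source all` output where the
# user property value is "one\ntwo":
sample = "compression\ton\tlocal\nmy.custom.property:blah\tone\ntwo\tlocal"
try:
    parse_zfs_get(sample)
except ValueError as exc:
    print(exc)  # not enough values to unpack (expected 3, got 2)
```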

Test case:

- name: basic configuration
  hosts: localhost

  tasks:
  - name: create ZFS with user defined property and multiline value
    zfs:
      name: rpool/myfs
      state: present
      extra_zfs_properties:
        my.custom.property:blah: "one\ntwo\n"

  - name: update user defined property multiline value
    zfs:
      name: rpool/myfs
      state: present
      extra_zfs_properties:
        my.custom.property:blah: "one\ntwo\nthree\nfour\n"
Without the fix
TASK [create ZFS with user defined property and multiline value] ***************

The full traceback is:
Traceback (most recent call last):

[ ... stripping lines to make boring error output shorter ... ]

 line 295, in <module>\n  File \"/tmp/ansible_zfs_payload_xteg4tbc/ansible_zfs_payload.zip/ansible_collections/community/general/plugins/modules/zfs.py\", line 274, in main\n  File \"/tmp/ansible_zfs_payload_xteg4tbc/ansible_zfs_payload.zip/ansible_collections/community/general/plugins/modules/zfs.py\", line 196, in set_properties_if_changed\n  File \"/tmp/ansible_zfs_payload_xteg4tbc/ansible_zfs_payload.zip/ansible_collections/community/general/plugins/modules/zfs.py\", line 224, in get_current_properties\nValueError: not enough values to unpack (expected 3, got 2)\n",
    "module_stdout": "",
    "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
    "rc": 1
}
With the fix
TASK [create ZFS with user defined property and multiline value] ***************
ok: [localhost] => {
    "changed": false,
    "diff": {
        "after": {
            "extra_zfs_properties": {
                "my.custom.property:blah": "one\ntwo\n"
            }
        },
        "after_header": "rpool/myfs",
        "before": {
            "extra_zfs_properties": {
                "my.custom.property:blah": "one\ntwo\nthree\nfour\n"
            }

^ this is because the test case was re-run: the ZFS dataset initially had the property value "one\ntwo\nthree\nfour\n"

        },
        "before_header": "rpool/myfs"
    },
    "invocation": {
        "module_args": {
            "extra_zfs_properties": {
                "my.custom.property:blah": "one\ntwo\n"
            },
            "name": "rpool/myfs",
            "origin": null,
            "state": "present"
        }
    },
    "my.custom.property:blah": "one\ntwo\n",
    "name": "rpool/myfs",
    "state": "present"
}

TASK [update user defined property multiline value] ****************************

ok: [localhost] => {
    "changed": false,
    "diff": {
        "after": {
            "extra_zfs_properties": {
                "my.custom.property:blah": "one\ntwo\nthree\nfour\n"
            }
        },
        "after_header": "rpool/myfs",
        "before": {
            "extra_zfs_properties": {
                "my.custom.property:blah": "one\ntwo\n"
            }
        },
        "before_header": "rpool/myfs"
    },
    "invocation": {
        "module_args": {
            "extra_zfs_properties": {
                "my.custom.property:blah": "one\ntwo\nthree\nfour\n"
            },
            "name": "rpool/myfs",
            "origin": null,
            "state": "present"
        }
    },
    "my.custom.property:blah": "one\ntwo\nthree\nfour\n",
    "name": "rpool/myfs",
    "state": "present"
}
Result

When the test case with the fix completes, the multi-line ZFS property is set:

$ zfs get all rpool/myfs | tail
rpool/myfs  version                  6                      -
rpool/myfs  vscan                    off                    default
rpool/myfs  writelimit               default                default
rpool/myfs  xattr                    on                     default
rpool/myfs  zoned                    off                    default
rpool/myfs  my.custom.property:blah  one
two
three
four
    local

batrla avatar Mar 29 '23 19:03 batrla

cc @bcoca @fishman @jasperla @johanwiren @jpdasma @mator @scathatheworm @troy2914 @xen0l

ansibullbot avatar Mar 29 '23 19:03 ansibullbot

The test ansible-test sanity --test pep8 [explain] failed with 2 errors:

plugins/modules/zfs.py:239:12: E713: test for membership should be 'not in'
plugins/modules/zfs.py:253:1: E302: expected 2 blank lines, found 1


ansibullbot avatar Mar 29 '23 19:03 ansibullbot

Sure, np, just added a changelog fragment. I just didn't know about it.

batrla avatar Mar 30 '23 07:03 batrla

Docs Build 📝

Thank you for contribution!✨

The docsite for this PR is available for download as an artifact from this run: https://github.com/ansible-collections/community.general/actions/runs/4563994550

File changes:

  • M collections/community/general/pipx_module.html

NOTE: only file modifications are shown here. New and deleted files are excluded. See the file list and check the published docs to see those files.

diff --git a/home/runner/work/community.general/community.general/docsbuild/base/collections/community/general/pipx_module.html b/home/runner/work/community.general/community.general/docsbuild/head/collections/community/general/pipx_module.html
index 49e81e6..27f06b7 100644
--- a/home/runner/work/community.general/community.general/docsbuild/base/collections/community/general/pipx_module.html
+++ b/home/runner/work/community.general/community.general/docsbuild/head/collections/community/general/pipx_module.html
@@ -195,7 +195,7 @@
 <a class="ansibleOptionLink" href="#parameter-force" title="Permalink to this option"></a><p class="ansible-option-type-line"><span class="ansible-option-type">boolean</span></p>
 </div></td>
 <td><div class="ansible-option-cell"><p>Force modification of the application’s virtual environment. See <code class="docutils literal notranslate"><span class="pre">pipx</span></code> for details.</p>
-<p>Only used when <em>state=install</em>, <em>state=upgrade</em>, <em>state=upgrade_all</em>, or <em>state=inject</em>.</p>
+<p>Only used when <em>state=install</em>, <em>state=upgrade</em>, <em>state=upgrade_all</em>, <em>state=latest</em>, or <em>state=inject</em>.</p>
 <p class="ansible-option-line"><span class="ansible-option-choices">Choices:</span></p>
 <ul class="simple">
 <li><p><code class="ansible-option-default-bold docutils literal notranslate"><span class="pre">false</span></code> <span class="ansible-option-choices-default-mark">← (default)</span></p></li>
@@ -208,7 +208,8 @@
 <a class="ansibleOptionLink" href="#parameter-include_injected" title="Permalink to this option"></a><p class="ansible-option-type-line"><span class="ansible-option-type">boolean</span></p>
 </div></td>
 <td><div class="ansible-option-cell"><p>Upgrade the injected packages along with the application.</p>
-<p>Only used when <em>state=upgrade</em> or <em>state=upgrade_all</em>.</p>
+<p>Only used when <em>state=upgrade</em>, <em>state=upgrade_all</em>, or <em>state=latest</em>.</p>
+<p>This is used with <em>state=upgrade</em> and <em>state=latest</em> since community.general 6.6.0.</p>
 <p class="ansible-option-line"><span class="ansible-option-choices">Choices:</span></p>
 <ul class="simple">
 <li><p><code class="ansible-option-default-bold docutils literal notranslate"><span class="pre">false</span></code> <span class="ansible-option-choices-default-mark">← (default)</span></p></li>
@@ -221,7 +222,7 @@
 <a class="ansibleOptionLink" href="#parameter-index_url" title="Permalink to this option"></a><p class="ansible-option-type-line"><span class="ansible-option-type">string</span></p>
 </div></td>
 <td><div class="ansible-option-cell"><p>Base URL of Python Package Index.</p>
-<p>Only used when <em>state=install</em>, <em>state=upgrade</em>, or <em>state=inject</em>.</p>
+<p>Only used when <em>state=install</em>, <em>state=upgrade</em>, <em>state=latest</em>, or <em>state=inject</em>.</p>
 </div></td>
 </tr>
 <tr class="row-odd"><td><div class="ansible-option-cell">
@@ -251,7 +252,7 @@
 <a class="ansibleOptionLink" href="#parameter-install_deps" title="Permalink to this option"></a><p class="ansible-option-type-line"><span class="ansible-option-type">boolean</span></p>
 </div></td>
 <td><div class="ansible-option-cell"><p>Include applications of dependent packages.</p>
-<p>Only used when <em>state=install</em>, <em>state=upgrade</em>, or <em>state=inject</em>.</p>
+<p>Only used when <em>state=install</em>, <em>state=latest</em>, <em>state=upgrade</em>, or <em>state=inject</em>.</p>
 <p class="ansible-option-line"><span class="ansible-option-choices">Choices:</span></p>
 <ul class="simple">
 <li><p><code class="ansible-option-default-bold docutils literal notranslate"><span class="pre">false</span></code> <span class="ansible-option-choices-default-mark">← (default)</span></p></li>
@@ -279,7 +280,7 @@
 <a class="ansibleOptionLink" href="#parameter-python" title="Permalink to this option"></a><p class="ansible-option-type-line"><span class="ansible-option-type">string</span></p>
 </div></td>
 <td><div class="ansible-option-cell"><p>Python version to be used when creating the application virtual environment. Must be 3.6+.</p>
-<p>Only used when <em>state=install</em>, <em>state=reinstall</em>, or <em>state=reinstall_all</em>.</p>
+<p>Only used when <em>state=install</em>, <em>state=latest</em>, <em>state=reinstall</em>, or <em>state=reinstall_all</em>.</p>
 </div></td>
 </tr>
 <tr class="row-odd"><td><div class="ansible-option-cell">

github-actions[bot] avatar Mar 30 '23 07:03 github-actions[bot]

CC @froebela @sam-lunt @hobnob11 you did fixes to this modules in the past, maybe you could take a look at this one.

felixfontein avatar Apr 02 '23 08:04 felixfontein

Personally, I don't think this is a change that should be made, for two reasons.

First, it's going to be much slower. Currently, there is one call to zfs get: zfs get -H -p -o property,value,source, which retrieves the list of properties as well as their values. With this change, there will be N + 1 calls to zfs get: one to get the list of properties, then one call per property to retrieve the value. This is clearly less efficient and introduces some potential corner cases if the properties are somehow modified concurrently by another process.

However, the bigger issue is that this supports a use case that I don't think ZFS really intends to support, namely property names or values that contain the special characters \t and/or \n. Quoting from the zfs-get man page:

-H Display output in a form more easily parsed by scripts. Any headers are omitted, and fields are explicitly separated by a single tab instead of an arbitrary amount of space.

The fact that fields are tab-delimited and records are newline-delimited pretty clearly indicates to me that the ZFS maintainers don't intend or expect property names/values to include those characters. I don't think it makes sense to increase code complexity and worsen performance just to support a use case that almost no one uses and that the ZFS maintainers don't seem to intentionally support.

@batrla, you don't mention the use case you have that requires embedding newlines in your property values, but I'd suggest trying to refactor your logic to avoid literal tabs or newlines in your property names/values. (Python makes this easy using the str.encode("unicode_escape") feature, for example).
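For illustration, the escaping suggested above might look like this (a sketch; how the escaped value gets written to and read from the property is assumed, not shown):

```python
# Escape control characters before storing a value in a ZFS property,
# and decode on the way back, so that `zfs get` output stays single-line.
raw = "one\ntwo\n"
escaped = raw.encode("unicode_escape").decode("ascii")
assert "\n" not in escaped and "\t" not in escaped
print(escaped)  # prints the literal two-character sequences: one\ntwo\n

# Round-trip back to the original value:
assert escaped.encode("ascii").decode("unicode_escape") == raw
```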

sam-lunt avatar Apr 02 '23 17:04 sam-lunt

Hi @sam-lunt, and thanks for having a look at the code.

The current parsing in zfs.py is naive, and an evil administrator or a common user (also evil) with ZFS delegated admin privileges can easily exploit it. The scope of the problem I am trying to fix goes beyond user-defined properties: it also impacts properties that allow path names, e.g. the mountpoint property, where traditionally only the \0 and '/' characters are not allowed. The mountpoint property offers a whole range of options for breaking zfs.py, either by containing \t or \n in the mountpoint value to break parsing, or by containing fake lines that mimic zfs get output and trick zfs.py into thinking that property X has a fake value Y.

This is just an example that breaks current zfs.py:

zfs create -o mountpoint=$'/this\nbreaks\nansible' rpool/pokus

No user-defined property needed.
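As a sketch of the spoofing half of the problem (simulated output; the crafted value is hypothetical), note that the injected line is indistinguishable from a genuine record:

```python
# A mountpoint value crafted to inject a fake record, as might be set via:
#   zfs create -o mountpoint=$'/this\nreadonly\ton' rpool/pokus
mountpoint = "/this\nreadonly\ton"

# What `zfs get -H -o property,value,source mountpoint` would then emit:
output = "mountpoint\t" + mountpoint + "\tlocal"

records = [line.split("\t") for line in output.splitlines()]
print(records[0])  # ['mountpoint', '/this']       <- truncated real record
print(records[1])  # ['readonly', 'on', 'local']   <- injected, looks valid
```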

Yes, the suggested fix is less efficient: it initially calls zfs get to find all property names, then calls zfs get again to read the value of a property, but that second call is only made for properties being modified (by the playbook). Time complexity therefore falls into the O(n) range, where n is the number of properties. I think that's reasonable, since 'zfs get single_prop' is the only way to avoid ambiguity in parsing.

To your point about the zfs man page: yes, I had a look at it beforehand. The description of the -H and -p options is unfortunate, because it sets expectations and fails to deliver on them. What zfs get is missing is quoting of control characters and of the characters used as separators in the output. But that's something we'll have to live with: even if someone fixes it later, history won't be rewritten (there would still be systems with the old output format).

I don't think we have the luxury of not accepting this PR.

batrla avatar Apr 03 '23 07:04 batrla

I'm sorry for the typo: I wrote that '/' is not allowed in mountpoint; of course the mountpoint property allows '/' by definition... I was thinking of a file name (path component) when writing it.

batrla avatar Apr 03 '23 07:04 batrla

OK, I hadn't realized this was meant to address a security concern. That's helpful context to have, thanks. That said, I don't think this is a particularly compelling concern, since the only way to write ZFS properties is to have root access (or to be granted them via zfs-delegate, as you mentioned, but I think anyone using that feature is aware that it's not something that should be done with an untrusted user).

Since the only way to construct a malicious property value requires elevated privileges, I don't think it's something worth worrying about. It's analogous to worrying about a malicious /root/.bashrc file: if someone malicious can write that file, they won't bother to do so, since they can just directly do whatever actively malicious task they want.

sam-lunt avatar Apr 03 '23 12:04 sam-lunt

Also, while Unix/Linux allow any character in a file path besides \0 and /, that's generally viewed as too lenient a restriction, and many common *nix utilities are stricter (e.g. entries of PATH cannot contain : characters).

sam-lunt avatar Apr 03 '23 12:04 sam-lunt

Well, I can't agree with the statement:

It's analogous to worrying about a malicious /root/.bashrc

A ZFS delegated admin can grant unprivileged users the privilege to set user properties on filesystems they create, while they have no permission to touch the datasets of other users. This can be done recursively, and even on a public machine. The administrator (super-user) might think he has securely granted ZFS delegated admin permissions, with no idea that a malicious user could break his Ansible runs on some other host... It's not like the /root/.bashrc case. It's more like a /tmp directory with a+rwxt permissions, where a user can store his own files but cannot touch or remove others' files. An unprivileged user then creates some 'weird files' in /tmp and waits to see whether some script run by root traversing /tmp is buggy. Whether you want to address this parsing bug in zfs.py in community.general is something I don't know...

batrla avatar Apr 03 '23 13:04 batrla

Do you have an example of this kind of exploit being used/attempted "in the wild"? I don't get the impression that zfs-allow is a frequently used feature, especially not on Linux (since it can't allow mount or unmount).

Secondly, I don't think there is a particular danger here, even if we do allow the possibility of malicious users having been delegated. The malicious user can create fake properties or override the values of existing properties, but that can't be used for a privilege escalation. At most it can cause an extra call to zfs set that wouldn't have been made otherwise or a call to zfs set to be skipped. But in this scenario, the malicious user can already call zfs set themselves, so they don't need to trick Ansible into doing it for them.

I think that trying to address security concerns has a much higher bar. There needs to be a proof-of-concept that shows there really is a security vulnerability, and there needs to be confidence that any fix addresses the vulnerability without creating new security holes in the process. Finally, if there is a vulnerability, the question should be asked whether it is Ansible that should address it or ZFS. It seems to me that if this really is a risk (which I'm not yet convinced is the case), then it is likely something ZFS should address, perhaps with a format that uses null characters as the delimiters.

sam-lunt avatar Apr 03 '23 13:04 sam-lunt

No, I don't have a particular example of an exploit, but I may create one if desired. While most ZFS installations won't be using ZFS delegated admin permissions, that doesn't mean there are no sophisticated deployments out there using them.

but that can't be used for a privilege escalation

Yes, I would agree. At first glance I see no risk of privilege escalation. On the other hand, if Ansible is used to control something and relies on the status reported by the zfs module, there might be a risk of denial of service.

then it is likely something that ZFS should address, perhaps with a format that uses null characters as the delimiters.

Yes and no. Yes to a new ZFS enhancement: a new output format that prints multiple properties in a safe way that other programs can parse. And no for Ansible: as I mentioned in my second comment, we can't rewrite history. If there are already systems using the old format and Ansible controls them, this can't be fixed without breaking compatibility. An alternative fix is to use a Python binding to libzfs and get the properties directly from libzfs, but I'm unsure how simple that is.

batrla avatar Apr 03 '23 13:04 batrla

Thanks, an example of an exploit would be great. I think that would make this discussion more concrete.

sam-lunt avatar Apr 03 '23 14:04 sam-lunt

And "no" for ansible, as I mentioned in my second comment, we can't rewrite history, if there are already systems using old format and ansible controls them, then it can't be done without breaking compatibility.

You can always first figure out the version and then depending on that either use the new, faster method, or fall back to the old, slower one.

In any case, I agree this should be fixed, but the performance penalty is bad indeed. How about making the behavior configurable, so that folks who care about speed (and trust their environment) can continue using the old way, while others can use the slow way if they are not sure (or if they don't specify anything; I would make it the new default)?

felixfontein avatar Apr 03 '23 19:04 felixfontein

The performance of this change is going to be substantially slower. Here is a bash function you can use to benchmark it yourself (I see about a 20x difference between the two):

get_slow() { zfs get -H -o property all "$1" | while read -r prop; do zfs get -H -o property,value,source "$prop" "$1"; done }
get_fast() { zfs get -H -o property,value,source all "$1" }

Also, I'm not convinced that this is a true security issue. I suspect that any example will only be able to trick ansible into either making an unnecessary zfs set call or skipping a zfs set call it would have made otherwise. But since this presupposes that the malicious user has permissions to call zfs set, I don't think there is any real vulnerability to address.

If there is a genuine vulnerability, I think that OpenZFS is probably the more appropriate place to address it, since anything that uses zfs get -H would be affected, not just Ansible.

sam-lunt avatar Apr 03 '23 19:04 sam-lunt

@sam-lunt, I don't think your example is valid. The suggested fix defers retrieval of a value until it's really needed; it doesn't cache all properties in advance. It only reads a property if the task wants to set it to a particular value. If the task is not using properties, the suggested fix could theoretically even be faster than the current code (probably not), because it doesn't read values in the initial zfs get (only names and source). Your example reads all properties...

On the other point, about making the behavior configurable: if that's something we want, there's a way to decide automatically whether to use the old or the new way of reading properties. First run 'zfs get all' to learn the property names (only), then run a second 'zfs get all' to learn the property values. Check how many properties each run saw, whether parsing succeeded, and whether no property appeared twice. If there's no issue, the values can be trusted. If not, fall back to the new way: read individual ZFS properties one by one, in a deferred manner. This deals with race conditions and security too, at the cost of 2x 'zfs get all', and avoids a "tunable" (everybody knows tunables are evil).
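That self-check could be sketched like this (my own illustration, not code from the PR; run_zfs_get is a hypothetical helper that runs the command and returns its raw output):

```python
def get_properties_safely(run_zfs_get):
    """Run `zfs get all` twice: once for names only, once for full records.
    If the two runs agree and every record parses cleanly, trust the fast
    path; otherwise return None so the caller falls back to one
    `zfs get <property>` call per property."""
    names = run_zfs_get("-H -o property all").splitlines()
    records = [line.split("\t")
               for line in run_zfs_get("-H -o property,value,source all").splitlines()]
    consistent = (
        len(records) == len(names)
        and all(len(r) == 3 for r in records)      # every record parsed cleanly
        and [r[0] for r in records] == names       # both runs agree on names
        and len(set(names)) == len(names)          # no property seen twice
    )
    if consistent:
        return {r[0]: r[1] for r in records}
    return None  # fall back to deferred per-property reads
```

When it returns None, the module would pay the O(n) per-property cost only in the rare case where the fast output cannot be trusted.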

batrla avatar Apr 03 '23 20:04 batrla

needs_info

felixfontein avatar Feb 23 '24 20:02 felixfontein

Hello, I noticed the "needs_info" tag on this PR. What information is missing? I think I provided a lot of detail. If anything is unclear, just ask; I'm happy to answer.

One thing I want to make clear: I will make no tradeoffs between performance and security (and correctness).

Correctness and security are (I think) superordinate to performance. Remember the Pentium FDIV bug in the CPU:

does it matter if the result is wrong? The result was computed so quickly...

(that was irony, sorry).

batrla avatar Mar 18 '24 09:03 batrla

As you have noticed we will not merge your current version. If you insist on getting this merged as-is (without addressing the issues that were pointed out), this won't get merged. Sorry.

felixfontein avatar Mar 24 '24 21:03 felixfontein

I really don't insist on getting anything merged. I just wanted to help by making an opensource product more secure (and less error prone) for anyone without having to patch it on my side.

Btw: what issues were pointed out in my code? I just see that an author of the original code is trying to defend it by posting a comment that N x "zfs get" calls (what is N?) are 20x slower than "zfs get all"... How is this argument related to my code? The speed of the ZFS CLI is a matter of the ZFS implementation; it's unrelated to this PR. Another question is how N was chosen in the comparison: the original code retrieves "all" ZFS properties for a given dataset, while my code retrieves only those properties that are actually needed.

Anyway, it looks like your priorities put something other than security first, so I don't want to waste your time (or mine). Please close the PR.

batrla avatar Mar 25 '24 07:03 batrla

I'll point out that I'm not the "author of the original code", simply someone who has had a PR accepted for this module in the past, so I don't think it's fair to imply that my issues with your code stemmed from ego or some need to defend myself.

To recap, my concern is that you are imposing a real performance penalty in order to address an ill-defined and potentially imaginary security vulnerability. I think at minimum, having a reproducing example of such a security exploit is needed. That's both to demonstrate that there really is a problem as well as that your change solves the problem. And if you don't think the performance example I provided is valid because it parses the output of zfs get using bash instead of python, please feel free to write it in python instead. Or, even better, create an ansible script that uses the code affected by your change and benchmark that. Doing that would allow any performance penalty to be contextualized with all the other work that ansible needs to do. Maybe those costs will dominate and make any performance penalty from your change inconsequential.

sam-lunt avatar Mar 25 '24 13:03 sam-lunt

I wish you could demonstrate the performance penalty imposed by the suggested fix in a real-world use case scenario. I wonder if it has any impact on Ansible users. I have been running my modified version of zfs.py for months to provision over a dozen machines and haven't noticed any performance issue.

Regarding:

performance penalty in order to address an ill-defined and potentially imaginary security vulnerability

...that's just part of the story. You know that zfs.py uses an API that was apparently designed to output human-readable data, not to retrieve machine-parseable data. Relying on this API is a bug in zfs.py, and the fact that it works for you doesn't change that it is still a bug. If you think it's safe to ignore the bug (well, maybe I'm the only one who hit this problem...), you can just document the limitation in zfs.py: it doesn't support multi-line values, because the ZFS CLI uses '\n' as the delimiter between properties. That's it.

But again, I don't get the complaint about performance. The Ansible playbooks I have written are "huge" (or maybe just big); they run for 20-30 minutes, and I don't care if a task spends a few extra milliseconds invoking a CLI that can safely retrieve multi-line properties from ZFS. Also, I don't think Ansible is like the network stack in the OS, which needs to save every CPU instruction, reduce the number of branches, limit access to data that would cause cache misses, and so on. How many ZFS filesystems and properties do you set in your playbooks? (rhetorical question)

That's my view. On the other hand, I respect your decision not to merge the PR. I'm fine with it, and I'm not going to do a performance analysis comparing something that's broken with something that works.

P.S.: sorry for "author of the original code"

batrla avatar Mar 25 '24 15:03 batrla

The big point that you aren't addressing is an example of a vulnerability that is corrected by your change. Without that, how can you be sure that your change correctly addresses the problem it's intended to? (This would also help quantify what kinds of systems might be vulnerable to this issue)

The Ansible playbooks I have written are "huge" (or maybe just big), they run for 20-30 minutes and I don't care if a task spends few extra millisecond invoking a cli that can safely retrieve multi-line properties from ZFS.

That's your use case, but someone else might use ansible solely for local operations that they expect to complete fairly quickly. I provided a short script that shows there is a decent chance your change will have a non-insignificant performance impact. Maybe you don't think that script is representative of your change's performance, but I think at this point, the ball is in your court to refute that. Also, my argument isn't that "performance trumps security", but that if the performance cost is significant, then it might be worth putting the time in to find a solution that addresses both issues. That's a hard discussion to have without an example of exploiting the vulnerability and without a benchmark that you consider "fair".

sam-lunt avatar Mar 25 '24 15:03 sam-lunt

expect to complete fairly quickly

What's your expectation: microseconds? milliseconds?

I wonder what is the cost of fork() and exec() on your system....

Let's take an example from zfs.py source code, see EXAMPLES.

- name: Create a new file system called myfs2 with snapdir enabled
  community.general.zfs:
    name: rpool/myfs2
    state: present
    extra_zfs_properties:
      snapdir: enabled

The orig code would (in relevant part) do:

$ time /bin/bash -c '/sbin/zfs get -H -o property all "$1"' -- rpool > /dev/null
real    0m0.020s
user    0m0.006s
sys     0m0.012s

The new code would (in relevant part) do:

$ time /bin/bash -c '/sbin/zfs get -H -o property all "$1"; for a in snapdir; do /sbin/zfs get -H -o property,value,source "$a" "$1"; done' -- rpool > /dev/null
 
real    0m0.030s
user    0m0.010s
sys     0m0.017s

Result

For the given example, the suggested fix slows down a task that sets one extra ZFS property by about 10 milliseconds (which, incidentally, is a usual time quantum for the kernel scheduler). If a task in the playbook sets N extra ZFS properties, the overhead of forking a new process, executing the (zfs) binary, setting up libraries, and so on is multiplied N times. The suggested fix increases the time complexity of this particular part of zfs.py from O(1) to O(n), where n is the number of extra ZFS properties. In the real world, I assume the penalty of the fix for N <= 10 will be irrelevant compared to the execution time of zfs.py. Users who set thousands of extra (user) ZFS properties on a dataset will be impacted and may observe delays in seconds. Users with no extra ZFS properties won't see a difference. Users with 10 extra ZFS properties may see a penalty of 0.1s. These are just examples; the actual numbers depend on the system, the machine, and the ZFS state.

P.S.: note that your get_slow() contains a syntax error; the shell one-liner is missing a semicolon before }.

batrla avatar Mar 25 '24 16:03 batrla

I don't think your examples accurately reflect what calls need to be made. Look at lines 205-207 from your branch:

        updated_properties = self.list_properties()
        for prop in self.properties:
            value = self.get_property(prop, updated_properties)

list_properties returns all non-inherited properties, then you call zfs get for each one. So it seems that your example that calls zfs get for only the values in extra_zfs_properties is overly optimistic.

In any case, let's put aside the performance question and focus on the meat of the issue. You say that you are addressing a security concern, but you don't have an example that exploits the current code and that shows that your change fixes the exploit. How can you be sure that your code correctly addresses the issue without some sort of test?

sam-lunt avatar Mar 25 '24 17:03 sam-lunt

list_properties returns all non-inherited properties, then you call zfs get for each one. So it seems that your example that calls zfs get for only the values in extra_zfs_properties is overly optimistic.

This reading doesn't look correct. list_properties() returns all properties, and the result is assigned to the variable updated_properties. However, the code doesn't iterate over all properties (in updated_properties); it iterates over self.properties:

        for prop in self.properties:
            value = self.get_property(prop, updated_properties)

The list of all properties is not used for iteration; it provides supplementary information about whether a property exists. The code iterates over self.properties, a list passed as an argument to the Zfs class constructor, which is fed from:

zfs = Zfs(module, name, module.params['extra_zfs_properties'])

In any case, let's put aside the performance question ...

No, the objection about performance is the biggest argument against the fix. Is that objection no longer relevant?

Is the whole issue because you and I read the code differently?

batrla avatar Mar 25 '24 17:03 batrla

For the record:

the new code in this PR is (hypothetically) actually faster than the old code if you don't use extra ZFS properties, because the new code doesn't retrieve values via zfs get all; it only retrieves the property list. Values are retrieved only when needed, so when they aren't, the new code saves moving data through the pipe from /sbin/zfs.

P.S.: If we wait 4 days, we can "celebrate" the 1-year anniversary of arguing about a few milliseconds.

batrla avatar Mar 25 '24 18:03 batrla

No, the objection about performance is the biggest argument against the fix. Is the objection no longer relevant?

My initial comment was partly about performance, yes, and it seems you're right that your change only adds a zfs get call for each element of extra_zfs_properties, not for each property defined on the dataset. However, in that same comment I said that I don't think embedding newlines or tabs in property values is a use case that ZFS intends to support, and I stand by that.

I think when we started to talk past each other was after you said that this was about addressing a security vulnerability, not adding a new feature. I was trying to say "well, I'm worried about the performance, but I'd also like to nail down exactly what kind of exploit you are worried about", but I think you mostly focused on the "performance" part of that.

All that said, I don't really think this is a security concern; it's supporting a new feature (ZFS property values with embedded newlines). I think the bar for adding a feature is lower than for addressing a security vulnerability, so I wouldn't object to merging this. I am dubious about supporting this feature, but I don't think you are adding too much special-casing in order to do so, so it's probably fine.

sam-lunt avatar Mar 25 '24 19:03 sam-lunt

@sam-lunt, let's make a recap:

  1. Your reading of the Python code was incorrect, and your concern about performance was invalid. You still have a chance to apologize for it, because it blocked this PR for almost a year. My code doesn't iterate over all possible ZFS properties (unless the user wants it to) and doesn't intentionally make Ansible slower.

  2. Your following comment is not relevant; neither you nor I can change what the ZFS people did in the past (though maybe you are right, or maybe the ZFS folks just haven't had time to introduce a machine-readable format for zfs get; I don't know):

    I don't think that embedding newlines or tabs in property values is a use case that ZFS intends to support, and I stand by that.

  3. Regarding the security concern: I'm not 100% sure this is a real security issue. Maybe no, maybe yes; if yes, I would not post an exploit here in public. Because of the ZFS delegated admin feature, the current code in zfs.py can process property names and values defined by a common user. This processing is ambiguous, since the delimiter is a valid character within a property value. I think zfs.py meets the criteria for a security weakness:

CWE-20: Improper Input Validation as defined in: https://cwe.mitre.org/data/definitions/20.html

I will reply to your source code comments by the end of next week. Thanks,

batrla avatar Mar 27 '24 09:03 batrla