node_exporter role errors out while templating

Open christian-heusel opened this issue 1 year ago • 11 comments

fatal: [xerophyte]: FAILED! => {
    "changed": false,
    "msg": "AnsibleUndefinedVariable: 'dict object' has no attribute 'systemd'. 'dict object' has no attribute 'systemd'"
}
Full error (with -vvv enabled):
The full traceback is:
Traceback (most recent call last):
  File "/usr/lib/python3.12/site-packages/ansible/template/__init__.py", line 1010, in do_template
    res = myenv.concat(rf)
          ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/ansible/template/native_helpers.py", line 83, in ansible_concat
    return ''.join([to_text(v) for v in nodes])
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<template>", line 178, in root
  File "/usr/lib/python3.12/site-packages/ansible/template/__init__.py", line 295, in wrapper
    ret = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/jinja2/async_utils.py", line 45, in wrapper
    return normal_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/jinja2/filters.py", line 631, in sync_do_first
    return next(iter(seq))
                ^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/jinja2/runtime.py", line 852, in _fail_with_undefined_error
    raise self._undefined_exception(self._undefined_message)
jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'systemd'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.12/site-packages/ansible/plugins/action/template.py", line 152, in run
    resultant = templar.do_template(template_data, preserve_trailing_newlines=True, escape_backslashes=False, overrides=overrides)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/ansible/template/__init__.py", line 1044, in do_template
    raise AnsibleUndefinedVariable(e, orig_exc=e)
ansible.errors.AnsibleUndefinedVariable: 'dict object' has no attribute 'systemd'. 'dict object' has no attribute 'systemd'

I think this is due to the usage of ansible_facts.packages.systemd here:

https://github.com/prometheus-community/ansible/blob/a3aaf709a5b758cde0c51e9f5e3f972906ac266d/roles/node_exporter/templates/node_exporter.service.j2#L63-L66

That is because the packages fact for this host is empty:

$ ansible -m ansible.builtin.package_facts xerophyte
xerophyte | SUCCESS => {
    "ansible_facts": {
        "packages": {}
    },
    "changed": false
}

So do you know how this edge case could occur and how it could be fixed?
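
One way to surface this earlier would be a guard that runs before the template task. The following is a minimal sketch, assuming a hypothetical extra preflight task that is not part of the role as published; the task name and message are illustrative only:

```yaml
# Hypothetical preflight guard (not part of the published role):
# fail with a readable message when package_facts returned no systemd entry,
# instead of hitting AnsibleUndefinedVariable while templating.
- name: Assert that package facts include systemd
  ansible.builtin.assert:
    that:
      - ansible_facts.packages is defined
      - "'systemd' in ansible_facts.packages"
    fail_msg: >-
      package_facts returned no entry for systemd on {{ inventory_hostname }};
      check which package manager the module detected.
```

This does not fix the empty result; it only turns the templating traceback into an explicit failure that names the likely cause.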

christian-heusel avatar Apr 24 '24 17:04 christian-heusel

What distro are you on?

gardar avatar Apr 24 '24 22:04 gardar

What distro are you on?

I am using Arch Linux with ansible 9.5.1-1 / ansible-core 2.16.6-2

christian-heusel avatar Apr 24 '24 22:04 christian-heusel

Is that the target system or your control host? The package_facts module supports pacman so you should get the package facts on Arch

gardar avatar Apr 25 '24 00:04 gardar

This is the control host, the target system is a debian bookworm server.

christian-heusel avatar Apr 25 '24 00:04 christian-heusel

Hmm, debian bookworm should also definitely work.

Does the https://github.com/prometheus-community/ansible/blob/196bd2fd96840b3e62527a447e54c9975d33a964/roles/node_exporter/tasks/preflight.yml#L19-L21 task run on that host?

gardar avatar Apr 25 '24 00:04 gardar

Yes it also works on the other 4 bookworm hosts that are part of the playbook.

Does the "Gather package facts" task run on that host?

Yes that is run but gives an empty result as shown above (output from the other hosts stripped):

TASK [prometheus.prometheus.node_exporter : Gather package facts] **************
task path: /home/chris/.ansible/collections/ansible_collections/prometheus/prometheus/roles/node_exporter/tasks/preflight.yml:19
Using module file /usr/lib/python3.12/site-packages/ansible/modules/package_facts.py
Pipelining is enabled.
<xerophyte.teleport.mathphys.info> ESTABLISH SSH CONNECTION FOR USER: root
Using module file /usr/lib/python3.12/site-packages/ansible/modules/package_facts.py
<xerophyte.teleport.mathphys.info> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=1800s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o 'ControlPath="/tmp/ansible-%h-%p-%r"' xerophyte.teleport.mathphys.info '/bin/sh -c '"'"'/usr/bin/python3 && sleep 0'"'"''
<xerophyte.teleport.mathphys.info> (0, b'\n{"ansible_facts": {"packages": {}}, "invocation": {"module_args": {"manager": ["auto"], "strategy": "first"}}}\n', b'')
ok: [xerophyte] => {
    "ansible_facts": {
        "packages": {}
    },
    "changed": false,
    "invocation": {
        "module_args": {
            "manager": [
                "auto"
            ],
            "strategy": "first"
        }
    }
}

This seems to be a bug in the builtin module, though 🤔 When I specify manager=apt instead of auto, I get the correct list:

$ ansible -m ansible.builtin.package_facts -a "manager=apt" xerophyte
xerophyte | SUCCESS => {
    "ansible_facts": {
        "packages": {
            "aapt": [
                {
                    "arch": "amd64",
                    "category": "devel",
                    "name": "aapt",
                    "origin": "Debian",
                    "source": "apt",
                    "version": "1:10.0.0+r36-10"
                }
            ],
            "abootimg": [
                {
                    "arch": "amd64",
                    "category": "admin",
                    "name": "abootimg",
                    "origin": "Debian",
                    "source": "apt",
                    "version": "0.6-1+b2"
                }
            ],
[...]

christian-heusel avatar Apr 25 '24 10:04 christian-heusel

Interesting, I wonder if it's attempting to use the wrong package manager when it's set to auto

As a workaround, perhaps you could enforce the apt manager with module_defaults in your playbook
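
A rough sketch of what that could look like, assuming the role is applied from a play like the one below (the play layout and host pattern are assumptions, not taken from the reporter's playbook):

```yaml
# Sketch of the workaround: make package_facts query apt for every task in
# this play, including the role's "Gather package facts" task.
# Play layout and host pattern are illustrative assumptions.
- hosts: xerophyte
  module_defaults:
    ansible.builtin.package_facts:
      manager: apt
  roles:
    - prometheus.prometheus.node_exporter
```

This relies on the role's task not setting manager itself, which matches the module_args shown above (manager: auto is the module default); arguments set explicitly on a task take precedence over module_defaults.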

gardar avatar Apr 25 '24 12:04 gardar

I have now opened an upstream bug to see what's causing this behaviour for the host in question 👍🏻

christian-heusel avatar Apr 25 '24 13:04 christian-heusel

They have now implemented something to allow for this to work properly (see https://github.com/ansible/ansible/issues/83143#issuecomment-2077336248):

- ansible.builtin.package_facts:
    manager: "{{ ansible_facts.pkg_mgr }}"

As soon as this is rolled out it could be used here as well 😊

Are you happy with the way they "fixed" the issue? I think the default is still broken 😆

christian-heusel avatar May 08 '24 09:05 christian-heusel

Oh, that sounds like a mess. I thought auto was already using the package manager from the facts. Implementing package manager detection in two places feels unnecessary. 🤔

gardar avatar May 08 '24 10:05 gardar

Feel free to comment there. I think it could either be made the default or be offered as something like manager=from_facts 🤔

christian-heusel avatar May 08 '24 10:05 christian-heusel

Looks like this has been fixed in Ansible: https://github.com/ansible/ansible/pull/83149

gardar avatar Oct 17 '24 14:10 gardar