
node_exporter systemd unit file incorrectly formatted when using sysctl.include collector

Open · 0xdeadbeefJERKY opened this issue 10 months ago · 1 comment

Bug Summary

Installing Node Exporter with the prometheus.prometheus.node_exporter role on an EC2 instance running the Amazon Linux 2 AMI (systemd 219) fails at the enable-on-boot task:

TASK [prometheus.prometheus.node_exporter : Ensure Node Exporter is enabled on boot] ***
fatal: [default]: FAILED! => {"changed": false, "msg": "Error loading unit file 'node_exporter': org.freedesktop.DBus.Error.InvalidArgs \"Invalid argument\""}

Here's the playbook being used:

- hosts: 127.0.0.1
  vars:
    node_exporter_enabled_collectors:
      - sysctl:
          include:
            vm:
              - overcommit_memory
              - overcommit_ratio
              - dirty_background_bytes
              - dirty_background_bytes
              - dirty_background_ratio
              - dirty_bytes
              - dirty_expire_centisecs
              - dirty_ratio
              - swappiness
  roles:
    - prometheus.prometheus.node_exporter
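The include value above is a nested YAML mapping rather than a string. The following minimal Python sketch shows where the single quotes in the rendered unit come from, assuming (as the rendered output below suggests) that the role interpolates the mapping directly, so the template engine falls back to Python's str() on the dict. The list is trimmed to two entries for brevity.

```python
# Sketch (assumption): the role interpolates the option value directly, so the
# nested mapping is rendered with str(), which quotes every key and list item
# with single quotes.
include = {"vm": ["overcommit_memory", "swappiness"]}

rendered_value = str(include)

# Wrapping that value in single quotes, as the current template does, nests
# single quotes inside a single-quoted argument.
arg = f"'--collector.sysctl.include={rendered_value}'"

print(rendered_value)
print(arg)
```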

Upon further investigation, the systemd unit file turns out to be malformed: the template wraps the sysctl.include argument in single quotes, but the rendered value is itself a Python-style dict full of single quotes:

#
# Ansible managed
#

[Unit]
Description=Prometheus Node Exporter
After=network-online.target

[Service]
Type=simple
User=node-exp
Group=node-exp
ExecStart=/usr/local/bin/node_exporter \
    '--collector.sysctl.include={'vm': ['overcommit_memory', 'overcommit_ratio', 'dirty_background_bytes', 'dirty_background_bytes', 'dirty_background_ratio', 'dirty_bytes', 'dirty_expire_centisecs', 'dirty_ratio', 'swappiness']}' 

SyslogIdentifier=node_exporter
Restart=always
RestartSec=1
StartLimitInterval=0

ProtectHome=yes
NoNewPrivileges=yes

ProtectSystem=full

[Install]
WantedBy=multi-user.target

On line 14, you can see that the single-quoted argument is terminated prematurely by the quote in front of vm. More details can be found in the service status or with journalctl:

$ sudo systemctl status node_exporter
● node_exporter.service - Prometheus Node Exporter
   Loaded: error (Reason: Invalid argument)
   Active: failed (Result: resources) since Wed 2024-04-24 15:37:55 UTC; 20s ago
 Main PID: 2443 (code=killed, signal=KILL)

Apr 24 15:37:54 ip-10-66-137-116.us-west-2.compute.internal systemd[1]: node_exporter.service: main process exited, code=killed, status=9/KILL
Apr 24 15:37:54 ip-10-66-137-116.us-west-2.compute.internal systemd[1]: Unit node_exporter.service entered failed state.
Apr 24 15:37:54 ip-10-66-137-116.us-west-2.compute.internal systemd[1]: node_exporter.service failed.
Apr 24 15:37:55 ip-10-66-137-116.us-west-2.compute.internal systemd[1]: node_exporter.service holdoff time over, scheduling restart.
Apr 24 15:37:55 ip-10-66-137-116.us-west-2.compute.internal systemd[1]: node_exporter.service failed to schedule restart job: Unit is not loaded properly: Invalid argument.
Apr 24 15:37:55 ip-10-66-137-116.us-west-2.compute.internal systemd[1]: Unit node_exporter.service entered failed state.
Apr 24 15:37:55 ip-10-66-137-116.us-west-2.compute.internal systemd[1]: node_exporter.service failed.
Apr 24 15:38:12 ip-10-66-137-116.us-west-2.compute.internal systemd[1]: [/etc/systemd/system/node_exporter.service:13] Trailing garbage, ignoring.
Apr 24 15:38:12 ip-10-66-137-116.us-west-2.compute.internal systemd[1]: node_exporter.service lacks both ExecStart= and ExecStop= setting. Refusing.
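The "Trailing garbage" message can be reproduced with a simplified model of how systemd 219 splits an ExecStart= command line into words. This is a sketch under stated assumptions, not systemd's code: it assumes the backslash continuation has already been joined into one line, ignores backslash escapes and specifiers, and encodes only the rule that a quoted word must be closed by the same quote character and then followed by whitespace or the end of the line. TrailingGarbage and split_exec_words are hypothetical names.

```python
# Simplified model (assumption) of systemd 219's quoted word splitting for
# ExecStart=. Escapes and specifiers are not modeled.

class TrailingGarbage(ValueError):
    """Raised where systemd would log 'Trailing garbage, ignoring.'"""


def split_exec_words(line):
    """Split an already-joined ExecStart= value into argv-style words."""
    words = []
    i, n = 0, len(line)
    while True:
        # Skip whitespace between words.
        while i < n and line[i].isspace():
            i += 1
        if i >= n:
            return words
        if line[i] in "'\"":
            quote = line[i]
            end = line.find(quote, i + 1)
            # Missing closing quote, or a closing quote glued to more text.
            if end == -1 or (end + 1 < n and not line[end + 1].isspace()):
                raise TrailingGarbage(line[i:])
            words.append(line[i + 1:end])
            i = end + 1
        else:
            # Unquoted word: runs to the next whitespace; quote characters
            # inside it are treated as ordinary characters.
            start = i
            while i < n and not line[i].isspace():
                i += 1
            words.append(line[start:i])


value = "{'vm': ['overcommit_memory', 'swappiness']}"
single = f"/usr/local/bin/node_exporter '--collector.sysctl.include={value}'"
double = f'/usr/local/bin/node_exporter "--collector.sysctl.include={value}"'

try:
    split_exec_words(single)
except TrailingGarbage as err:
    print("rejected:", err)

print(split_exec_words(double))
```

Under this model the single-quoted form is rejected as soon as the quote before vm closes the word and is followed by a letter, which matches the journal output above: the whole ExecStart= line is dropped, leaving the unit with no ExecStart= at all.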

Proposed Solution

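This can be fixed by wrapping each collector argument passed to node_exporter in double quotes instead of single quotes in the node_exporter.service.j2 template. Because the rendered dict contains only single quotes, the whole flag then stays one quoted word: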

#
# Ansible managed
#

[Unit]
Description=Prometheus Node Exporter
After=network-online.target

[Service]
Type=simple
User=node-exp
Group=node-exp
ExecStart=/usr/local/bin/node_exporter \
    "--collector.sysctl.include={'vm': ['overcommit_memory', 'overcommit_ratio', 'dirty_background_bytes', 'dirty_background_bytes', 'dirty_background_ratio', 'dirty_bytes', 'dirty_expire_centisecs', 'dirty_ratio', 'swappiness']}" \

SyslogIdentifier=node_exporter
Restart=always
RestartSec=1
StartLimitInterval=0

ProtectHome=yes
NoNewPrivileges=yes
    
ProtectSystem=full

[Install]
WantedBy=multi-user.target
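The proposed template change can be sketched in Python terms as follows. render_collector_arg is a hypothetical helper, not part of the role; the real change would live in the Jinja2 expression in node_exporter.service.j2. It double-quotes the whole flag and, as a stated assumption of this sketch, refuses values that themselves contain a double quote, since those would hit the same premature-termination problem with the quote characters swapped.

```python
# Hypothetical helper mirroring the proposed fix: wrap each collector flag in
# double quotes instead of single quotes.
def render_collector_arg(collector, option, value):
    # f-string formatting of a dict or list uses str(), as in the rendered unit.
    arg = f"--collector.{collector}.{option}={value}"
    # A double quote inside the value would close the word early, so this
    # sketch rejects it instead of attempting to escape it.
    if '"' in arg:
        raise ValueError(f"cannot double-quote argument safely: {arg!r}")
    return f'"{arg}"'


print(render_collector_arg("sysctl", "include", {"vm": ["swappiness"]}))
```

Rejecting rather than escaping keeps the sketch simple; a fuller fix would need to escape embedded double quotes in a form systemd accepts.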

0xdeadbeefJERKY, Apr 24 '24 15:04