salt icon indicating copy to clipboard operation
salt copied to clipboard

[BUG] Service not restarted despite of watching its service file

Open vincent-olivert-riera opened this issue 2 years ago • 8 comments

Description

  • I have a file.managed task to deploy a systemd service unit file.
  • I have a service.running task with a watch for the previous task.
  • Salt does report changes on the service unit file, however the service is not restarted.

Setup

node_exporter:
  pkg.installed:
    - pkgs:
      - golang-github-prometheus-node-exporter
  file.managed:
    - name: /etc/systemd/system/node_exporter.service
    - source: salt://node_exporter/files/node_exporter.service
  service.running:
    - enable: true
    - watch:
      - pkg: node_exporter
      - file: node_exporter

Steps to Reproduce the behavior

[root@salt-master ~]# salt salt-master.mydomain.com state.apply node_exporter
salt-master.mydomain.com:
----------
          ID: node_exporter
    Function: pkg.installed
      Result: True
     Comment: 1 targeted package was installed/updated.
     Started: 16:25:13.246211
    Duration: 54466.506 ms
     Changes:
              ----------
              golang-github-prometheus-node-exporter:
                  ----------
                  new:
                      1.2.2-1.el7
                  old:
----------
          ID: node_exporter
    Function: file.managed
        Name: /etc/systemd/system/node_exporter.service
      Result: True
     Comment: File /etc/systemd/system/node_exporter.service updated
     Started: 16:26:07.738420
    Duration: 559.651 ms
     Changes:
              ----------
              diff:
                  ---
                  +++
                  @@ -7,7 +7,7 @@
                   Type=simple
                   NotifyAccess=all
                   User=prometheus
                  -ExecStart=/opt/node_exporter-1.3.1.linux-amd64/node_exporter \
                  +ExecStart=/usr/sbin/node_exporter \
                   --collector.textfile.directory=/var/lib/node_exporter

                   [Install]
----------
          ID: node_exporter
    Function: service.running
      Result: True
     Comment: Service node_exporter has been enabled, and is in the desired state
     Started: 16:26:08.545219
    Duration: 4213.354 ms
     Changes:
              ----------
              node_exporter:
                  True

Summary for salt-master.mydomain.com
------------
Succeeded: 3 (changed=3)
Failed:    0
------------
Total states run:     3
Total run time:  59.721 s

Expected behavior Since the service.running task watches the file.managed one, and the later is reporting changes, I was expecting the service to be restarted.

Versions Report

salt --versions-report (Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)
Salt Version:
          Salt: 3005.1
 
Dependency Versions:
          cffi: Not Installed
      cherrypy: Not Installed
      dateutil: 2.8.2
     docker-py: 2.6.1
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 2.11.1
       libgit2: Not Installed
      M2Crypto: 0.35.2
          Mako: Not Installed
       msgpack: 0.6.1
  msgpack-pure: Not Installed
  mysql-python: 1.3.12
     pycparser: Not Installed
      pycrypto: Not Installed
  pycryptodome: Not Installed
        pygit2: Not Installed
        Python: 3.6.8 (default, Apr  2 2020, 13:34:55)
  python-gnupg: Not Installed
        PyYAML: 3.13
         PyZMQ: 18.0.1
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.1.4
 
System Versions:
          dist: centos 7 Core
        locale: UTF-8
       machine: x86_64
       release: 3.10.0-1127.19.1.el7.x86_64
        system: Linux
       version: CentOS Linux 7 Core

Additional context

  • Previously, Node Exporter was being installed by unpacking its tarball. That's why in the systemd service unit file the ExecStart command had /opt/node_exporter-1.3.1.linux-amd64/node_exporter.
  • Now, I'm installing Node Exporter using an RPM package, which places the executable file in /usr/sbin/node_exporter, hence the changes to the systemd service unit file.
  • I suspect the problem must be related to the installation of that package. For instance, if I install the package manually so the pkg.install task does not do anything, then Salt behaves correctly: it changes the service file and due to the watch it restarts the service. See:
[root@salt-master ~]# salt salt-master.mydomain.com state.apply node_exporter
salt-master.mydomain.com:
  Name: node_exporter - Function: pkg.installed - Result: Clean Started: - 11:22:23.643942 Duration: 11577.661 ms
----------
          ID: node_exporter
    Function: file.managed
        Name: /etc/systemd/system/node_exporter.service
      Result: True
     Comment: File /etc/systemd/system/node_exporter.service updated
     Started: 11:22:35.240404
    Duration: 345.054 ms
     Changes:
              ----------
              diff:
                  ---
                  +++
                  @@ -7,7 +7,7 @@
                   Type=simple
                   NotifyAccess=all
                   User=prometheus
                  -ExecStart=/opt/node_exporter-1.3.1.linux-amd64/node_exporter \
                  +ExecStart=/usr/sbin/node_exporter \
                   --collector.textfile.directory=/var/lib/node_exporter

                   [Install]
----------
          ID: node_exporter
    Function: service.running
      Result: True
     Comment: Service restarted
     Started: 11:22:39.215246
    Duration: 8927.417 ms
     Changes:
              ----------
              node_exporter:
                  True

Summary for salt-master.mydomain.com
------------
Succeeded: 3 (changed=2)
Failed:    0
------------
Total states run:     3
Total run time:  21.593 s

I'm not sure if this is relevant, but that RPM package has a bug: it installs a node_exporter.service file in /etc/systemd/system/, which clashes with the one that I'm deploying with the file.managed task. However, my file is not overwritten because rpm detects the conflict and installs the new file as node_exporter.service.rpmnew. But I don't know if that can confuse Salt in some way.

vincent-olivert-riera avatar Nov 02 '22 03:11 vincent-olivert-riera

Hi there! Welcome to the Salt Community! Thank you for making your first contribution. We have a lengthy process for issues and PRs. Someone from the Core Team will follow up as soon as possible. In the meantime, here’s some information that may help as you continue your Salt journey. Please be sure to review our Code of Conduct. Also, check out some of our community resources including:

There are lots of ways to get involved in our community. Every month, there are around a dozen opportunities to meet with other contributors and the Salt Core team and collaborate in real time. The best way to keep track is by subscribing to the Salt Community Events Calendar. If you have additional questions, email us at [email protected]. We’re glad you’ve joined our community and look forward to doing awesome things with you!

welcome[bot] avatar Nov 02 '22 03:11 welcome[bot]

N.B. 3004.1 has security vulnerabilities. You should upgrade to 3004.2 or 3005.1

That is odd behaviour. Does it make any difference if you:

  • use - file: /etc/systemd/system/node_exporter.service?
  • split the states to multiple ids?

OrangeDog avatar Nov 02 '22 09:11 OrangeDog

@OrangeDog,

Thanks for your answer. I have tested your suggestions and they didn't make any difference.

use - file: /etc/systemd/system/node_exporter.service?

[root@salt-master ~]$ sudo salt -I role:salt-master state.apply prometheus.node_exporter
salt-master.mydomain.com:
          ID: node_exporter
    Function: pkg.installed
      Result: True
     Comment: 1 targeted package was installed/updated.
     Started: 10:26:27.263121
    Duration: 55846.108 ms
     Changes:
              ----------
              golang-github-prometheus-node-exporter:
                  ----------
                  new:
                      1.2.2-1.el7
                  old:
----------
          ID: node_exporter
    Function: file.managed
        Name: /etc/systemd/system/node_exporter.service
      Result: True
     Comment: File /etc/systemd/system/node_exporter.service updated
     Started: 10:27:23.125982
    Duration: 578.162 ms
     Changes:
              ----------
              diff:
                  ---
                  +++
                  @@ -7,7 +7,7 @@
                   Type=simple
                   NotifyAccess=all
                   User=prometheus
                  -ExecStart=/opt/node_exporter-1.3.1.linux-amd64/node_exporter \
                  +ExecStart=/usr/sbin/node_exporter \
                   --collector.textfile.directory=/var/lib/node_exporter

                   [Install]
----------
          ID: node_exporter
    Function: service.running
      Result: True
     Comment: Service node_exporter has been enabled, and is in the desired state
     Started: 10:27:23.913031
    Duration: 3923.308 ms
     Changes:
              ----------
              node_exporter:
                  True

Summary for salt-master.mydomain.com
------------
Succeeded: 3 (changed=3)
Failed:    0
------------
Total states run:     3
Total run time:  60.746 s

[root@salt-master ~]$ grep ExecStart /etc/systemd/system/node_exporter.service
ExecStart=/usr/sbin/node_exporter \

[root@salt-master ~]$ systemctl status node_exporter
● node_exporter.service - Daemon for Prometheus node_exporter
   Loaded: loaded (/etc/systemd/system/node_exporter.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2022-11-07 10:22:56 JST; 5min ago
 Main PID: 160295 (node_exporter)
   CGroup: /system.slice/node_exporter.service
           └─160295 /opt/node_exporter-1.3.1.linux-amd64/node_exporter --collector.textfile.directory=/var/lib/node_exporter

[root@salt-master ~]$

split the states to multiple ids?

[root@salt-master ~]$ sudo salt -I role:salt-master state.apply prometheus.node_exporter
salt-master.mydomain.com:
          ID: node-exporter-package
    Function: pkg.installed
      Result: True
     Comment: 1 targeted package was installed/updated.
     Started: 10:41:47.141779
    Duration: 51619.591 ms
     Changes:
              ----------
              golang-github-prometheus-node-exporter:
                  ----------
                  new:
                      1.2.2-1.el7
                  old:
----------
          ID: node-exporter-service-file
    Function: file.managed
        Name: /etc/systemd/system/node_exporter.service
      Result: True
     Comment: File /etc/systemd/system/node_exporter.service updated
     Started: 10:42:38.773906
    Duration: 302.168 ms
     Changes:
              ----------
              diff:
                  ---
                  +++
                  @@ -7,7 +7,7 @@
                   Type=simple
                   NotifyAccess=all
                   User=prometheus
                  -ExecStart=/opt/node_exporter-1.3.1.linux-amd64/node_exporter \
                  +ExecStart=/usr/sbin/node_exporter \
                   --collector.textfile.directory=/var/lib/node_exporter

                   [Install]
----------
          ID: node-exporter-service
    Function: service.running
        Name: node_exporter
      Result: True
     Comment: Service node_exporter has been enabled, and is in the desired state
     Started: 10:42:39.188802
    Duration: 3908.323 ms
     Changes:
              ----------
              node_exporter:
                  True

Summary for salt-master.mydomain.com
------------
Succeeded: 3 (changed=3)
Failed:    0
------------
Total states run:     3
Total run time:  56.052 s

[root@salt-master ~]$ grep ExecStart /etc/systemd/system/node_exporter.service
ExecStart=/usr/sbin/node_exporter \

[root@salt-master ~]$ systemctl status node_exporter
● node_exporter.service - Daemon for Prometheus node_exporter
   Loaded: loaded (/etc/systemd/system/node_exporter.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2022-11-07 10:36:23 JST; 6min ago
 Main PID: 1782 (node_exporter)
   CGroup: /system.slice/node_exporter.service
           └─1782 /opt/node_exporter-1.3.1.linux-amd64/node_exporter --collector.textfile.directory=/var/lib/node_exporter

[root@salt-master ~]$

vincent-olivert-riera avatar Nov 07 '22 01:11 vincent-olivert-riera

N.B. 3004.1 has security vulnerabilities. You should upgrade to 3004.2 or 3005.1

I have updated to 3005.1 and it didn't make any difference. The behavior is exactly the same.

vincent-olivert-riera avatar Nov 08 '22 23:11 vincent-olivert-riera

I have debugged this a bit.

In both test mode and normal mode the call to _check_for_unit_changes is successful and systemctl_reload() is performed. So, Salt is successfully detecting there have been changes in the service unit file.

When running in normal mode, the mod_watch function (which would trigger the service restart) is never called. Surprisingly, when running in test mode, mod_watch is called.

I think this is because the _enable function is called in this way... https://github.com/saltstack/salt/blob/v3005.1/salt/states/service.py#L508 ...given that before_toggle_status is True and before_toggle_enable_status is False.

When the _enable function is called in that way, the code flow falls into this block of code: https://github.com/saltstack/salt/blob/v3005.1/salt/states/service.py#L156-L160

I hope this helps a Salt developer to further debug this issue.

vincent-olivert-riera avatar Nov 14 '22 06:11 vincent-olivert-riera

I don't think this is a bug, the salt documentation mentions the following

If the watching state changes key contains values, then mod_watch will not be called. If you're using watch or watch_in then it's a good idea to have a state that only enforces one attribute - such as splitting out service.running into its own state and have service.enabled in another.

Your output indicates that the service.running state enabled the service, thus watch is not called by design.

koallen avatar Mar 11 '24 07:03 koallen

Can confirm, watch gets ignored if we ask for enabled:true and the service is disabled before the state run.

NMi-ru avatar Jul 21 '24 09:07 NMi-ru

This should be fixed in 3006.9, can you verify if that the bug is resolved now?

Akm0d avatar Aug 06 '24 16:08 Akm0d