[BUG] Windows 3006.10 minions time out with an error in minion.py and no longer communicate with the 3006.10 master (multimaster) after upgrading from 3005.1-2 with multiprocessing: True in the minion config

Open geisingerDev opened this issue 8 months ago • 7 comments

Description 3006.10 minions time out with an error in minion.py and no longer communicate with the master 3006.10.

Affected operating systems: Windows Server 2016 Standard, Windows Server 2019, Windows Server 2022.

Set-up details:

  • on-prem machine - vmware VM
  • cloud machine, AWS

Steps to Reproduce the behavior: upgrade Salt from 3005.1-2 to 3006.10.

2025-04-13 13:02:06,101 [salt.utils.process:1004][ERROR   ][2880] An un-handled exception from the multiprocessing process 'ProcessPayload(jid=20250413170104743058)' was caught:
Traceback (most recent call last):
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\utils\process.py", line 999, in wrapped_run_func
    return run_func()
  File "C:\Program Files\Salt Project\Salt\Lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\minion.py", line 1927, in _target
    run_func(minion_instance, opts, data)
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\minion.py", line 1921, in run_func
    return Minion._thread_return(minion_instance, opts, data)
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\minion.py", line 2157, in _thread_return
    minion_instance._return_pub(ret)
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\minion.py", line 2385, in _return_pub
    ret_val = self._send_req_sync(load, timeout=timeout)
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\minion.py", line 1650, in _send_req_sync
    raise TimeoutError("Request timed out")
TimeoutError: Request timed out

Expected behavior No timeouts should occur with the default minion config of multiprocessing: True. For comparison, this problem never occurred in 3005.1-2.

Minion Versions Report Windows Server 2016

PS C:\Users\vchoudhury_srv> salt-call --versions
Salt Version:
          Salt: 3006.10

Python Version:
        Python: 3.10.16 (heads/main:c504d17, Mar  6 2025, 02:25:38) [MSC v.1943 64 bit (AMD64)]

Dependency Versions:
          cffi: 1.14.6
      cherrypy: 18.6.1
  cryptography: 42.0.5
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: 4.0.7
     gitpython: Not Installed
        Jinja2: 3.1.6
       libgit2: Not Installed
  looseversion: 1.0.2
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     packaging: 22.0
     pycparser: 2.21
      pycrypto: Not Installed
  pycryptodome: 3.19.1
        pygit2: Not Installed
  python-gnupg: 0.4.8
        PyYAML: 6.0.1
         PyZMQ: 25.0.2
        relenv: 0.18.1
         smmap: 4.0.0
       timelib: 0.2.4
       Tornado: 4.5.3
           ZMQ: 4.3.4

System Versions:
          dist:
        locale: utf-8
       machine: AMD64
       release: 2016Server
        system: Windows
       version: 2016Server 10.0.14393 SP0 Multiprocessor Free

Windows Server 2019

PS C:\Windows\system32> salt-call --versions
Salt Version:
          Salt: 3006.10

Python Version:
        Python: 3.10.16 (heads/main:c504d17, Mar  6 2025, 02:25:38) [MSC v.1943 64 bit (AMD64)]

Dependency Versions:
          cffi: 1.14.6
      cherrypy: 18.6.1
  cryptography: 42.0.5
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: 4.0.7
     gitpython: Not Installed
        Jinja2: 3.1.6
       libgit2: Not Installed
  looseversion: 1.0.2
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     packaging: 22.0
     pycparser: 2.21
      pycrypto: Not Installed
  pycryptodome: 3.19.1
        pygit2: Not Installed
  python-gnupg: 0.4.8
        PyYAML: 6.0.1
         PyZMQ: 25.0.2
        relenv: 0.18.1
         smmap: 4.0.0
       timelib: 0.2.4
       Tornado: 4.5.3
           ZMQ: 4.3.4

System Versions:
          dist:
        locale: utf-8
       machine: AMD64
       release: 2019Server
        system: Windows
       version: 2019Server 10.0.17763 SP0 Multiprocessor Free

Master Versions Report

[root@rlx8gdcpsamp1v ~]# salt --versions
Salt Version:
          Salt: 3006.10

Python Version:
        Python: 3.10.16 (main, Mar  6 2025, 02:23:15) [GCC 11.2.0]

Dependency Versions:
          cffi: 1.14.6
      cherrypy: unknown
  cryptography: 42.0.5
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.1.6
       libgit2: Not Installed
  looseversion: 1.0.2
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     packaging: 22.0
     pycparser: 2.21
      pycrypto: Not Installed
  pycryptodome: 3.19.1
        pygit2: Not Installed
  python-gnupg: 0.4.8
        PyYAML: 6.0.1
         PyZMQ: 23.2.0
        relenv: 0.18.1
         smmap: Not Installed
       timelib: 0.2.4
       Tornado: 4.5.3
           ZMQ: 4.3.4

Salt Extensions:
        SSEAPE: 8.17.0.6

System Versions:
          dist: rhel 8.10 Ootpa
        locale: utf-8
       machine: x86_64
       release: 4.18.0-553.42.1.el8_10.x86_64
        system: Linux
       version: Red Hat Enterprise Linux 8.10 Ootpa

geisingerDev avatar Apr 14 '25 12:04 geisingerDev

@geisingerDev are you able to provide trace level logs for salt.minion?

in the minion config:

log_granular_levels:
  'salt.minion': 'trace'

dwoz avatar Apr 17 '25 10:04 dwoz

All minions currently have multiprocessing: False, and the minion config is managed by the master. I will have to manually set multiprocessing: True on some test servers and wait for the issue to reappear. ETA: 1 week.

geisingerDev avatar Apr 17 '25 17:04 geisingerDev
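When switching test servers back to multiprocessing: True, it can help to confirm which value a given minion config file actually sets. The following is a minimal sketch, not part of Salt: the function name `multiprocessing_enabled` is hypothetical, it only inspects the text of a single config file with a regular expression (it does not follow include directories or parse YAML fully), and it assumes the default of True that the report above describes.

```python
import re

# Matches a top-level "multiprocessing:" key. Commented-out or indented
# lines do not match because the key must start at the beginning of a line.
_MP_KEY = re.compile(r"^multiprocessing:[ \t]*([^\s#]+)", re.MULTILINE)


def multiprocessing_enabled(config_text, default=True):
    """Return the multiprocessing setting found in one minion config text.

    Hypothetical helper: falls back to `default` when the key is absent.
    If the key appears more than once, the last occurrence is used.
    """
    values = _MP_KEY.findall(config_text)
    if not values:
        return default
    # Accept the common YAML spellings of a true value.
    return values[-1].strip("'\"").lower() in ("true", "yes", "on")
```

Reading the minion config file into a string and passing it to this function gives a quick per-host check before waiting for the timeout to reappear.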

I'm observing the same behavior on Linux (Debian 11/12) in a multimaster setup.

2025-04-23 10:36:52,238 [salt.utils.process:1004][ERROR   ][1846781] An un-handled exception from the multiprocessing process 'ProcessPayload(jid=20250423083552095168)' was caught:
Traceback (most recent call last):
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/process.py", line 999, in wrapped_run_func
    return run_func()
  File "/opt/saltstack/salt/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/minion.py", line 1927, in _target
    run_func(minion_instance, opts, data)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/minion.py", line 1921, in run_func
    return Minion._thread_return(minion_instance, opts, data)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/minion.py", line 2157, in _thread_return
    minion_instance._return_pub(ret)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/minion.py", line 2385, in _return_pub
    ret_val = self._send_req_sync(load, timeout=timeout)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/minion.py", line 1650, in _send_req_sync
    raise TimeoutError("Request timed out")
TimeoutError: Request timed out

After restarting the minion, the connection works again, but after some time the problem recurs. The masters are online and reachable from the minion. I first upgraded from 3002.6 to 3006.9, where this happened more often. After upgrading to 3006.10 it improved, but it still happens.

Salt Master:

sudo salt-call --versions
Salt Version:
          Salt: 3006.10
 
Python Version:
        Python: 3.10.16 (main, Mar  6 2025, 02:23:15) [GCC 11.2.0]
 
Dependency Versions:
          cffi: 1.14.6
      cherrypy: 18.6.1
  cryptography: 42.0.5
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.1.6
       libgit2: 1.3.0
  looseversion: 1.0.2
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     packaging: 22.0
     pycparser: 2.21
      pycrypto: Not Installed
  pycryptodome: 3.19.1
        pygit2: 1.7.0
  python-gnupg: 0.4.8
        PyYAML: 6.0.1
         PyZMQ: 23.2.0
        relenv: 0.18.1
         smmap: Not Installed
       timelib: 0.2.4
       Tornado: 4.5.3
           ZMQ: 4.3.4
 
System Versions:
          dist: debian 11 bullseye
        locale: utf-8
       machine: x86_64
       release: 5.10.0-34-amd64
        system: Linux
       version: Debian GNU/Linux 11 bullseye

Salt Minion:

sudo salt-call --versions
Salt Version:
          Salt: 3006.10
 
Python Version:
        Python: 3.10.16 (main, Mar  6 2025, 02:23:15) [GCC 11.2.0]
 
Dependency Versions:
          cffi: 1.14.6
      cherrypy: 18.6.1
  cryptography: 42.0.5
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.1.6
       libgit2: Not Installed
  looseversion: 1.0.2
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     packaging: 22.0
     pycparser: 2.21
      pycrypto: Not Installed
  pycryptodome: 3.19.1
        pygit2: Not Installed
  python-gnupg: 0.4.8
        PyYAML: 6.0.1
         PyZMQ: 23.2.0
        relenv: 0.18.1
         smmap: Not Installed
       timelib: 0.2.4
       Tornado: 4.5.3
           ZMQ: 4.3.4
 
System Versions:
          dist: debian 11 bullseye
        locale: utf-8
       machine: x86_64
       release: 5.10.0-30-amd64
        system: Linux
       version: Debian GNU/Linux 11 bullseye

nachevn avatar Apr 23 '25 08:04 nachevn

Is this related to https://github.com/saltstack/salt/issues/65265? This appears to be the same error, with the exception of the multimaster.

bencarleyfs avatar May 09 '25 09:05 bencarleyfs

Can this please be tested against 3006.11?

dwoz avatar Jun 06 '25 04:06 dwoz

I'm on 3006.11 and I'm still seeing this issue:

2025-07-28 14:05:12,031 [salt.minion      :1797][INFO    ][2411455] User root Executing command test.ping with jid 20250728120512024080
2025-07-28 14:05:12,093 [salt.minion      :2004][INFO    ][2471962] Starting a new job 20250728120512024080 with PID 2471962
2025-07-28 14:05:12,097 [salt.minion      :2295][INFO    ][2471962] Returning information for job: 20250728120512024080
2025-07-28 14:05:17,331 [salt.minion      :1797][INFO    ][2411455] User root Executing command saltutil.find_job with jid 20250728120517325173
2025-07-28 14:05:17,393 [salt.minion      :2004][INFO    ][2471969] Starting a new job 20250728120517325173 with PID 2471969
2025-07-28 14:05:17,399 [salt.minion      :2295][INFO    ][2471969] Returning information for job: 20250728120517325173
2025-07-28 14:05:41,424 [salt.minion      :2839][ERROR   ][2411455] Timeout encountered while sending {'cmd': '_return', 'id': 'minion2-2', 'success': True, 'return': {}, 'retcode': 0, 'jid': '20250728120517325173', 'fun': 'saltutil.find_job', 'fun_args': ['20250728120512024080'], 'user': 'root', 'master_id': 'salt-master-1', '_stamp': '2025-07-28T12:05:17.417912', 'nonce': 'acebf887076c4d96881dffb9debac448'} request. id=937ba390-fbc1-4dc8-b17f-aa90f3e93f3e
2025-07-28 14:05:42,105 [salt.minion      :2839][ERROR   ][2411455] Timeout encountered while sending {'cmd': '_return', 'id': 'minion2-2', 'success': True, 'return': True, 'retcode': 0, 'jid': '20250728120512024080', 'fun': 'test.ping', 'fun_args': [], 'user': 'root', 'master_id': 'salt-master-1', '_stamp': '2025-07-28T12:05:12.100025', 'nonce': '2f77a1b911a3453386d7fb0bf00061a8'} request. id=eec71486-8a60-40d8-aef8-9a7274245fb0

salt-call --versions
Salt Version:
          Salt: 3006.11

Python Version:
        Python: 3.10.17 (main, May 11 2025, 04:07:13) [GCC 11.2.0]

Dependency Versions:
          cffi: 1.14.6
      cherrypy: 18.6.1
  cryptography: 42.0.5
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: 4.0.12
     gitpython: Not Installed
        Jinja2: 3.1.6
       libgit2: Not Installed
  looseversion: 1.0.2
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     packaging: 22.0
     pycparser: 2.21
      pycrypto: Not Installed
  pycryptodome: 3.19.1
        pygit2: Not Installed
  python-gnupg: 0.4.8
        PyYAML: 6.0.1
         PyZMQ: 23.2.0
        relenv: 0.19.2
         smmap: 5.0.2
       timelib: 0.2.4
       Tornado: 4.5.3
           ZMQ: 4.3.4

System Versions:
          dist: debian 11 bullseye
        locale: utf-8
       machine: x86_64
       release: 5.10.0-33-cloud-amd64
        system: Linux
       version: Debian GNU/Linux 11 bullseye

A simple test.ping invoked from the master fails. It happens at random on any host. Restarting the minion service fixes the issue, only for it to occur again later.

jay1648 avatar Jul 28 '25 12:07 jay1648
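Because the failure appears at random and only a restart clears it, scanning the minion log for the two error signatures quoted in this thread may help spot affected hosts. This is a minimal sketch under stated assumptions: the function name `find_return_timeouts` is hypothetical, and the patterns are derived only from the log lines shown above, so other Salt versions may log these events differently.

```python
import re

# Timestamp format used by the log lines quoted in this thread.
_TS = r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "

# 3006.11 excerpt: "Timeout encountered while sending {...} request."
RETURN_TIMEOUT = re.compile(
    _TS
    + r".*Timeout encountered while sending "
    + r".*?'jid': '(?P<jid>\d+)'"
    + r".*?'fun': '(?P<fun>[\w.]+)'"
)

# 3006.10 excerpt: unhandled exception in 'ProcessPayload(jid=...)'.
PAYLOAD_ERROR = re.compile(_TS + r".*ProcessPayload\(jid=(?P<jid>\d+)\)")


def find_return_timeouts(lines):
    """Collect (timestamp, jid, fun) tuples for lines matching either signature.

    `fun` is None for the ProcessPayload form, which does not name the function.
    """
    hits = []
    for line in lines:
        match = RETURN_TIMEOUT.search(line)
        if match:
            hits.append((match["ts"], match["jid"], match["fun"]))
            continue
        match = PAYLOAD_ERROR.search(line)
        if match:
            hits.append((match["ts"], match["jid"], None))
    return hits
```

The function accepts any iterable of lines, so an open log file can be passed directly. A non-empty result on a host whose master is reachable suggests the minion has entered the state described in this issue.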

Is this issue resolved in newer versions?

eata7 avatar Dec 05 '25 12:12 eata7