[BUG] Windows 3006.10 minions time out with an error in minion.py and no longer communicate with the master 3006.10 (multimaster) after upgrade from 3005.1-2 with multiprocessing: True in minion config
Description
3006.10 minions time out with an error in minion.py and no longer communicate with the 3006.10 master.
Windows Server 2019, Windows Server 2016 Standard, Windows Server 2022
- on-prem machine - vmware VM
- cloud machine, AWS
Steps to Reproduce the behavior
Upgrade Salt from 3005.1-2 to 3006.10.
2025-04-13 13:02:06,101 [salt.utils.process:1004][ERROR ][2880] An un-handled exception from the multiprocessing process 'ProcessPayload(jid=20250413170104743058)' was caught:
Traceback (most recent call last):
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\utils\process.py", line 999, in wrapped_run_func
    return run_func()
  File "C:\Program Files\Salt Project\Salt\Lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\minion.py", line 1927, in _target
    run_func(minion_instance, opts, data)
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\minion.py", line 1921, in run_func
    return Minion._thread_return(minion_instance, opts, data)
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\minion.py", line 2157, in _thread_return
    minion_instance._return_pub(ret)
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\minion.py", line 2385, in _return_pub
    ret_val = self._send_req_sync(load, timeout=timeout)
  File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\minion.py", line 1650, in _send_req_sync
    raise TimeoutError("Request timed out")
TimeoutError: Request timed out
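To correlate the traceback with master-side events, the JID in the ProcessPayload line can be decoded. This is a minimal sketch, assuming the JID follows the `%Y%m%d%H%M%S%f` layout (which matches the 20-digit values in this report). The helper names are hypothetical and not part of Salt.

```python
from datetime import datetime


def jid_to_datetime(jid):
    """Parse a 20-digit JID into a naive datetime (no time zone is encoded)."""
    # Assumed layout: year, month, day, hour, minute, second, microseconds.
    return datetime.strptime(jid, "%Y%m%d%H%M%S%f")


def parse_log_timestamp(stamp):
    """Parse a minion log timestamp such as '2025-04-13 13:02:06,101'."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


job_time = jid_to_datetime("20250413170104743058")
log_time = parse_log_timestamp("2025-04-13 13:02:06,101")
print(job_time.isoformat(), log_time.isoformat())
```

For the traceback above, the JID decodes to 17:01:04.743058 and the log line to 13:02:06.101. Neither value carries a time zone, so the whole-hour part of the gap may just be a clock or time zone difference between master and minion. If so, the remainder is roughly one minute between job creation and the timeout.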
Expected behavior
No timeouts should occur with the default minion config of multiprocessing: True. For comparison, this problem never occurred on 3005.1-2.
Minion Versions Report
Windows Server 2016
PS C:\Users\vchoudhury_srv> salt-call --versions
Salt Version:
Salt: 3006.10
Python Version:
Python: 3.10.16 (heads/main:c504d17, Mar 6 2025, 02:25:38) [MSC v.1943 64 bit (AMD64)]
Dependency Versions:
cffi: 1.14.6
cherrypy: 18.6.1
cryptography: 42.0.5
dateutil: 2.8.1
docker-py: Not Installed
gitdb: 4.0.7
gitpython: Not Installed
Jinja2: 3.1.6
libgit2: Not Installed
looseversion: 1.0.2
M2Crypto: Not Installed
Mako: Not Installed
msgpack: 1.0.2
msgpack-pure: Not Installed
mysql-python: Not Installed
packaging: 22.0
pycparser: 2.21
pycrypto: Not Installed
pycryptodome: 3.19.1
pygit2: Not Installed
python-gnupg: 0.4.8
PyYAML: 6.0.1
PyZMQ: 25.0.2
relenv: 0.18.1
smmap: 4.0.0
timelib: 0.2.4
Tornado: 4.5.3
ZMQ: 4.3.4
System Versions:
dist:
locale: utf-8
machine: AMD64
release: 2016Server
system: Windows
version: 2016Server 10.0.14393 SP0 Multiprocessor Free
Windows Server 2019
PS C:\Windows\system32> salt-call --versions
Salt Version:
Salt: 3006.10
Python Version:
Python: 3.10.16 (heads/main:c504d17, Mar 6 2025, 02:25:38) [MSC v.1943 64 bit (AMD64)]
Dependency Versions:
cffi: 1.14.6
cherrypy: 18.6.1
cryptography: 42.0.5
dateutil: 2.8.1
docker-py: Not Installed
gitdb: 4.0.7
gitpython: Not Installed
Jinja2: 3.1.6
libgit2: Not Installed
looseversion: 1.0.2
M2Crypto: Not Installed
Mako: Not Installed
msgpack: 1.0.2
msgpack-pure: Not Installed
mysql-python: Not Installed
packaging: 22.0
pycparser: 2.21
pycrypto: Not Installed
pycryptodome: 3.19.1
pygit2: Not Installed
python-gnupg: 0.4.8
PyYAML: 6.0.1
PyZMQ: 25.0.2
relenv: 0.18.1
smmap: 4.0.0
timelib: 0.2.4
Tornado: 4.5.3
ZMQ: 4.3.4
System Versions:
dist:
locale: utf-8
machine: AMD64
release: 2019Server
system: Windows
version: 2019Server 10.0.17763 SP0 Multiprocessor Free
Master Versions Report
[root@rlx8gdcpsamp1v ~]# salt --versions
Salt Version:
Salt: 3006.10
Python Version:
Python: 3.10.16 (main, Mar 6 2025, 02:23:15) [GCC 11.2.0]
Dependency Versions:
cffi: 1.14.6
cherrypy: unknown
cryptography: 42.0.5
dateutil: 2.8.1
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 3.1.6
libgit2: Not Installed
looseversion: 1.0.2
M2Crypto: Not Installed
Mako: Not Installed
msgpack: 1.0.2
msgpack-pure: Not Installed
mysql-python: Not Installed
packaging: 22.0
pycparser: 2.21
pycrypto: Not Installed
pycryptodome: 3.19.1
pygit2: Not Installed
python-gnupg: 0.4.8
PyYAML: 6.0.1
PyZMQ: 23.2.0
relenv: 0.18.1
smmap: Not Installed
timelib: 0.2.4
Tornado: 4.5.3
ZMQ: 4.3.4
Salt Extensions:
SSEAPE: 8.17.0.6
System Versions:
dist: rhel 8.10 Ootpa
locale: utf-8
machine: x86_64
release: 4.18.0-553.42.1.el8_10.x86_64
system: Linux
version: Red Hat Enterprise Linux 8.10 Ootpa
@geisingerDev are you able to provide trace-level logs for salt.minion? In the minion config:
log_granular_levels:
  'salt.minion': 'trace'
All minions currently have multiprocessing: False, and the minion config is managed by the master. I will have to manually set multiprocessing: True on some test servers and wait for the issue to reappear. ETA: 1 week.
I'm observing the same behavior on Linux (Debian 11/12) in a multimaster setup.
2025-04-23 10:36:52,238 [salt.utils.process:1004][ERROR ][1846781] An un-handled exception from the multiprocessing process 'ProcessPayload(jid=20250423083552095168)' was caught:
Traceback (most recent call last):
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/process.py", line 999, in wrapped_run_func
    return run_func()
  File "/opt/saltstack/salt/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/minion.py", line 1927, in _target
    run_func(minion_instance, opts, data)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/minion.py", line 1921, in run_func
    return Minion._thread_return(minion_instance, opts, data)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/minion.py", line 2157, in _thread_return
    minion_instance._return_pub(ret)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/minion.py", line 2385, in _return_pub
    ret_val = self._send_req_sync(load, timeout=timeout)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/minion.py", line 1650, in _send_req_sync
    raise TimeoutError("Request timed out")
TimeoutError: Request timed out
After restarting the minion, the connection works again, but after some time the problem recurs. The masters are online and reachable from the minion. I first upgraded from 3002.6 to 3006.9, where this happened more often. After upgrading to 3006.10 it got better, but it still happens.
Salt Master:
sudo salt-call --versions
Salt Version:
Salt: 3006.10
Python Version:
Python: 3.10.16 (main, Mar 6 2025, 02:23:15) [GCC 11.2.0]
Dependency Versions:
cffi: 1.14.6
cherrypy: 18.6.1
cryptography: 42.0.5
dateutil: 2.8.1
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 3.1.6
libgit2: 1.3.0
looseversion: 1.0.2
M2Crypto: Not Installed
Mako: Not Installed
msgpack: 1.0.2
msgpack-pure: Not Installed
mysql-python: Not Installed
packaging: 22.0
pycparser: 2.21
pycrypto: Not Installed
pycryptodome: 3.19.1
pygit2: 1.7.0
python-gnupg: 0.4.8
PyYAML: 6.0.1
PyZMQ: 23.2.0
relenv: 0.18.1
smmap: Not Installed
timelib: 0.2.4
Tornado: 4.5.3
ZMQ: 4.3.4
System Versions:
dist: debian 11 bullseye
locale: utf-8
machine: x86_64
release: 5.10.0-34-amd64
system: Linux
version: Debian GNU/Linux 11 bullseye
Salt Minion:
sudo salt-call --versions
Salt Version:
Salt: 3006.10
Python Version:
Python: 3.10.16 (main, Mar 6 2025, 02:23:15) [GCC 11.2.0]
Dependency Versions:
cffi: 1.14.6
cherrypy: 18.6.1
cryptography: 42.0.5
dateutil: 2.8.1
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 3.1.6
libgit2: Not Installed
looseversion: 1.0.2
M2Crypto: Not Installed
Mako: Not Installed
msgpack: 1.0.2
msgpack-pure: Not Installed
mysql-python: Not Installed
packaging: 22.0
pycparser: 2.21
pycrypto: Not Installed
pycryptodome: 3.19.1
pygit2: Not Installed
python-gnupg: 0.4.8
PyYAML: 6.0.1
PyZMQ: 23.2.0
relenv: 0.18.1
smmap: Not Installed
timelib: 0.2.4
Tornado: 4.5.3
ZMQ: 4.3.4
System Versions:
dist: debian 11 bullseye
locale: utf-8
machine: x86_64
release: 5.10.0-30-amd64
system: Linux
version: Debian GNU/Linux 11 bullseye
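One way to back the reachability claim in the comment above with something repeatable is a plain TCP check run from the minion host. This is only a sketch: it assumes the masters listen on 4506, Salt's default ret_port, and the function names are hypothetical. An open port shows that the network path works, not that the master answers requests in time.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures, and connect timeouts.
        return False


def check_masters(masters, port=4506, timeout=5.0):
    """Map each master hostname to whether its request port accepts connections."""
    # 4506 is assumed here as the default ret_port; pass another port if the
    # master config overrides it.
    return {host: can_connect(host, port, timeout) for host in masters}
```

Calling `check_masters(["master1.example.com", "master2.example.com"])` (placeholder hostnames) returns a dict that maps each hostname to True or False. Running it at the moment a minion stops responding would help tell a network problem apart from a stuck minion process.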
Is this related to https://github.com/saltstack/salt/issues/65265? This appears to be the same error, with the exception of the multimaster.
Can this please be tested against 3006.11?
I'm on 3006.11 and I'm still seeing this issue:
2025-07-28 14:05:12,031 [salt.minion :1797][INFO ][2411455] User root Executing command test.ping with jid 20250728120512024080
2025-07-28 14:05:12,093 [salt.minion :2004][INFO ][2471962] Starting a new job 20250728120512024080 with PID 2471962
2025-07-28 14:05:12,097 [salt.minion :2295][INFO ][2471962] Returning information for job: 20250728120512024080
2025-07-28 14:05:17,331 [salt.minion :1797][INFO ][2411455] User root Executing command saltutil.find_job with jid 20250728120517325173
2025-07-28 14:05:17,393 [salt.minion :2004][INFO ][2471969] Starting a new job 20250728120517325173 with PID 2471969
2025-07-28 14:05:17,399 [salt.minion :2295][INFO ][2471969] Returning information for job: 20250728120517325173
2025-07-28 14:05:41,424 [salt.minion :2839][ERROR ][2411455] Timeout encountered while sending {'cmd': '_return', 'id': 'minion2-2', 'success': True, 'return': {}, 'retcode': 0, 'jid': '20250728120517325173', 'fun': 'saltutil.find_job', 'fun_args': ['20250728120512024080'], 'user': 'root', 'master_id': 'salt-master-1', '_stamp': '2025-07-28T12:05:17.417912', 'nonce': 'acebf887076c4d96881dffb9debac448'} request. id=937ba390-fbc1-4dc8-b17f-aa90f3e93f3e
2025-07-28 14:05:42,105 [salt.minion :2839][ERROR ][2411455] Timeout encountered while sending {'cmd': '_return', 'id': 'minion2-2', 'success': True, 'return': True, 'retcode': 0, 'jid': '20250728120512024080', 'fun': 'test.ping', 'fun_args': [], 'user': 'root', 'master_id': 'salt-master-1', '_stamp': '2025-07-28T12:05:12.100025', 'nonce': '2f77a1b911a3453386d7fb0bf00061a8'} request. id=eec71486-8a60-40d8-aef8-9a7274245fb0
salt-call --versions
Salt Version:
Salt: 3006.11
Python Version:
Python: 3.10.17 (main, May 11 2025, 04:07:13) [GCC 11.2.0]
Dependency Versions:
cffi: 1.14.6
cherrypy: 18.6.1
cryptography: 42.0.5
dateutil: 2.8.1
docker-py: Not Installed
gitdb: 4.0.12
gitpython: Not Installed
Jinja2: 3.1.6
libgit2: Not Installed
looseversion: 1.0.2
M2Crypto: Not Installed
Mako: Not Installed
msgpack: 1.0.2
msgpack-pure: Not Installed
mysql-python: Not Installed
packaging: 22.0
pycparser: 2.21
pycrypto: Not Installed
pycryptodome: 3.19.1
pygit2: Not Installed
python-gnupg: 0.4.8
PyYAML: 6.0.1
PyZMQ: 23.2.0
relenv: 0.19.2
smmap: 5.0.2
timelib: 0.2.4
Tornado: 4.5.3
ZMQ: 4.3.4
System Versions:
dist: debian 11 bullseye
locale: utf-8
machine: x86_64
release: 5.10.0-33-cloud-amd64
system: Linux
version: Debian GNU/Linux 11 bullseye
A simple test.ping invoked from the master fails. It happens randomly on any host. Restarting the minion service fixes the issue, only for it to occur again later.
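To see how often returns time out and for which jobs, the ERROR lines in the log excerpt above can be pulled out of the minion log. This is a minimal sketch, assuming the lines keep the exact "Timeout encountered while sending {...} request" shape shown in this comment. `find_return_timeouts` is a hypothetical helper, not part of Salt.

```python
import ast
import re

# Matches minion log lines of the form quoted in this comment:
# "<date> <time>,<ms> ... Timeout encountered while sending {...} request. id=..."
TIMEOUT_RE = re.compile(
    r"^(?P<when>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ .*?"
    r"Timeout encountered while sending (?P<load>\{.*\}) request"
)


def find_return_timeouts(lines):
    """Yield (timestamp, jid, fun) for each timed-out return found in lines."""
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if not match:
            continue
        try:
            # The payload is logged as a Python dict literal, so literal_eval
            # can parse it without executing any code.
            load = ast.literal_eval(match.group("load"))
        except (ValueError, SyntaxError):
            continue  # truncated or otherwise unparsable line
        yield match.group("when"), load.get("jid"), load.get("fun")
```

Passing an open log file handle to `find_return_timeouts` yields one tuple per timeout, which makes it easy to count failures per function or to line them up with master-side logs by JID.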
Is this issue resolved in newer versions?