
[BUG] Master gets disconnected every few seconds when connecting to the minion.

Open • Urmila4718 opened this issue 1 year ago • 1 comment

Description: I have two master VMs and one minion. While going through the debug logs on the minion server, I noticed that the connection to the master drops repeatedly, and these disconnections cause the scheduled jobs to fail. When I checked the master status, it showed the following traceback:

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/event.py", line 348, in connect_pub
    self.subscriber = salt.transport.ipc_publish_client(
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/base.py", line 210, in ipc_publish_client
    return publish_client(opts, io_loop, **kwargs)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/base.py", line 152, in publish_client
    return salt.transport.tcp.PublishClient(
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/tcp.py", line 220, in __init__
    super().__init__(opts, io_loop, **kwargs)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/base.py", line 398, in __init__
    super().__init__()
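The traceback above is only an excerpt: it lists the call frames (from `connect_pub` in `salt/utils/event.py` through `salt.transport.tcp.PublishClient.__init__`) but stops before the final `ExceptionType: message` line, so the actual error is not shown. A small hypothetical helper like the one below (not part of Salt) can list the frames of a pasted traceback and flag, heuristically, whether that final line is present.

```python
import re

# The traceback excerpt from the master, as pasted in this issue.
TRACEBACK = '''\
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/event.py", line 348, in connect_pub
    self.subscriber = salt.transport.ipc_publish_client(
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/base.py", line 210, in ipc_publish_client
    return publish_client(opts, io_loop, **kwargs)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/base.py", line 152, in publish_client
    return salt.transport.tcp.PublishClient(
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/tcp.py", line 220, in __init__
    super().__init__(opts, io_loop, **kwargs)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/base.py", line 398, in __init__
    super().__init__()
'''

FRAME_RE = re.compile(r'File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)')


def call_chain(text):
    """Return (module, line, function) for each frame, outermost first."""
    chain = []
    for m in FRAME_RE.finditer(text):
        # Keep only the part of the path after site-packages/ for readability.
        module = m.group("path").split("site-packages/")[-1]
        chain.append((module, int(m.group("line")), m.group("func")))
    return chain


def ends_with_exception_line(text):
    """Heuristic: a complete Python traceback ends with an unindented
    'ExceptionType: message' line rather than an indented source line."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    return bool(lines) and not lines[-1].startswith((" ", "\t"))


for module, line, func in call_chain(TRACEBACK):
    print(f"{module}:{line} in {func}")
print("complete:", ends_with_exception_line(TRACEBACK))
```

For this excerpt the check reports `complete: False`, which is why the full traceback, including its last line, is needed to see the underlying error.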

Minion debug output (salt-minion -l debug):
[DEBUG] LazyLoaded state.apply

[DEBUG] Minion of 'salt-masterdev2.tierpoint.com' is handling event tag '__master_connected'

[DEBUG] Minion of 'salt-masterdev2.tierpoint.com' is handling event tag '__master_req_channel_payload/salt-masterdev2.tierpoint.com'

[DEBUG] Minion return retry timer set to 8 seconds (randomized)

[DEBUG] Minion of 'salt-masterdev2.tierpoint.com' is handling event tag '__master_req_channel_payload/salt-masterdev2.tierpoint.com'

[DEBUG] Minion return retry timer set to 6 seconds (randomized)

[DEBUG] Minion of 'salt-masterdev2.tierpoint.com' is handling event tag '/salt/minion/minion_schedule_delete_complete'

[DEBUG] Minion of 'salt-masterdev2.tierpoint.com' is handling event tag '/salt/minion/minion_schedule_delete_complete'

[DEBUG] The functions from module 'mine' are being loaded by dir() on the loaded module

[DEBUG] LazyLoaded mine.update

[DEBUG] schedule.handle_func: adding this job to the jobcache with data {'id': 'dsc01-salt02.lab.tierpoint.com', 'fun': 'mine.update', 'fun_args': [], 'schedule': '__mine_interval', 'jid': '20240827102032661799', 'pid': 2024}

[DEBUG] The functions from module 'config' are being loaded by dir() on the loaded module

[DEBUG] LazyLoaded config.merge

[DEBUG] schedule.handle_func: Removing C:\ProgramData\Salt Project\Salt\var\cache\salt\minion\proc\20240827102032661799

[DEBUG] Subprocess Schedule(name=__mine_interval, jid=20240827102032661799) cleaned up

[DEBUG] schedule: Job __master_alive_salt-masterdev1.tierpoint.com was scheduled with jid_include, adding to cache (jid_include defaults to True)

[DEBUG] schedule: Job __master_alive_salt-masterdev1.tierpoint.com was scheduled with a max number of 1

[INFO] Running scheduled job: __master_alive_salt-masterdev1.tierpoint.com with jid 20240827102132167770

[DEBUG] Subprocess Schedule(name=__master_alive_salt-masterdev1.tierpoint.com, jid=20240827102132167770) added

[DEBUG] schedule: Job __master_alive_salt-masterdev2.tierpoint.com was scheduled with jid_include, adding to cache (jid_include defaults to True)

[DEBUG] schedule: Job __master_alive_salt-masterdev2.tierpoint.com was scheduled with a max number of 1

[INFO] Running scheduled job: __master_alive_salt-masterdev2.tierpoint.com with jid 20240827102132449051

[DEBUG] Subprocess Schedule(name=__master_alive_salt-masterdev2.tierpoint.com, jid=20240827102132449051) added

[DEBUG] schedule: Job __master_failback was scheduled with jid_include, adding to cache (jid_include defaults to True)

[DEBUG] schedule: Job __master_failback was scheduled with a max number of 1

[INFO] Running scheduled job: __master_failback with jid 20240827102132792819

[DEBUG] Subprocess Schedule(name=__master_failback, jid=20240827102132792819) added

[DEBUG] The functions from module 'statuspage' are being loaded by dir() on the loaded module

[DEBUG] The functions from module 'statuspage' are being loaded by dir() on the loaded module

[DEBUG] The functions from module 'status' are being loaded by dir() on the loaded module

[DEBUG] LazyLoaded status.master

[DEBUG] schedule.handle_func: adding this job to the jobcache with data {'id': 'dsc01-salt02.lab.tierpoint.com', 'fun': 'status.master', 'fun_args': [{'connected': True, 'master': 'salt-masterdev1.tierpoint.com'}], 'schedule': '__master_alive_salt-masterdev1.tierpoint.com', 'jid': '20240827102132167770', 'pid': 3984}

[DEBUG] The functions from module 'config' are being loaded by dir() on the loaded module

[DEBUG] LazyLoaded config.get

[DEBUG] Using selector: SelectSelector

[DEBUG] Popen(['git', 'version'], cwd=C:\Users\Administrator, stdin=None, shell=False, universal_newlines=False)

[DEBUG] Using selector: SelectSelector

[DEBUG] Publisher connecting to 127.0.0.1:4511

[DEBUG] The functions from module 'status' are being loaded by dir() on the loaded module

[DEBUG] LazyLoaded status.master

[DEBUG] schedule.handle_func: adding this job to the jobcache with data {'id': 'dsc01-salt02.lab.tierpoint.com', 'fun': 'status.master', 'fun_args': [{'master': 'salt-masterdev2.tierpoint.com', 'connected': True}], 'schedule': '__master_alive_salt-masterdev2.tierpoint.com', 'jid': '20240827102132449051', 'pid': 6436}

[DEBUG] The functions from module 'config' are being loaded by dir() on the loaded module

[DEBUG] LazyLoaded config.get

[DEBUG] Closing _TCPPubServerPublisher instance

[DEBUG] Minion of 'salt-masterdev2.tierpoint.com' is handling event tag '__master_disconnected'

[DEBUG] Using selector: SelectSelector

[DEBUG] The functions from module 'statuspage' are being loaded by dir() on the loaded module

[DEBUG] schedule.handle_func: Removing C:\ProgramData\Salt Project\Salt\var\cache\salt\minion\proc\20240827102132167770

[DEBUG] Popen(['git', 'version'], cwd=C:\Users\Administrator, stdin=None, shell=False, universal_newlines=False)

[DEBUG] Using selector: SelectSelector

[DEBUG] Publisher connecting to 127.0.0.1:4511

[DEBUG] The functions from module 'status' are being loaded by dir() on the loaded module

[DEBUG] Closing _TCPPubServerPublisher instance

[DEBUG] Minion of 'salt-masterdev2.tierpoint.com' is handling event tag '__master_disconnected'

[INFO] Connection to master salt-masterdev2.tierpoint.com lost

[DEBUG] Using selector: SelectSelector

[DEBUG] Using selector: SelectSelector
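The debug output above has no timestamps, but the job IDs carry timing information: Salt JIDs in their default form are 20-digit timestamps in `%Y%m%d%H%M%S%f` format (this sketch assumes that default format). The hypothetical helper below counts the master connect/disconnect events in an excerpt of the log and decodes the JIDs of the scheduled jobs. In this excerpt, the `__master_alive_*` and `__master_failback` jobs all start within the same second (10:21:32), and both `__master_disconnected` events come after them.

```python
import re
from collections import Counter
from datetime import datetime

# Excerpt of the minion debug log above (blank lines and unrelated entries dropped).
LOG = """\
[DEBUG] Minion of 'salt-masterdev2.tierpoint.com' is handling event tag '__master_connected'
[INFO] Running scheduled job: __master_alive_salt-masterdev1.tierpoint.com with jid 20240827102132167770
[INFO] Running scheduled job: __master_alive_salt-masterdev2.tierpoint.com with jid 20240827102132449051
[INFO] Running scheduled job: __master_failback with jid 20240827102132792819
[DEBUG] Closing _TCPPubServerPublisher instance
[DEBUG] Minion of 'salt-masterdev2.tierpoint.com' is handling event tag '__master_disconnected'
[DEBUG] Closing _TCPPubServerPublisher instance
[DEBUG] Minion of 'salt-masterdev2.tierpoint.com' is handling event tag '__master_disconnected'
[INFO] Connection to master salt-masterdev2.tierpoint.com lost
"""

EVENT_RE = re.compile(r"is handling event tag '(__master_(?:connected|disconnected))'")
JOB_RE = re.compile(r"Running scheduled job: (\S+) with jid (\d{20})\b")
LOST_RE = re.compile(r"Connection to master (\S+) lost")


def jid_to_datetime(jid):
    """Decode a default-format Salt JID (%Y%m%d%H%M%S%f, 20 digits)."""
    return datetime.strptime(jid, "%Y%m%d%H%M%S%f")


def summarize(log_text):
    """Count master connect/disconnect events and decode scheduled-job JIDs."""
    events = Counter()
    jobs = []
    for line in log_text.splitlines():
        if m := EVENT_RE.search(line):
            events[m.group(1)] += 1
        elif m := JOB_RE.search(line):
            jobs.append((m.group(1), jid_to_datetime(m.group(2))))
        elif LOST_RE.search(line):
            events["connection_lost"] += 1
    return events, jobs


events, jobs = summarize(LOG)
print(dict(events))
for name, started in jobs:
    print(f"{started:%H:%M:%S.%f}  {name}")
```

Running the same helper over a longer capture of the log would show whether the disconnects consistently follow these scheduled master checks.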


Setup

Master configuration (/etc/salt/master):

interface: 10.166.145.32
file_roots:
  base:
    - /srv/salt/base
  dev:
    - /srv/salt/dev

pillar_roots:  
  base:
    - /srv/pillar

Minion configuration:

master:
    - salt-masterdev1.tierpoint.com
    - salt-masterdev2.tierpoint.com
file_client: remote
master_finger: 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
verify_master_pubkey_sign: True 
always_verify_signature: True
master_type: failover
random_master: True
master_alive_interval: 60
retry_dns_count: 3
retry_dns: 0
master_tries: -1
master_failback: True
autosign_grains:
  - uuid
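The failover behaviour of this minion is driven by `master_type`, the `master` list, `master_alive_interval`, `master_failback`, and `random_master`. The sketch below copies the pasted options into a dict and lists notes worth double-checking against the Salt minion configuration reference, such as `master_failback_interval` not being set explicitly. It is a hypothetical helper that only restates the options; it does not reproduce Salt's own validation logic.

```python
# Minion options as pasted in this issue (master_finger and unrelated keys omitted).
MINION_OPTS = {
    "master": ["salt-masterdev1.tierpoint.com", "salt-masterdev2.tierpoint.com"],
    "master_type": "failover",
    "random_master": True,
    "master_alive_interval": 60,
    "master_tries": -1,
    "master_failback": True,
}


def failover_notes(opts):
    """Return notes about failover-related options (hypothetical helper).

    It only restates what the options say; it is not Salt's own validation.
    """
    notes = []
    if opts.get("master_type") != "failover":
        return notes
    if not isinstance(opts.get("master"), list):
        notes.append("failover mode expects 'master' to be a list")
    interval = opts.get("master_alive_interval", 0)
    if interval > 0:
        notes.append(f"current master is checked every {interval} s")
    else:
        notes.append("no positive master_alive_interval is configured")
    if opts.get("master_failback"):
        if "master_failback_interval" not in opts:
            notes.append("master_failback is on; master_failback_interval is not set explicitly")
        if opts.get("random_master"):
            notes.append("random_master and master_failback are both enabled")
    return notes


for note in failover_notes(MINION_OPTS):
    print("-", note)
```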

Environment:

  • VM (type not specified)
  • onedir packaging
  • installed using bootstrap

Steps to Reproduce: The issue appears with a basic test setup, including after restarting the service or rebooting the minion machine.

Expected behavior: When I run salt-minion -l debug, the output should show the master as connected, and scheduled jobs from the master should run.

Versions Report

Master:


Salt Version:
Salt: 3007.1

Python Version:
Python: 3.10.14 (main, Apr 3 2024, 21:30:09) [GCC 11.2.0]

Dependency Versions:
cffi: 1.16.0
cherrypy: unknown
dateutil: 2.8.2
docker-py: Not Installed
gitdb: 4.0.11
gitpython: 3.1.43
Jinja2: 3.1.4
libgit2: 1.7.1
looseversion: 1.3.0
M2Crypto: Not Installed
Mako: Not Installed
msgpack: 1.0.7
msgpack-pure: Not Installed
mysql-python: Not Installed
packaging: 23.1
pycparser: 2.21
pycrypto: Not Installed
pycryptodome: 3.19.1
pygit2: 1.13.1
python-gnupg: 0.5.2
PyYAML: 6.0.1
PyZMQ: 25.1.2
relenv: 0.16.0
smmap: 5.0.1
timelib: 0.3.0
Tornado: 6.3.3
ZMQ: 4.3.4

Salt Package Information:
Package Type: onedir

System Versions:
dist: ubuntu 22.04.4 jammy
locale: utf-8
machine: x86_64
release: 5.15.0-117-generic
system: Linux
version: Ubuntu 22.04.4 jammy

Minion:

Salt Version:
Salt: 3007.1

Python Version:
Python: 3.10.14 (heads/main, Apr 3 2024, 21:36:37) [MSC v.1938 64 bit (AMD64)]

Dependency Versions:

cffi: 1.16.0
cherrypy: 18.8.0
dateutil: 2.8.2
docker-py: Not Installed
gitdb: 4.0.10
gitpython: Not Installed
Jinja2: 3.1.4
libgit2: Not Installed
looseversion: 1.3.0
M2Crypto: Not Installed
Mako: Not Installed
msgpack: 1.0.7
msgpack-pure: Not Installed
mysql-python: Not Installed
packaging: 23.1
pycparser: 2.21
pycrypto: Not Installed
pycryptodome: 3.19.1
pygit2: Not Installed
python-gnupg: 0.5.2
PyYAML: 6.0.1
PyZMQ: 25.1.2
relenv: 0.16.0
smmap: 5.0.1
timelib: 0.3.0
Tornado: 6.3.3
ZMQ: 4.3.4
Salt Package Information:
Package Type: onedir

System Versions:

dist:
locale: utf-8
machine: AMD64
release: 2022Server
system: Windows
version: 2022Server 10.0.20348 SP0 Multiprocessor Free
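The two versions reports show the same Salt (3007.1), Python (3.10.14), PyZMQ, ZMQ, and Tornado versions; the differences are in optional dependencies, mostly the git libraries. To compare reports like these without eyeballing them, a minimal stdlib sketch could look like this (the helper names are hypothetical, and the embedded text is only an excerpt of the "Dependency Versions" sections above):

```python
# Excerpts of the "Dependency Versions" sections from the two reports above.
MASTER = """\
cherrypy: unknown
gitdb: 4.0.11
gitpython: 3.1.43
Jinja2: 3.1.4
libgit2: 1.7.1
pygit2: 1.13.1
PyZMQ: 25.1.2
Tornado: 6.3.3
ZMQ: 4.3.4
"""

MINION = """\
cherrypy: 18.8.0
gitdb: 4.0.10
gitpython: Not Installed
Jinja2: 3.1.4
libgit2: Not Installed
pygit2: Not Installed
PyZMQ: 25.1.2
Tornado: 6.3.3
ZMQ: 4.3.4
"""


def parse_report(text):
    """Parse 'name: value' lines into a dict, skipping lines without a colon."""
    versions = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            versions[name.strip()] = value.strip()
    return versions


def diff_reports(a, b):
    """Return {name: (a_value, b_value)} for every name whose values differ."""
    names = sorted(set(a) | set(b), key=str.lower)
    return {
        n: (a.get(n, "<missing>"), b.get(n, "<missing>"))
        for n in names
        if a.get(n) != b.get(n)
    }


for name, (m, n) in diff_reports(parse_report(MASTER), parse_report(MINION)).items():
    print(f"{name:10} master={m!r:16} minion={n!r}")
```

Pasting the full reports into `MASTER` and `MINION` works the same way, since `partition(":")` splits only on the first colon of each line.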


Posted by Urmila4718 on Aug 27 '24 10:08


@Urmila4718 Can you provide us with the complete traceback?

Posted by dwoz on Sep 07 '24 23:09

Hi @dwoz, we tested three minions connected to a single master, and that worked fine. The problem appears only in the multi-master setup when testing scheduled tasks. We tested with both 3006.9 and 3007.1, keeping the master and minion on the same version in each test. After someone reported in https://github.com/saltstack/salt/issues/65265 that running 3006.9 on the master and 3007.1 on the minion worked for them, we tried that combination too, but we still get the same issue.

salt-master: 3006.9/3007.1
salt-minion: 3006.9/3007.1

Posted by Urmila4718 on Sep 10 '24 07:09