salt
[BUG] Master failed to authenticate message from minion, minion does not re-connect after master being offline
Description The minion service does not reconnect to the master after the master has been offline for some time. The behavior is erratic: it does not always happen the same way, and sometimes the minion reconnects without issues.
Commands run locally on the minion with salt-call, such as test.ping, return results, but the same command sent from the master to the minion gets no response.
On the event bus, return events from the minion side are received. The minion only reconnects properly after the minion service is restarted.
In the master log there is a "Failed to authenticate message" error. The authentication request then appears to be accepted, but the connection does not recover on its own and the master keeps logging the same message.
[DEBUG ] salt.crypt.sign_message: Signing message.
[DEBUG ] Failed to authenticate message
[DEBUG ] Minion failed to auth to master. Since the payload is encrypted, it is not known which minion failed to authenticate. It is likely that this is a transient failure due to the master rotating its public key.
[DEBUG ] Failed to authenticate message
[DEBUG ] Minion failed to auth to master. Since the payload is encrypted, it is not known which minion failed to authenticate. It is likely that this is a transient failure due to the master rotating its public key.
[DEBUG ] Failed to authenticate message
[DEBUG ] Minion failed to auth to master. Since the payload is encrypted, it is not known which minion failed to authenticate. It is likely that this is a transient failure due to the master rotating its public key.
[INFO ] Authentication request from vesselsim-win-ems-1
[INFO ] Authentication accepted from vesselsim-win-ems-1
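To tell how often the failure repeats before an accepted authentication, the master log can be summarized with a small script. This is only a rough sketch: it assumes the log lines look exactly like the excerpt above, and the function name `summarize_auth_log` is made up for illustration, not part of Salt.

```python
import re
from collections import Counter

# Message fragments copied from the master log excerpt above.
AUTH_FAILED = "Failed to authenticate message"
AUTH_ACCEPTED = re.compile(r"Authentication accepted from (\S+)")


def summarize_auth_log(lines):
    """Count auth failures and tally minion IDs whose auth was accepted.

    Hypothetical helper for triage; it only does substring matching on
    the log text and knows nothing about Salt internals.
    """
    failures = 0
    accepted = Counter()
    for line in lines:
        if AUTH_FAILED in line:
            failures += 1
        match = AUTH_ACCEPTED.search(line)
        if match:
            accepted[match.group(1)] += 1
    return failures, accepted


# Sample input: a shortened copy of the lines from this report.
sample = [
    "[DEBUG ] salt.crypt.sign_message: Signing message.",
    "[DEBUG ] Failed to authenticate message",
    "[DEBUG ] Failed to authenticate message",
    "[DEBUG ] Failed to authenticate message",
    "[INFO ] Authentication request from vesselsim-win-ems-1",
    "[INFO ] Authentication accepted from vesselsim-win-ems-1",
]

failures, accepted = summarize_auth_log(sample)
print(f"failures={failures} accepted={dict(accepted)}")
```

Because the failing payload is encrypted, the failure lines cannot be tied to a specific minion, so the script can only report a global failure count next to the per-minion accepted count.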
PS C:\Users\adrian> salt-call status.ping_master 172.24.0.4
local:
True
PS C:\Users\adrian> salt-call status.master master=172.24.0.4
local:
True
PS C:\Users\adrian> salt-call status.ping_master 172.24.0.4
local:
True
PS C:\Users\adrian> salt-call test.ping
local:
True
[root@vesselsim ~]# salt vesselsim-win-ems-1 test.ping
vesselsim-win-ems-1:
Minion did not return. [No response]
The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:
salt-run jobs.lookup_jid 20240416170654648177
ERROR: Minions returned with non-zero exit code
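To line up the failed job with the "Failed to authenticate message" entries in the master log, the JID can be decoded into a timestamp. A minimal sketch, assuming the JID is a 20-digit date-time string in the form year, month, day, hour, minute, second, microseconds, which is how the JID above reads. The helper name `jid_to_datetime` is hypothetical, and whether the time is local or UTC depends on the master's configuration.

```python
from datetime import datetime


def jid_to_datetime(jid: str) -> datetime:
    """Decode a Salt job ID into a naive datetime.

    Assumes the first 20 characters are YYYYMMDDHHMMSS followed by six
    digits of microseconds; any suffix after that is ignored.
    """
    return datetime.strptime(jid[:20], "%Y%m%d%H%M%S%f")


# JID taken from the salt CLI output in this report.
ts = jid_to_datetime("20240416170654648177")
print(ts.isoformat())
```

The resulting timestamp can then be compared against the master log to check whether the auth failures occur around the time the job was published.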
Setup 3006.1, but I've seen this in multiple versions, both newer and older.
Please be as specific as possible and give set-up details.
- [ ] on-prem machine
- [ ] VM (Virtualbox, KVM, etc. please specify)
- [ ] VM running on a cloud service, please be explicit and add details
- [ ] container (Kubernetes, Docker, containerd, etc. please specify)
- [ ] or a combination, please be explicit
- [ ] jails if it is FreeBSD
- [ ] classic packaging
- [ ] onedir packaging
- [ ] used bootstrap to install
Steps to Reproduce the behavior (Include debug logs if possible and relevant)
Expected behavior A clear and concise description of what you expected to happen.
Screenshots If applicable, add screenshots to help explain your problem.
Versions Report
salt --versions-report
(Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)
PASTE HERE
Additional context Add any other context about the problem here.
@amalaguti Are you able to test this against 3006.8?
@dwoz It seems a bit better in 3006.8. It feels like the minion reconnects more reliably than before.
But while testing this I found the following issue: #66497
And this one, #66375, is still present too.