[BUG] 3004.1 broke multi-master
Related to #61868, but opening another issue in the interest of being able to segment problems since syndic might be fixed as of 3004.2
Description Starting with version 3004.1, minions in multi-master setups where the masters have distinct keys but the same signing key constantly throw this error in the logs:
2022-07-11 14:58:18,725 [salt.pillar :267 ][ERROR ][10018] Pillar payload signature failed to validate.
2022-07-11 14:58:18,725 [salt.minion :1143][ERROR ][10018] Error while bringing up minion for multi-master. Is master at d10-party-01 responding?
Setup
master01
curl -L https://bootstrap.saltproject.io | sudo sh -s -- -X -M git master
mkdir /etc/salt/master.d
cat << EOF > /etc/salt/master.d/multi.conf
auto_accept: True
master_sign_pubkey: True
master_sign_key_name: master_pubkey_signature
EOF
cat << EOF > /etc/salt/minion.d/multi.conf
master:
- master01
- master02
master_type: str
verify_master_pubkey_sign: True
EOF
systemctl start salt-master
cp /etc/salt/pki/master/master_pubkey_signature.pub /etc/salt/pki/minion/master_sign.pub
systemctl start salt-minion
echo -e "The stuff below is for copy/pasting into the other machines:\n"
echo "master_pubkey_signature.pem"
cat /etc/salt/pki/master/master_pubkey_signature.pem; echo
echo "master_pubkey_signature.pub"
cat /etc/salt/pki/master/master_pubkey_signature.pub; echo
master02
curl -L https://bootstrap.saltproject.io | sudo sh -s -- -X -M git master
mkdir /etc/salt/master.d
cat << EOF > /etc/salt/master.d/multi.conf
auto_accept: True
master_sign_pubkey: True
master_sign_key_name: master_pubkey_signature
EOF
cat << EOF > /etc/salt/minion.d/multi.conf
master:
- master02
- master01
master_type: str
verify_master_pubkey_sign: True
EOF
cat << EOF > /etc/salt/pki/master/master_pubkey_signature.pem
INSERT master_pubkey_signature.pem FROM ABOVE
EOF
cat << EOF > /etc/salt/pki/master/master_pubkey_signature.pub
INSERT master_pubkey_signature.pub FROM ABOVE
EOF
systemctl start salt-master
cp /etc/salt/pki/master/master_pubkey_signature.pub /etc/salt/pki/minion/master_sign.pub
systemctl start salt-minion
minion01
curl -L https://bootstrap.saltproject.io | sudo sh -s -- -X git master
cat << EOF > /etc/salt/minion.d/multi.conf
master:
- master01
- master02
master_type: str
verify_master_pubkey_sign: True
EOF
cat << EOF > /etc/salt/pki/minion/master_sign.pub
INSERT master_pubkey_signature.pub FROM ABOVE
EOF
systemctl start salt-minion
Steps to Reproduce the behavior Just start tailing the minion logs and you should see the aforementioned errors.
Expected behavior I would expect a multi-master setup to use the signing key (pub) for verification of inbound signatures instead of an individual master public key.
Versions Report
salt --versions-report
(Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)Salt Version:
Salt: 3005+0na.e3929c5
Dependency Versions:
cffi: Not Installed
cherrypy: Not Installed
dateutil: Not Installed
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 3.1.2
libgit2: Not Installed
M2Crypto: Not Installed
Mako: Not Installed
msgpack: 1.0.4
msgpack-pure: Not Installed
mysql-python: Not Installed
pycparser: Not Installed
pycrypto: 2.6.1
pycryptodome: 3.15.0
pygit2: Not Installed
Python: 3.7.3 (default, Jan 22 2021, 20:04:44)
python-gnupg: Not Installed
PyYAML: 6.0
PyZMQ: 23.2.0
smmap: Not Installed
timelib: Not Installed
Tornado: 4.5.3
ZMQ: 4.3.4
System Versions:
dist: debian 10 buster
locale: UTF-8
machine: x86_64
release: 4.19.0-18-amd64
system: Linux
version: Debian GNU/Linux 10 buster
Additional context The versions report provided is for the current pre-3005 master as of issue entry, but can be reproduced on 3004.1 and 3004.2
@dwoz any ideas here?
We're also seeing this in a multi-master setup, even when not running a minion service (in this case we're running modules using salt-call).
I just went through this issue after upgrading 20k+ minions to 3003.5. The core issue came down to how the master.key and master.pub are configured on your masters participating in multi master setups. If you are in an active active multi master configuration you MUST have the same master.key and master.pub keys on every master that is in the minions list regardless of if signature checks are enabled or not. This is in line with what is stated in the multi master tutorial and was not being properly validate until 3003.4 when they added checks to remediate signing attacks.
I believe the exact change that was made is related to signing behavior with the "minion_master.pub" which is cached when a minion connects to a master. This file is being used to validate signatures and is always the "master.pub" file of the first master the minion is currently connected to. So this issue results when the signature used to sign requests coming from another master is not signing with the matching "master.pem" of the first master. Hence the requirement for both masters in an active active multi master deployment to use the same key. I have asked vmware support and salt resources at vmware to make note of this and provide clarification as the release notes from version 3003.4 and 3003.5 or 3004.1 and 3004.2 do not accurately provided insight to the impact of the security fixes that were made.
Reference configuration: https://docs.saltproject.io/en/latest/topics/tutorials/multimaster.html#prepping-a-redundant-master
You can use the signature check options with this configuration but they do not give that example in the tutorial. You can find examples of the signature check configuration in the failover tutorial on saltstacks site. We ran into this issue because we migrated from the active passive (failover) configuration to multi master (active active) setup and everything was working until version 3003.4. After exploring the issue and reproducing it in a lab I found the issue occurs because of the key mismatch. Now minions are in fact connecting to both masters and do respond to module commands. They do however fail to respond to state applies whether executed remotely or locally depending if the initial master they cached the "master.pub" key of is available to them.
Knowing the above we were able to implement the changes to the key by stopping the masters one at a time, ensuring the master.pem, master.pub, master_sign.pem and master_sign.pub matched the partner server in the multi master setup and that the master_sign keypair mateched what was configured on the minions. We found stopping the first master after the second had come online with the changes and then starting the first master again was enough to force minions to reestablish their connections and allowed the minions to start behaving without any intervention on the minion side. Worst case scenario we stopped the service on both masters so that minions started to attempt retries and then started them again. Hope this helps others out there. If I find the bug that the code changes were made on Ill link it over when I have a minute.
Knowing the above we were able to implement the changes to the key by stopping the masters one at a time, ensuring the master.pem, master.pub, master_sign.pem and master_sign.pub matched the partner server in the multi master setup and that the master_sign keypair mateched what was configured on the minions.
https://docs.saltproject.io/en/latest/topics/tutorials/multimaster_pki.html explicitly contradicts using the same key pair on all masters in its final sentence, unless I misunderstand what master.pub means in this context.
What is actually the correct procedure for operating multiple active masters, or is there no viable way of doing that at present as a result of this bug?
Knowing the above we were able to implement the changes to the key by stopping the masters one at a time, ensuring the master.pem, master.pub, master_sign.pem and master_sign.pub matched the partner server in the multi master setup and that the master_sign keypair mateched what was configured on the minions.
https://docs.saltproject.io/en/latest/topics/tutorials/multimaster_pki.html explicitly contradicts using the same key pair on all masters in its final sentence, unless I misunderstand what master.pub means in this context.
What is actually the correct procedure for operating multiple active masters, or is there no viable way of doing that at present as a result of this bug?
Yes for failover multi master which is what you linked. Active active multi master is linked above in my post and explicitly requires the master.pem and pub be identical.
We already have all the masters using the same keys in their configuration, and the public key for the master matches the public key configured in all the minions. We are still seeing this output.
We already have all the masters using the same keys in their configuration, and the public key for the master matches the public key configured in all the minions. We are still seeing this output.
Can you post your salt master config so I can review.
And a minion config example also.
I talked with @doesitblend to get some more clarity on the action requested for this ticket. The engineering ask is to make it possible to use key signing with different key pairs on the masters. There was a related docs request, which Ken opened and linked as related to this ticket: https://github.com/saltstack/salt/issues/63094