salt icon indicating copy to clipboard operation
salt copied to clipboard

[BUG] 3004.1 broke multi-master

Open nicholasmhughes opened this issue 3 years ago • 1 comments

Related to #61868, but opening another issue in the interest of being able to segment problems since syndic might be fixed as of 3004.2

Description Starting with version 3004.1, minions in multi-master setups where the masters have distinct keys but the same signing key constantly throw this error in the logs:

2022-07-11 14:58:18,725 [salt.pillar      :267 ][ERROR   ][10018] Pillar payload signature failed to validate.
2022-07-11 14:58:18,725 [salt.minion      :1143][ERROR   ][10018] Error while bringing up minion for multi-master. Is master at d10-party-01 responding?

Setup

master01
curl -L https://bootstrap.saltproject.io | sudo sh -s -- -X -M git master

mkdir /etc/salt/master.d
cat << EOF > /etc/salt/master.d/multi.conf
auto_accept: True
master_sign_pubkey: True
master_sign_key_name: master_pubkey_signature
EOF

cat << EOF > /etc/salt/minion.d/multi.conf
master:
  - master01
  - master02
master_type: str
verify_master_pubkey_sign: True
EOF

systemctl start salt-master

cp /etc/salt/pki/master/master_pubkey_signature.pub /etc/salt/pki/minion/master_sign.pub

systemctl start salt-minion

echo -e "The stuff below is for copy/pasting into the other machines:\n"

echo "master_pubkey_signature.pem"
cat /etc/salt/pki/master/master_pubkey_signature.pem; echo

echo "master_pubkey_signature.pub"
cat /etc/salt/pki/master/master_pubkey_signature.pub; echo
master02
curl -L https://bootstrap.saltproject.io | sudo sh -s -- -X -M git master

mkdir /etc/salt/master.d
cat << EOF > /etc/salt/master.d/multi.conf
auto_accept: True
master_sign_pubkey: True
master_sign_key_name: master_pubkey_signature
EOF

cat << EOF > /etc/salt/minion.d/multi.conf
master:
  - master02
  - master01
master_type: str
verify_master_pubkey_sign: True
EOF

cat << EOF > /etc/salt/pki/master/master_pubkey_signature.pem
INSERT master_pubkey_signature.pem FROM ABOVE
EOF

cat << EOF > /etc/salt/pki/master/master_pubkey_signature.pub
INSERT master_pubkey_signature.pub FROM ABOVE
EOF

systemctl start salt-master

cp /etc/salt/pki/master/master_pubkey_signature.pub /etc/salt/pki/minion/master_sign.pub

systemctl start salt-minion
minion01
curl -L https://bootstrap.saltproject.io | sudo sh -s -- -X git master

cat << EOF > /etc/salt/minion.d/multi.conf
master:
  - master01
  - master02
master_type: str
verify_master_pubkey_sign: True
EOF

cat << EOF > /etc/salt/pki/minion/master_sign.pub
INSERT master_pubkey_signature.pub FROM ABOVE
EOF

systemctl start salt-minion

Steps to Reproduce the behavior Just start tailing the minion logs and you should see the aforementioned errors.

Expected behavior I would expect a multi-master setup to use the signing key (pub) for verification of inbound signatures instead of an individual master public key.

Versions Report

salt --versions-report (Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)
Salt Version:
          Salt: 3005+0na.e3929c5
 
Dependency Versions:
          cffi: Not Installed
      cherrypy: Not Installed
      dateutil: Not Installed
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.1.2
       libgit2: Not Installed
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.4
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: Not Installed
      pycrypto: 2.6.1
  pycryptodome: 3.15.0
        pygit2: Not Installed
        Python: 3.7.3 (default, Jan 22 2021, 20:04:44)
  python-gnupg: Not Installed
        PyYAML: 6.0
         PyZMQ: 23.2.0
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.4
 
System Versions:
          dist: debian 10 buster
        locale: UTF-8
       machine: x86_64
       release: 4.19.0-18-amd64
        system: Linux
       version: Debian GNU/Linux 10 buster

Additional context The versions report provided is for the current pre-3005 master as of issue entry, but can be reproduced on 3004.1 and 3004.2

nicholasmhughes avatar Jul 11 '22 22:07 nicholasmhughes

@dwoz any ideas here?

Ch3LL avatar Jul 14 '22 18:07 Ch3LL

We're also seeing this in a multi-master setup, even when not running a minion service (in this case we're running modules using salt-call).

Sircular avatar Aug 26 '22 17:08 Sircular

I just went through this issue after upgrading 20k+ minions to 3003.5. The core issue came down to how the master.key and master.pub are configured on your masters participating in multi master setups. If you are in an active active multi master configuration you MUST have the same master.key and master.pub keys on every master that is in the minions list regardless of if signature checks are enabled or not. This is in line with what is stated in the multi master tutorial and was not being properly validate until 3003.4 when they added checks to remediate signing attacks.

I believe the exact change that was made is related to signing behavior with the "minion_master.pub" which is cached when a minion connects to a master. This file is being used to validate signatures and is always the "master.pub" file of the first master the minion is currently connected to. So this issue results when the signature used to sign requests coming from another master is not signing with the matching "master.pem" of the first master. Hence the requirement for both masters in an active active multi master deployment to use the same key. I have asked vmware support and salt resources at vmware to make note of this and provide clarification as the release notes from version 3003.4 and 3003.5 or 3004.1 and 3004.2 do not accurately provided insight to the impact of the security fixes that were made.

Reference configuration: https://docs.saltproject.io/en/latest/topics/tutorials/multimaster.html#prepping-a-redundant-master

You can use the signature check options with this configuration but they do not give that example in the tutorial. You can find examples of the signature check configuration in the failover tutorial on saltstacks site. We ran into this issue because we migrated from the active passive (failover) configuration to multi master (active active) setup and everything was working until version 3003.4. After exploring the issue and reproducing it in a lab I found the issue occurs because of the key mismatch. Now minions are in fact connecting to both masters and do respond to module commands. They do however fail to respond to state applies whether executed remotely or locally depending if the initial master they cached the "master.pub" key of is available to them.

Knowing the above we were able to implement the changes to the key by stopping the masters one at a time, ensuring the master.pem, master.pub, master_sign.pem and master_sign.pub matched the partner server in the multi master setup and that the master_sign keypair mateched what was configured on the minions. We found stopping the first master after the second had come online with the changes and then starting the first master again was enough to force minions to reestablish their connections and allowed the minions to start behaving without any intervention on the minion side. Worst case scenario we stopped the service on both masters so that minions started to attempt retries and then started them again. Hope this helps others out there. If I find the bug that the code changes were made on Ill link it over when I have a minute.

dev-on-ops avatar Oct 10 '22 13:10 dev-on-ops

Knowing the above we were able to implement the changes to the key by stopping the masters one at a time, ensuring the master.pem, master.pub, master_sign.pem and master_sign.pub matched the partner server in the multi master setup and that the master_sign keypair mateched what was configured on the minions.

https://docs.saltproject.io/en/latest/topics/tutorials/multimaster_pki.html explicitly contradicts using the same key pair on all masters in its final sentence, unless I misunderstand what master.pub means in this context.

What is actually the correct procedure for operating multiple active masters, or is there no viable way of doing that at present as a result of this bug?

antiphase avatar Oct 12 '22 22:10 antiphase

Knowing the above we were able to implement the changes to the key by stopping the masters one at a time, ensuring the master.pem, master.pub, master_sign.pem and master_sign.pub matched the partner server in the multi master setup and that the master_sign keypair mateched what was configured on the minions.

https://docs.saltproject.io/en/latest/topics/tutorials/multimaster_pki.html explicitly contradicts using the same key pair on all masters in its final sentence, unless I misunderstand what master.pub means in this context.

What is actually the correct procedure for operating multiple active masters, or is there no viable way of doing that at present as a result of this bug?

Yes for failover multi master which is what you linked. Active active multi master is linked above in my post and explicitly requires the master.pem and pub be identical.

dev-on-ops avatar Oct 12 '22 22:10 dev-on-ops

We already have all the masters using the same keys in their configuration, and the public key for the master matches the public key configured in all the minions. We are still seeing this output.

Sircular avatar Oct 18 '22 17:10 Sircular

We already have all the masters using the same keys in their configuration, and the public key for the master matches the public key configured in all the minions. We are still seeing this output.

Can you post your salt master config so I can review.

dev-on-ops avatar Oct 18 '22 17:10 dev-on-ops

And a minion config example also.

dev-on-ops avatar Oct 18 '22 17:10 dev-on-ops

I talked with @doesitblend to get some more clarity on the action requested for this ticket. The engineering ask is to make it possible to use key signing with different key pairs on the masters. There was a related docs request, which Ken opened and linked as related to this ticket: https://github.com/saltstack/salt/issues/63094

barbaricyawps avatar Nov 21 '22 17:11 barbaricyawps