non-root users cannot use salt after upgrade to 3006.3
Description
After upgrading from 3005 (py3) to 3006 (onedir), non-root users can no longer use the salt command:
$ salt \* test.ping
[WARNING ] Failed to open log file, do you have permission to write to /var/log/salt/master?
[ERROR ] Unable to connect to the salt master publisher at /var/run/salt/master
The salt master could not be contacted. Is master running?
Setup
- [x] on-prem machine
- [x] onedir packaging
Salt-Master running 3006.3 onedir.
Several non-root users have limited permissions on Salt using publisher_acl.
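For reference, the ACL is just the standard mechanism from the docs; a minimal sketch of such a master config entry (the user name and modules here are placeholders, not our actual setup):
publisher_acl:
  deployuser:
    - test.ping
    - state.apply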
Steps to Reproduce the behavior
Run the salt command on the master as a non-root user to receive the above-mentioned error message.
Indeed, the directory /var/run/salt/master grants no permissions to other users:
drwxr-x--- 2 root root 120 Sep 28 11:27 .
drwxr-xr-x 4 root root 80 Sep 6 19:32 ..
srw-rw---- 1 root root 0 Sep 28 11:27 master_event_pub.ipc
srw------- 1 root root 0 Sep 28 11:27 master_event_pull.ipc
srw------- 1 root root 0 Sep 28 11:27 publish_pull.ipc
srw------- 1 root root 0 Sep 28 11:27 workers.ipc
After allowing all users read/write access to the directory and all files (which strongly feels wrong), the non-root user gets this error:
$ salt \* test.ping
[WARNING ] Failed to open log file, do you have permission to write to /var/log/salt/master?
Authentication error occurred.
Expected behavior
Running salt commands as a non-root user with permissions in publisher_acl should work as it did before the upgrade.
Versions Report
salt --versions-report
Salt Version:
Salt: 3006.3
Python Version:
Python: 3.10.13 (main, Sep 6 2023, 02:11:27) [GCC 11.2.0]
Dependency Versions:
cffi: 1.14.6
cherrypy: unknown
dateutil: 2.8.1
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 3.1.2
libgit2: 1.3.0
looseversion: 1.0.2
M2Crypto: Not Installed
Mako: Not Installed
msgpack: 1.0.2
msgpack-pure: Not Installed
mysql-python: Not Installed
packaging: 22.0
pycparser: 2.21
pycrypto: Not Installed
pycryptodome: 3.9.8
pygit2: 1.7.0
python-gnupg: 0.4.8
PyYAML: 6.0.1
PyZMQ: 23.2.0
relenv: 0.13.10
smmap: Not Installed
timelib: 0.2.4
Tornado: 4.5.3
ZMQ: 4.3.4
System Versions:
dist: centos 7.9.2009 Core
locale: utf-8
machine: x86_64
release: 3.10.0-1160.90.1.el7.x86_64
system: Linux
version: CentOS Linux 7.9.2009 Core
3006 runs the master as salt by default, not as root. Various manual adjustments may be needed during the upgrade, as noted in the release notes.
@OrangeDog thanks, I saw that mentioned here and there... but after upgrading two salt masters here, the master was still running as root. I assumed it just kept the previous setting during the upgrade from 3005, so I did not touch it for now.
There's not much in the 3006 release notes: https://docs.saltproject.io/en/latest/topics/releases/3006.html And I followed the link to upgrading to onedir: https://docs.saltproject.io/salt/install-guide/en/latest/topics/upgrade-to-onedir.html
I tried to find information for migrating the user and found none.
As I found yet another breaking change in 3006 today (the API now has to be explicitly enabled in the master config)... Is there a list of all manual adjustments needed when upgrading from 3005 to 3006? That way I could work through them in one go.
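(For reference, the API change I mean is the new netapi_enable_clients master option in 3006; if I read the release notes right, the clients now have to be listed explicitly, roughly like this — the exact list is just an example, not a recommendation:)
netapi_enable_clients:
  - local
  - local_async
  - runner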
That's the wrong link and I'm not sure why it works. These are the actual release notes: https://docs.saltproject.io/en/latest/topics/releases/3006.0.html
Thanks, I will work through that...
But I instantly noticed the sentence "The packages will add the user: salt config option to the Salt Master config.", which did not happen here on two different salt masters. But (for now) I am ok with keeping it running as root.
Ok, due to https://github.com/saltstack/salt/issues/64275, I now have to run the Salt Master as root anyway (and as the salt master VM is dedicated to being only the salt master, I am reasonably fine with that for now).
I got it working:
A small script /usr/local/sbin/fix-salt-permissions.sh to open the permissions:
#!/bin/bash
# wait for salt-master to start and reset permissions
sleep 10
## open up for non-root salt CLI users
# From Salt docs
chmod 755 /var/cache/salt /var/cache/salt/master /var/cache/salt/master/jobs /var/run/salt /var/run/salt/master
# Sockets must also be +rw for all users:
chmod a+rw /var/run/salt/master/*
Override salt-master.service: systemctl edit salt-master.service:
[Service]
ExecStartPost=/usr/local/sbin/fix-salt-permissions.sh
Environment="HOME=/root"
Then systemctl daemon-reload and systemctl restart salt-master.service
Now our non-root users can use salt CLI with publisher_acl again.
I hope there's never anything secret in those caches, like all your pillar contents...
@OrangeDog I sense some cynicism here (I am an avid cynic myself, so no worries...). Yes indeed, security-wise, even calling this slippery ground would be a euphemism.
The first chmod is the command from https://docs.saltproject.io/en/latest/ref/publisheracl.html#permission-issues.
(I found it via google, so if that's again a wrong/outdated link, please let me know).
This is an internal developer server; all actors have access to (or are even authors of) /srv/pillar etc. anyway, so for now getting it back to working order is the priority. Of course that's not really production-ready.
For a proper fix, of course the local salt command should work without opening up half of the machine.
I guess as long as it's not a recursive chmod it should work out fine.
Yes, the command is non-recursive, also in the docs. I took a look at /var/cache/salt as a non-root user: The sensitive files (mostly the data.p file for each minion) remain 0600. After a quick look, I see no sensitive data leaked.
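For the record, this was just a quick ad-hoc check (certainly not exhaustive) to list any cache files that are now readable by others:
find /var/cache/salt/master -type f -perm -004 -ls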
I hope all the sockets under /var/run/salt/master are safe too. (Running strace salt, it looks like these are the main interface between the command-line utility and the salt master process.) I assume that's the same as before 3006.x.
(FWIW, the production server here does not have local users or publisher_acl, but is accessed using the API)
That was too early... A few hours later, our users (and I) can no longer use the salt CLI again; all we get is Authentication error occurred.
After restarting salt-master, it works again.
Tomorrow, I will try to find out how long it takes, maybe some token/key/whatever expires?
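To narrow it down, I will simply log the socket and cache permissions periodically and compare the timestamps with the moment the CLI stops working again (just a throwaway loop, nothing sophisticated):
while true; do
    date
    # octal mode, owner:group and name of each master socket plus the cache dir
    stat -c '%a %U:%G %n' /var/run/salt/master/* /var/cache/salt/master
    sleep 300
done >> /var/tmp/salt-perms.log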
It seems this problem causes a new issue when adding a non-default returner (e.g. influxdb) in the master config file. When non-root users run a salt command, the command is executed, but it seems to go immediately to the background, so the user can't see any output or result. When the same command is run as root, it's totally fine: the result goes to the external DB and the CLI output also works. Any idea how to solve this issue?
It seems the salt master needs a config option to set the file mode for the transport files.
It is hard-coded, e.g.:
os.chmod(os.path.join(self.opts["sock_dir"], "workers.ipc"), 0o600) in ./salt/lib/python3.10/site-packages/salt/transport/zeromq.py
It would be better if it were 0o660, with config options in the salt master to set the group and umask.
With this hard-coded default, publisher_acl is almost pointless in my setup.
Someone should just grep for all 0o600 occurrences and replace the hard-coded values in salt/utils/master.py and transport/zeromq.py with a umask config option.
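Roughly what I have in mind, as an untested sketch against the 3006 sources (the option name ipc_socket_mode is made up here; it does not exist in Salt today):
import os

def _chmod_ipc_socket(opts, name, default=0o600):
    # Fall back to today's hard-coded behaviour when the option is not set.
    mode = opts.get("ipc_socket_mode", default)
    os.chmod(os.path.join(opts["sock_dir"], name), mode)

# e.g. in salt/transport/zeromq.py, instead of the hard-coded call:
# _chmod_ipc_socket(self.opts, "workers.ipc")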
Ideally chmod should not be used at all. Instead, the umask used when creating the sockets should be configurable.
It seems the file mode might be causing a few tickets for version 3007 as well; perhaps related: https://github.com/saltstack/salt/issues/66228
In my case I am getting the same message but with a different "cause" (syntax error).
I had manually changed the following (on the master):
a. /etc/salt/master
- configured user to be salt-master-unprivileged
- configured pidfile to be /run/user/1002/salt-master.pid
- configured sock_dir to be /run/user/1002/master
b. /etc/salt/minion
- configured user to be salt-minion-unprivileged
- configured pidfile to be /run/user/1003/salt-minion.pid
- configured sock_dir to be /run/user/1003/minion
(I also enabled Systemd session lingering as otherwise /run/user/xxxx/ will not be available at the next reboot)
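Concretely, lingering is enabled per user, e.g. with the user names from my setup:
loginctl enable-linger salt-master-unprivileged
loginctl enable-linger salt-minion-unprivileged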
If I run a command with a syntax/quoting error, then I get:
root@salt1:/scripts# salt "*" cmd.run bash -c "if [[ -d /tools_local ]]; then ls -l /tools_local/; fi;"
WARNING: CONFIG 'if [[ -d /tools_local ]]; then ls -l /tools_local/; fi;' directory does not exist.
[ERROR ] Unable to connect to the salt master publisher at /var/run/salt/master
The salt master could not be contacted. Is master running?
Whereas if there is no syntax/quoting error, it works correctly:
root@salt1:/scripts# salt "*" cmd.run "bash -c \"if [[ -d /tools_local ]]; then ls -l /tools_local/; fi;\""
NOTICE: Too many minions targeted, switching to batch execution.
Minion HOST01.MYDOMAIN.TLD did not respond. No job will be sent.
Minion HOST02.MYDOMAIN.TLD did not respond. No job will be sent.
Executing run on ['HOST03.MYDOMAIN.TLD', 'HOST04.MYDOMAIN.TLD', 'HOST05.MYDOMAIN.TLD', 'HOST06.MYDOMAIN.TLD', 'HOST07.MYDOMAIN.TLD', 'HOST08.MYDOMAIN.TLD', 'HOST09.MYDOMAIN.TLD', 'HOST10.MYDOMAIN.TLD', 'HOST11.MYDOMAIN.TLD', 'HOST12.MYDOMAIN.TLD']
....
CORRECT OUTPUT
....
In fact there might be other paths that need to be fixed.
Checking with salt --config-dump | grep -i "/var" yields:
Log file path. Default: '/var/log/salt/master'.
cachedir: /var/cache/salt/master
extension_modules: /var/cache/salt/master/extmods
key_logfile: /var/log/salt/key
log_file: /var/log/salt/master
sqlite_queue_dir: /var/cache/salt/master/queues
ssh_log_file: /var/log/salt/ssh
syndic_dir: /var/cache/salt/master/syndics
syndic_log_file: /var/log/salt/syndic
syndic_pidfile: /var/run/salt-syndic.pid
token_dir: /var/cache/salt/master/tokens
- /var/cache/salt/master/extmods/utils
@MartinEmrich : your script probably works, but it's probably better to set up a systemd service that calls it at every boot, because /run or /var/run will/should be emptied at each boot.
Another possibility is to enable ACLs and use setfacl -R -d -m u:salt-master-unprivileged:rwx /path. I used this approach for the minion in /etc/salt, but I'm not sure it would work on a folder that gets trashed every time (and you do NOT want to give the salt master permissions on the whole /var/run folder either).
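Spelled out, that would be something like this (untested here; a variant of the command above, where the first call also covers entries that already exist and the -d variant sets the default ACL so newly created files inherit it):
setfacl -R -m u:salt-master-unprivileged:rwX /path    # grant access on existing entries
setfacl -d -m u:salt-master-unprivileged:rwX /path    # default ACL for new entries in the directory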
I also changed the Systemd Service for salt-minion on the master so that it would run with the appropriate user, without spawning first a systemd root user process.
@luckylinux as the script runs from the salt-master unit, it is effectively run on every boot.
(As all of this only fixed two thirds of my issues, the salt master here has been running as root again since then.)
@MartinEmrich Ah, alright, I missed that part, sorry about it.
Not sure what is better security-wise: having systemd session lingering "kickstart" the user's /run/user/<userid> directory, where all permissions are then handled automatically, or letting systemd manage a RuntimeDirectory= as in the units below.
I modified the systemd services as follows, to be doubly sure that the right user executes them. For the master I feel it was handled correctly anyway, but for the minion it was spawning a root process anyway.
/lib/systemd/system/salt-master.service is now:
[Unit]
Description=The Salt Master Server
Documentation=man:salt-master(1) file:///usr/share/doc/salt/html/contents.html https://docs.saltproject.io/en/latest/contents.html
After=network.target
[Service]
User=salt-master-unprivileged
Group=salt-master-unprivileged
LimitNOFILE=100000
Type=notify
NotifyAccess=all
RuntimeDirectory=salt-master
ExecStart=/usr/bin/salt-master
[Install]
WantedBy=multi-user.target
/lib/systemd/system/salt-minion.service is now:
[Unit]
Description=The Salt Minion
Documentation=man:salt-minion(1) file:///usr/share/doc/salt/html/contents.html https://docs.saltproject.io/en/latest/contents.html
After=network.target salt-master.service
[Service]
User=salt-minion-unprivileged
Group=salt-minion-unprivileged
LimitNOFILE=81920
Type=notify
NotifyAccess=all
RuntimeDirectory=salt-minion
ExecStart=/usr/bin/salt-minion
# Not Recommended according to Systemd Documentation
#KillMode=process
# Default Value is control-group
#KillMode=control-group
[Install]
WantedBy=multi-user.target
Note that the same "exercise" should probably also be done for salt-api, salt-proxy, salt-ssh and salt-syndic. I'm not there yet though ...