[BUG] pillar.get doesn't respect saltenv minion setting
Description
I expect that when I perform a pillar data lookup, it will use the default or specified saltenv, but instead it selects an apparently arbitrary environment. For example, my pillar data has this:

```jinja
saltenv: {{ saltenv }}
```

so I expect to be able to run `salt-call pillar.get saltenv` and get back `base` (the default), but instead I get one of the other environments.
Setup
Here's the relevant excerpt from `/etc/salt/master`:

```yaml
file_roots:
  base:
    - /srv/salt
  jon:
    - /srv/salt-environments/jon/states
  eli:
    - /srv/salt-environments/eli/states
  __env__:
    - /srv/salt

pillar_roots:
  base:
    - /srv/pillar
  jon:
    - /srv/salt-environments/jon/pillar
  eli:
    - /srv/salt-environments/eli/pillar
  __env__:
    - /srv/pillar
```
And here's my minion config, which specifies the default saltenv and also says that the pillar data should use the same environment:

```yaml
saltenv: base
pillarenv_from_saltenv: True
```
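For context, my understanding of `pillarenv_from_saltenv` (from the minion config docs) is simply that the pillar environment should track whatever saltenv is in effect. Roughly like this, in my own words rather than actual Salt source:

```python
# My mental model of pillarenv_from_saltenv (a paraphrase, not Salt source code):
# when the option is on, the pillar environment just follows the effective saltenv.
def effective_pillarenv(opts):
    if opts.get("pillarenv_from_saltenv", False):
        return opts.get("saltenv") or "base"
    return opts.get("pillarenv")

# With the minion config above I would therefore expect:
print(effective_pillarenv({"saltenv": "base", "pillarenv_from_saltenv": True}))  # base
```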
Here's my pillar `top.sls`:

```jinja
{{ saltenv }}:
  '*':
    - match: glob
    - ignore_missing: True
    - common
```
and to test I put this in `/srv/pillar/common.sls`:

```jinja
saltenv: {{ saltenv }}
```
Steps to Reproduce the behavior
Expected behavior
I expect that a call like `salt-call pillar.get saltenv` will give me `base`, but it doesn't:

```console
$ salt-call pillar.get saltenv
local:
    jon
$ salt-call pillar.get saltenv pillarenv=base
local:
    base
```

So basically, my default `pillarenv` is `base`, but calling `pillar.get` with no arguments returns pillar data from the `jon` environment.
If this were only happening at the command line it wouldn't be a big deal, but there are a bunch of places in my `sls` state files where I do things like this:

```jinja
{% if salt['pillar.get']('yum:stable') %}
configure stable yum repos:
  ...
{% endif %}
```
So what's happening now is that I run a `state.apply` like `salt-call state.apply saltenv=base pillarenv=base`, but the call to `salt['pillar.get']('yum:stable')` returns pillar data from the `jon` environment.
This seems like a bug. I realize that I could pass the `saltenv` and/or `pillarenv` parameters to `pillar.get`, but I didn't think I had to, since the call to `state.apply` already specifies the environment we're using. Plus it would be a huge pain (and make things brittle) to have to pass those parameters on every single `pillar.get`.
Versions Report

Both the minion and master are on version 3000.1:
```text
Salt Version:
    Salt: 3000.1

Dependency Versions:
    cffi: Not Installed
    cherrypy: Not Installed
    dateutil: 1.5
    docker-py: Not Installed
    gitdb: Not Installed
    gitpython: Not Installed
    Jinja2: 2.8
    libgit2: Not Installed
    M2Crypto: Not Installed
    Mako: Not Installed
    msgpack-pure: Not Installed
    msgpack-python: 0.6.2
    mysql-python: Not Installed
    pycparser: Not Installed
    pycrypto: 2.6.1
    pycryptodome: Not Installed
    pygit2: Not Installed
    Python: 2.7.5 (default, Aug 7 2019, 00:51:29)
    python-gnupg: Not Installed
    PyYAML: 3.11
    PyZMQ: 15.3.0
    smmap: Not Installed
    timelib: Not Installed
    Tornado: 4.5.3
    ZMQ: 4.1.4

System Versions:
    dist: centos 7.7.1908 Core
    locale: UTF-8
    machine: x86_64
    release: 4.4.131-1.el7.centos.x86_64
    system: Linux
    version: CentOS Linux 7.7.1908 Core
```
Please let me know if there's any other data or examples that I can provide.
FWIW, while this definitely IS a bug, a workaround is to make sure that `pillarenv_from_saltenv: True` is set in our master config rather than in our minion config. It wasn't clear to me from reading https://docs.saltstack.com/en/latest/ref/configuration/minion.html that this option isn't respected when set in the minion config.

There still is a bug, though: if this option is not set, then pillar data from ALL salt environments is merged together by default instead of using the default saltenv/pillarenv.
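To illustrate what that merge-everything behavior means in practice, here's a toy sketch of my own (not Salt code): whichever environment wins the merge determines the value you get back, which matches the `jon` result above.

```python
# Toy illustration (my own, not Salt source): if no pillarenv is selected and data
# from every environment is merged, a key defined in another environment can shadow
# the value from "base".
def merge_all_envs(per_env_pillar):
    merged = {}
    for env, data in per_env_pillar.items():  # e.g. "base", "jon", "eli"
        merged.update(data)
    return merged

print(merge_all_envs({"base": {"saltenv": "base"}, "jon": {"saltenv": "jon"}}))
# {'saltenv': 'jon'}
```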
The problem seems to be deeper than what I listed above, though unfortunately my many attempts to put together a simple example state which reproduces the problem have failed. I can consistently reproduce the issue in my environment in which we're running something like a thousand states, but not with a simple example or set of examples.
In case it helps, I've added a lot of debugging to the Python code of Salt itself to trace exactly what's happening. We have a lot of different calls to `salt['pillar.get'](...)` throughout our files; the call in question, which reliably does the wrong thing in our states, is:

```jinja
{% set asterisks = salt['pillar.get']('role:voice:network_namespace:default:asterisk') %}
```
Through the debugging statements in our Python code, I traced this backwards through the code as follows:
1. This if/else statement checks whether `pillarenv` is set for the pillar lookup:
   https://github.com/saltstack/salt/blob/d172b910c95f7387fb64765f0e6f3c4b0f3657fb/salt/pillar/__init__.py#L649
   By adding log statements (the kind of helper I used is sketched after this list) I've found that almost every time this code is executed, `pillarenv` is set to whatever value it should have, i.e. `base` by default, or whatever I passed at the command line. However, for one particular invocation, `saltenv` and `pillarenv` are always `None` for that instantiation of the `Pillar` class.
2. Tracing that backwards, log statements in `Pillar.__init__` show that the `saltenv` parameter being passed in and the `self.opts['pillarenv']` value usually have the correct values (e.g. `base`). However, in the failing case both `saltenv` and `pillarenv` are `None`, even when I specify `saltenv=base pillarenv=base` at the command line. This explains what we see in (1) above.
3. Tracing this backwards even further, new log statements show that the `saltenv` and `pillarenv` fields we're getting from our `load` variable are both `None` in this function:
   https://github.com/saltstack/salt/blob/d172b910c95f7387fb64765f0e6f3c4b0f3657fb/salt/master.py#L1630
4. Tracing that backwards all the way to where we get that `load` from the ZeroMQ `payload`, I added logging statements at this line:
   https://github.com/saltstack/salt/blob/d172b910c95f7387fb64765f0e6f3c4b0f3657fb/salt/transport/zeromq.py#L827
   This showed that in the failing case, no `pillarenv` is decoded from the ZeroMQ message driving the `salt['pillar.get']('role:voice:network_namespace:default:asterisk')` call mentioned above. However, I can't figure out for the life of me why this works for all of our other `pillar.get` calls. I even make an identical `pillar.get` call in other places and it works!
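For reference, the logging I added was nothing fancy. A rough reconstruction of the debug helper from my local patch looks like this (hypothetical code, not upstream Salt; the helper name and parameters are my own):

```python
# Rough reconstruction of the debug helper I pasted into my local Salt checkout
# (hypothetical, not upstream code). I called it by hand from Pillar.__init__,
# Pillar.get_tops, and the master/transport code paths referenced above.
import logging

log = logging.getLogger(__name__)

def trace_pillar_envs(where, opts, saltenv=None, pillarenv=None):
    log.warning(
        "%s: saltenv arg=%r, pillarenv arg=%r, opts['saltenv']=%r, opts['pillarenv']=%r",
        where,
        saltenv,
        pillarenv,
        opts.get("saltenv"),
        opts.get("pillarenv"),
    )
```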
I've tried reproducing this with simpler examples, since the place where this `pillar.get` is being called is fairly deep in our call stack, i.e.:

- the call is made from a Jinja `{% import 'path/to/somewhere' as variables %}` import
- ... which is imported from a number of different places
- ... which in the failing place is imported from a different Jinja template
- ... which is itself rendered by a `file.managed` state
- ... which is generated by a Jinja macro in our `sls` state file

In theory none of that should affect the semantics of `pillar.get`, but I've tried making custom states that follow those patterns to produce a reproducible failure case I can post here, with no luck so far.
So my high-level diagnosis is this:

- Some weird combination of circumstances can produce a situation where a call to `pillar.get` comes in over ZeroMQ without a `pillarenv` field.
- This ultimately leads to the if/else statement in `salt.pillar.Pillar.get_tops` collecting pillar data from all salt environments instead of only the one targeted (see the sketch after this list).
- I can't fix this by rewriting that if/else statement (which I tried), because even if I hard-coded a default value of `base`, that would still give us the wrong value in cases where we specified a different `saltenv` and `pillarenv`.
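To spell out that second point, the behavior I observe boils down to something like the following. This is my own simplification of the branch in question, not the literal Salt source:

```python
# Simplified paraphrase of the get_tops branching I kept hitting (not literal Salt
# source): with a pillarenv set, only that environment's top is compiled; with
# pillarenv unset/None, every configured environment contributes pillar data.
def envs_to_compile(opts):
    if opts.get("pillarenv"):
        return [opts["pillarenv"]]
    return list(opts.get("pillar_roots", {}))

print(envs_to_compile({"pillarenv": "base", "pillar_roots": {"base": [], "jon": [], "eli": []}}))
# ['base']
print(envs_to_compile({"pillarenv": None, "pillar_roots": {"base": [], "jon": [], "eli": []}}))
# ['base', 'jon', 'eli']
```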
I'm not sure what to try next - I've spent about 2 days on this so far hoping that I could shed more light on what's happening and possibly make a Pull Request myself, but I've gotten stuck at the point where the message comes in on zeromq, and I don't have enough time to trace things back even farther. Hopefully this is enough detail for someone who knows the system better than me to have a clue of where to go next.
Please let me know if there's something else I can help test.
In case it helps, here's my updated Versions Report; we've seen this issue on all versions we've tested, but we've done a bunch of upgrades since I filed this ticket last year:
```text
Salt Version:
    Salt: 3002.2

Dependency Versions:
    cffi: Not Installed
    cherrypy: unknown
    dateutil: 2.4.2
    docker-py: Not Installed
    gitdb: Not Installed
    gitpython: Not Installed
    Jinja2: 2.8.1
    libgit2: Not Installed
    M2Crypto: 0.35.2
    Mako: Not Installed
    msgpack: 0.6.2
    msgpack-pure: Not Installed
    mysql-python: Not Installed
    pycparser: Not Installed
    pycrypto: 2.6.1
    pycryptodome: Not Installed
    pygit2: Not Installed
    Python: 3.6.8 (default, Mar 30 2020, 17:04:00)
    python-gnupg: Not Installed
    PyYAML: 3.13
    PyZMQ: 17.0.0
    smmap: Not Installed
    timelib: Not Installed
    Tornado: 4.5.3
    ZMQ: 4.1.4

System Versions:
    dist: centos 7 Core
    locale: UTF-8
    machine: x86_64
    release: 4.4.131-1.el7.centos.x86_64
    system: Linux
    version: CentOS Linux 7 Core
```
Also, many thanks to everyone who works on Salt - it's such a great project, and I just wish I'd been able to unravel this particular issue on my own!
We've run into this as well. There is some case in which the `Pillar` class is instantiated with `None` for the `saltenv` and `pillarenv` arguments, even though the values passed in via `opts` are correct. So when running with the minion config `pillarenv_from_saltenv: True`, the following lines of code run and overwrite the correct values in `opts` with the incorrect `None` values passed as arguments; if the minion config is set to `False`, everything (?) works correctly.

https://github.com/saltstack/salt/blob/d9343cca650d7197d3f6e107ffd506d25a8748ab/salt/pillar/__init__.py#L536-L540
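In other words, the failure mode as I read those lines is roughly the following (a paraphrase, not the verbatim source):

```python
# Paraphrase of the linked lines (not verbatim Salt source): if pillarenv_from_saltenv
# is enabled, opts['pillarenv'] is replaced with the saltenv that was passed as an
# argument, so a saltenv argument of None clobbers an otherwise-correct pillarenv.
def apply_pillarenv_from_saltenv(opts, saltenv):
    if opts.get("pillarenv_from_saltenv", False):
        opts["pillarenv"] = saltenv
    return opts

print(apply_pillarenv_from_saltenv({"pillarenv_from_saltenv": True, "pillarenv": "base"}, None))
# {'pillarenv_from_saltenv': True, 'pillarenv': None}
```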