[BUG] After upgrade to 3005.3, failure to fetch latest pillar data after being updated in git pillar repo branch
Description of Issue
A minion/master is unable to get the new pillar data from an alternative branch in git after it was updated on the remote git pillar repository. This was working with 3005.1 but broke after upgrading to 3005.3. It also seems to be the case for those using 3006.3. Please refer to the community discussion: https://saltstackcommunity.slack.com/archives/C7K04SEJC/p1697576330136199
Setup
Amazon Linux 2 Salt master, configured with a git repository for states and pillar data.
- [ ] on-prem machine
- [ ] VM (Virtualbox, KVM, etc. please specify)
- [x] VM running on a cloud service, please be explicit and add details
- [ ] container (Kubernetes, Docker, containerd, etc. please specify)
- [ ] jails if it is FreeBSD
- [ ] classic packaging
- [x] onedir packaging
- [ ] used bootstrap to install
Steps to Reproduce Issue
Master config
ext_pillar:
  - git:
    - __env__ ssh://git@xxxxxxxxxxxxxxxxx/repo.git:
      - root: pillar
git_pillar_pubkey: xxxxxxxxxxxxxxxx
git_pillar_privkey: xxxxxxxxxxxxxxxxxxx
git_pillar_update_interval: 60
pillar_merge_lists: True
Push an update to an existing pillar called test in repo.git on the testpillar branch. On a minion, run salt-call pillar.get test pillarenv=testpillar
Expected behavior
The new value of pillar test should be visible in the output of the salt call on the minion.
Actual behavior
The minion gets the old data of the test pillar. Please note that when executing the same command after deleting /var/cache/salt/master/git_pillar, the minion gets the correct value.
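For reference, a minimal sketch of scripting that cache-clearing workaround (an illustration only, assuming default cache paths and that the master and a minion share the same host so salt-call can be run locally):

```python
# Sketch of the workaround noted above: wipe the master's git_pillar cache,
# then re-run the query from the report. Paths are the Linux defaults;
# adjust for your environment.
import shutil
import subprocess

GIT_PILLAR_CACHE = "/var/cache/salt/master/git_pillar"

# Removing the cache forces the master to re-clone the git pillar repositories.
shutil.rmtree(GIT_PILLAR_CACHE, ignore_errors=True)

# Same query as in the reproduction steps; after the cache wipe this returns
# the updated value of the "test" pillar from the "testpillar" branch.
subprocess.run(
    ["salt-call", "pillar.get", "test", "pillarenv=testpillar"],
    check=True,
)
```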
Versions Report
Salt Version:
Salt: 3005.3
Dependency Versions:
cffi: 1.14.6
cherrypy: unknown
dateutil: 2.8.1
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 3.1.0
libgit2: 1.5.0
M2Crypto: Not Installed
Mako: Not Installed
msgpack: 1.0.2
msgpack-pure: Not Installed
mysql-python: Not Installed
pycparser: 2.21
pycrypto: Not Installed
pycryptodome: 3.9.8
pygit2: 1.10.1
Python: 3.9.18 (main, Sep 18 2023, 18:18:39)
python-gnupg: 0.4.8
PyYAML: 6.0.1
PyZMQ: 23.2.0
smmap: Not Installed
timelib: 0.2.4
Tornado: 4.5.3
ZMQ: 4.3.4
System Versions:
dist: amzn 2
locale: utf-8
machine: x86_64
release: 4.14.318-241.531.amzn2.x86_64
system: Linux
version: Amazon Linux 2
3005.2 changed how git pillars were cached to fix a security issue. Having to clear the cache after upgrading is probably expected.
> Having to clear the cache after upgrading is probably expected.
This does NOT only affect the pillar cache across upgrades.
With a running 3006.3 master, the master will NOT refresh a git_pillar branch after the first time that branch is fetched.
Steps to test:
1. Have git_pillar configured with __env__ like in the bug report
2. Create a new branch in the git_pillar repo
3. Have any minion pull the pillar data using pillar.items pillarenv=<branchname>
4. Notice that the data aligns with what's in the branch
5. Commit a new change to the branch in the git_pillar repo
6. Do any or all of the following:
   - Wait for the master loop_interval to expire and observe that the master log file indicates that the git_pillar remote is up-to-date
   - Run sudo salt-run git_pillar.update on the master and observe that the return for the git_pillar repo in question is "None"
   - Restart the master
   - Run saltutil.refresh_pillar on the minion in question
7. Repeat pulling pillar data with any minion using pillar.items pillarenv=<branchname> (a scripted version of steps 6 and 7 is sketched after this list)
8. Notice that the minion does NOT receive new pillar data from the new commit on the branch, and instead returns the same pillar data that was on the branch when the master first fetched the branch
9. Repeat steps 5, 6, 7 as many times as you want
10. Ultimately stop the master process, rm -f /var/cache/salt/master/git_pillar, and restart the master process
11. Go to step 3 and repeat forever
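For anyone wanting to automate the check, here is a rough sketch of steps 6 and 7 above (an illustration only, not an official test; it shells out to the same commands named in the list, assumes it runs on the master with a local minion, and uses a placeholder branch name):

```python
# Rough automation of the update-and-check cycle from the steps above.
# Adjust BRANCH and the minion setup for your environment.
import json
import subprocess

BRANCH = "testpillar"  # placeholder: the pillarenv / branch under test

# Step 6: ask the master to fetch the git_pillar remotes.
subprocess.run(["salt-run", "git_pillar.update"], check=True)

# Step 7: pull the pillar data for the branch on a local minion and print it,
# so it can be compared with what is actually committed on the branch (step 8).
result = subprocess.run(
    ["salt-call", "--out=json", "pillar.items", f"pillarenv={BRANCH}"],
    check=True,
    capture_output=True,
    text=True,
)
print(json.dumps(json.loads(result.stdout), indent=2))
```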
Are there any chances we may get a fix for this soon? It affects both 3006 and 3005, and it was introduced after the CVE patches, so it makes sense to have it fixed in both versions. This kind of issue makes it harder to upgrade away from Salt versions with known security issues without hitting breaking behavior, while keeping everything working as desired.
@cmcmarrow do you have any insight into this? I think this regression happened as a result of your CVE patch https://github.com/saltstack/salt/commit/6120bcac2ee79b6e8b104612941432841eb0c8c3
There's been no GitHub activity on @cmcmarrow's account since September. Safe to say he's no longer involved in the project ☹️
I would like to add another detail: we tried the newly released version 3005.5, which contains additional CVE patches, and we noticed that the pillar refresh issue is partially fixed and now happens only on branches containing /, for example feature/test/pillar. I hope this helps with fixing the issue.
I had wondered if / in the branch name might be a factor, but someone on the Slack said they tested that and it happened with or without the /. Good news if true!
@Ikramromdhani Can you upgrade to the latest Salt 3006 and retry? Salt 3005 passed the end of its bug-fix phase (August 2023) and is about to run out of CVE support on Feb 25th 2024, 24 days from now.
At this stage nothing will be getting done for Salt 3005 given that it is dead in 24 days, hardly enough time for a debug/test/fix/release cycle, especially since we would have to build classic packages, and just getting that done takes a couple of days.
@Ikramromdhani Rereading this, I remember Chad talking about how caching changed and how the cache is now only updated when it should be, rather than as before, where the cache was needlessly getting updated as a side effect of other commands being run. I need to go back and find the actual commit etc.; it was last year, around autumn.
And yup, Chad is gone, moved on to greener pastures, so I am reading up on GitFS, etc. to get fully familiar (having to cover more areas since there are fewer people on the core team now). I will update next week with the code lines that changed, but I expect you were relying on a side effect of an operation that has since been fixed (namely, incorrectly updating the cache). I will get the full details for you.
That said, it doesn't mean there isn't an error too.
We are also observing this error after recently upgrading from salt-3004 to salt-3006.4:
- We have not moved to salt-3006.6 due to #65691
- However, diffing salt/utils/gitfs.py gives no reason to believe the bug is absent in 3006.6.
We have observed that when we use __env__ to effectively map pillarenv to a branch name:
- if we create a new branch feature/testing that is not already in the cache, it will be pulled correctly; and
- if we then add a new commit on it, or amend HEAD, the changes will not be pulled.
However, if we create a new branch named testing, then it will be pulled, and subsequent changes to it will be pulled. That is, it seems that only branches with a / in them do not update. Unfortunately, we use branch names like release/* or feature/*, like many people do, so it's a severe issue for us at the moment!
(This evidence conflicts with https://github.com/saltstack/salt/issues/65467#issuecomment-1921904438, although the commenter had not actually confirmed it themselves)
Note:
- We are using GitPython rather than pygit2
- We do not see this issue with gitfs ... changes to our state files are pulled ... it is only git_pillar
@donkopotamus, that is what I was about to confirm. Tested today with 3006.6 using pygit2 and we still have issues pulling changes for pillars when the pillarenv branch contains a /.
@dmurphy18 any clues whether this can make it into 3006.7 or 3007?
I observed that the cache directory organizes cached branches by foldername=branchname, and that branches with / in the branch name do result in a subfolder being created.
For example, with three pillar branches foo, feature/bar, and feature/boo, you might see a structure like this:
cache_dir/
  foo/top.sls
  feature/
    bar/top.sls
    boo/top.sls
If this turns out to be the actual issue, then the obvious solution would appear to be escaping the / in branch names or converting it to something like -
cache_dir/
  foo/top.sls
  feature\/bar/top.sls
  feature\/boo/top.sls
or
cache_dir/
  foo/top.sls
  feature-bar/top.sls
  feature-boo/top.sls
or
cache_dir/
  foo/top.sls
  feature-SLASH-bar/top.sls
  feature-SLASH-boo/top.sls
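To make the escaping idea concrete, here is a hypothetical sketch (this is not Salt's actual cache-path code; the function name and the -SLASH- token are made up for illustration) of how mapping / in branch names to a flat token keeps each branch in its own top-level cache directory:

```python
# Hypothetical illustration of the escaping idea above -- NOT Salt's actual
# cache code. With raw branch names, "feature/bar" nests under a shared
# "feature/" directory; escaping the "/" keeps every branch in its own
# flat, unambiguous directory under cache_dir.
import os

def branch_cachedir(cache_dir: str, branch: str, escape_slashes: bool = True) -> str:
    """Return a per-branch cache directory path for *branch*."""
    name = branch.replace("/", "-SLASH-") if escape_slashes else branch
    return os.path.join(cache_dir, name)

for branch in ("foo", "feature/bar", "feature/boo"):
    print(branch_cachedir("cache_dir", branch, escape_slashes=False))
    # e.g. cache_dir/feature/bar  (nested -- the suspected collision)
    print(branch_cachedir("cache_dir", branch))
    # e.g. cache_dir/feature-SLASH-bar  (flat)
```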
@Ikramromdhani Won't make it into Salt 3006.7, which is close to release and waiting on clean tests in the pipeline. Possibly for 3007, but it is on my backlog; I am dealing with https://github.com/saltstack/salt/issues/65816, which may be related (similar code changes for pillarenv went in at the same time for gitfs locks), and am writing tests for that fix now. So it is on the block, but we have limited hands since the buyout. I will get to it when I can.
Not sure if it helps, but it looks like the CVE fix in 3005.2, and then https://github.com/saltstack/salt/pull/65017 in 3005.3, which mitigated a negative effect of that fix, probably contributed to this issue.
Yup, the set of changes that Chad did, which then had to be adjusted, applied both to his gitfs changes and to the pillarenv and cache changes, etc. That is why I expect some linkage to the gitfs changes. Those changes are in the 3006.x branches too.
Just a heads up that we tested 3006.7 and the problem is still there. Any chance of getting a fix for this in Salt 3007? This problem is holding us back from performing Salt upgrades.
@dmurphy18 since the fix did not make it into 3007, I would just like to ask if there is a specific version to target? Thank you.
@Ikramromdhani Working on a fix for GitFS, and Git Pillar is right after it. The fix for GitFS is done, but I have to refactor a new test after PR review (https://github.com/saltstack/salt/pull/65937); after that I will work on Git Pillar, since it will possibly be affected by the GitFS changes.
Do we have a timeline on when this fix will be released? We have been trying to upgrade since last year and this is the only issue holding us back. I checked the release notes for 3006.9 and did not see it included there, so I was wondering whether it is going to be included in 3006.10 and, if so, whether we know when that will be released. This way we can plan our upgrade accordingly. Thank you!
@Ikramromdhani Actually working on this at the moment; it would help to confirm it still occurs on 3006.9. Attempting to set up a similar environment.