[BUG] After upgrade to 3005.3, failure to fetch the latest pillar data after it is updated in a git pillar repo branch

Open Ikramromdhani opened this issue 2 years ago • 26 comments

Description of Issue

A minion/master is unable to get the new pillar data from an alternative branch in git after it was updated on the remote git pillar repository. This was working with 3005.1, but broke after the upgrade to 3005.3. It also seems to be the case for those using 3006.3. Please refer to the community discussion: https://saltstackcommunity.slack.com/archives/C7K04SEJC/p1697576330136199

Setup

Amazon Linux 2 salt master, configured with git repository for states and pillar data.

  • [ ] on-prem machine
  • [ ] VM (Virtualbox, KVM, etc. please specify)
  • [x] VM running on a cloud service, please be explicit and add details
  • [ ] container (Kubernetes, Docker, containerd, etc. please specify)
  • [ ] jails if it is FreeBSD
  • [ ] classic packaging
  • [x] onedir packaging
  • [ ] used bootstrap to install

Steps to Reproduce Issue

Master config

ext_pillar:
- git:
  - __env__ ssh://git@xxxxxxxxxxxxxxxxx/repo.git:
    - root: pillar
git_pillar_pubkey: xxxxxxxxxxxxxxxx
git_pillar_privkey: xxxxxxxxxxxxxxxxxxx
git_pillar_update_interval: 60
pillar_merge_lists: True

Push an update to an existing pillar called test in repo.git on branch testpillar. On a minion, run salt-call pillar.get test pillarenv=testpillar.
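
For reference, a sketch of the reproduction as shell commands. The clone step and the pillar file layout (pillar/test.sls, referenced from a top file) are illustrative assumptions; the salt-call command and the 60-second update interval come from the setup above.

# From a workstation: push an update to the "test" pillar on branch testpillar
git clone ssh://git@xxxxxxxxxxxxxxxxx/repo.git && cd repo
git checkout testpillar
echo "test: new-value" > pillar/test.sls    # illustrative pillar file
git add pillar/test.sls
git commit -m "Update test pillar" && git push origin testpillar

# On a minion: after git_pillar_update_interval (60s here) has elapsed
salt-call pillar.get test pillarenv=testpillar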

Expected behavior

The new value of pillar test should be visible in the output of the salt call on the minion.

Actual behavior

The minion gets the old data for pillar test. Please note that when executing the same command after deleting /var/cache/salt/master/git_pillar, the minion gets the correct value.
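
A sketch of that cache-clearing workaround as shell commands, assuming the master runs as the salt-master systemd service:

sudo systemctl stop salt-master
sudo rm -rf /var/cache/salt/master/git_pillar
sudo systemctl start salt-master
# The next query on the minion then returns the updated value
salt-call pillar.get test pillarenv=testpillar

This is obviously not viable as a routine step, but it confirms that the stale data comes from the master-side git_pillar cache.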

Versions Report

Salt Version:
          Salt: 3005.3

Dependency Versions:
          cffi: 1.14.6
      cherrypy: unknown
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.1.0
       libgit2: 1.5.0
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: 2.21
      pycrypto: Not Installed
  pycryptodome: 3.9.8
        pygit2: 1.10.1
        Python: 3.9.18 (main, Sep 18 2023, 18:18:39)
  python-gnupg: 0.4.8
        PyYAML: 6.0.1
         PyZMQ: 23.2.0
         smmap: Not Installed
       timelib: 0.2.4
       Tornado: 4.5.3
           ZMQ: 4.3.4

System Versions:
          dist: amzn 2
        locale: utf-8
       machine: x86_64
       release: 4.14.318-241.531.amzn2.x86_64
        system: Linux
       version: Amazon Linux 2

Ikramromdhani avatar Oct 26 '23 08:10 Ikramromdhani

Hi there! Welcome to the Salt Community! Thank you for making your first contribution. We have a lengthy process for issues and PRs. Someone from the Core Team will follow up as soon as possible. In the meantime, here’s some information that may help as you continue your Salt journey. Please be sure to review our Code of Conduct. Also, check out some of our community resources.

There are lots of ways to get involved in our community. Every month, there are around a dozen opportunities to meet with other contributors and the Salt Core team and collaborate in real time. The best way to keep track is by subscribing to the Salt Community Events Calendar. If you have additional questions, email us at [email protected]. We’re glad you’ve joined our community and look forward to doing awesome things with you!

welcome[bot] avatar Oct 26 '23 08:10 welcome[bot]

3005.2 changed how git pillars were cached to fix a security issue. Having to clear the cache after upgrading is probably expected.

OrangeDog avatar Oct 30 '23 12:10 OrangeDog

Having to clear the cache after upgrading is probably expected.

This does NOT only affect the pillar cache across upgrades.

With a running 3006.3 master, the master will NOT refresh a git_pillar branch after the first time that branch is fetched.

Steps to test:

  1. Have git_pillar configured with __env__ like in the bug report
  2. Create a new branch in the git_pillar repo
  3. Have any minion pull the pillar data using pillar.items pillarenv=<branchname>
  4. Notice that the data aligns with what's in the branch
  5. Commit a new change to the branch in the git_pillar repo
  6. Do any or all of the following:
    • Wait for the master loop_interval to expire and observe that the master log file indicates that the git_pillar remote is up-to-date
    • run sudo salt-run git_pillar.update on the master and observe that the return for the git_pillar repo in question is "None"
    • restart the master
    • run saltutil.refresh_pillar on the minion in question
  7. Repeat pulling pillar data with any minion using pillar.items pillarenv=<branchname>
  8. Notice that the minion does NOT receive new pillar data from the new commit on the branch, and instead returns the same pillar data that was on the branch when the master first fetched the branch
  9. Repeat 5,6,7 as many times as you want
  10. Ultimately stop the master process, rm -rf /var/cache/salt/master/git_pillar, restart the master process
  11. Go to step 3 and repeat forever
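
For convenience, the same cycle as shell commands run from the master. The placeholders match the steps above; the systemctl service name and the push step are assumptions.

# Steps 3-4: first fetch of the branch; data matches the branch
salt '<minion>' pillar.items pillarenv='<branchname>'

# Step 5: push a new commit to <branchname> in the git_pillar repo (from a workstation)

# Step 6: try everything that should trigger an update
sudo salt-run git_pillar.update          # returns None for the repo in question
sudo systemctl restart salt-master       # optional
salt '<minion>' saltutil.refresh_pillar

# Steps 7-8: still the data from the first fetch of the branch
salt '<minion>' pillar.items pillarenv='<branchname>'

# Step 10: only wiping the cache helps
sudo systemctl stop salt-master && sudo rm -rf /var/cache/salt/master/git_pillar && sudo systemctl start salt-master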

HL-SaltBae avatar Nov 01 '23 14:11 HL-SaltBae

Are there any chances we may get a fix for this soon? It affects both 3006 and 3005, and it was introduced by the CVE patches, so it makes sense to have it fixed in both versions. This kind of issue makes it harder to upgrade away from versions with security issues while keeping everything working as desired.

Ikramromdhani avatar Nov 03 '23 13:11 Ikramromdhani

@cmcmarrow do you have any insight into this? I think this regression happened as a result of your CVE patch https://github.com/saltstack/salt/commit/6120bcac2ee79b6e8b104612941432841eb0c8c3

HL-SaltBae avatar Jan 04 '24 16:01 HL-SaltBae

There's been no GitHub activity on @cmcmarrow's account since September. Safe to say he's no longer involved in the project ☹️

ITJamie avatar Jan 31 '24 23:01 ITJamie

I would like to add another detail: we tried the newly released version 3005.5, which contains additional CVE patches, and we noticed that the pillar refresh issue is partially fixed and now happens only on branches containing /, for example feature/test/pillar. I hope this helps with fixing the issue.

Ikramromdhani avatar Feb 01 '24 15:02 Ikramromdhani

I had wondered if / in the branch name might be a factor, but someone on the Slack said they tested that and it happened with or without the /. Good news if true!

HL-SaltBae avatar Feb 01 '24 18:02 HL-SaltBae

@Ikramromdhani Can you upgrade to the latest Salt 3006 and retry? Salt 3005 has passed its bug-fix support window (last August 2023) and is about to run out of CVE support on Feb 25th 2024, 24 days from now.

At this stage nothing will be done for Salt 3005 given that it is dead in 24 days, which is hardly enough time for a debug/test/fix/release cycle, especially since we would have to build classic packages and that alone takes a couple of days.

dmurphy18 avatar Feb 01 '24 18:02 dmurphy18

@Ikramromdhani Rereading this, I remember Chad talking about how caching was changed so that the cache is now only updated when it should be, rather than as before, where the cache was needlessly updated as a side effect of other commands being run. I need to go back and find the actual commit, etc.; it was last year, around autumn.

And yup, Chad is gone, moved on to greener pastures, so I am reading up on GitFS, etc. to get fully familiar (having to cover more areas since there are fewer people on the core team now). I will update next week with the code lines that changed, but I expect you were relying on a side effect of an operation that has since been fixed (namely, incorrectly updating the cache). I will get the full details for you.

That said, doesn't mean there isn't an error too.

dmurphy18 avatar Feb 02 '24 17:02 dmurphy18

We are also observing this error after recently upgrading from salt-3004 to salt-3006.4:

  • We have not moved to salt-3006.6 due to #65691
  • However, diffing salt/utils/gitfs.py gives no reason to believe the bug is absent in 3006.6.

We have observed that when we use __env__ to effectively map pillarenv to a branch name:

  • if we create a new branch feature/testing that is not already in the cache, it will be pulled correctly; and
  • if we then add a new commit to it, or amend HEAD, the changes will not be pulled.

However, if we create a new branch named testing then it will be pulled, and subsequent changes to it will be pulled. That is, it seems that only branches with a / in them do not update. Unfortunately we use branch names like release/* or feature/*, like many people do, so it's a severe issue for us at the moment!

(This evidence conflicts with https://github.com/saltstack/salt/issues/65467#issuecomment-1921904438, although the commenter had not actually confirmed it themselves)

Note:

  • We are using GitPython rather than pygit2
  • We do not see this issue with gitfs ... changes to our state files are pulled ... it is only git_pillar
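
To make the branch-name comparison above concrete, a minimal check from a minion; the branch names are illustrative and assumed to exist in the git_pillar repo:

# Branch without a slash: new commits show up after the update interval
salt-call pillar.items pillarenv=testing

# Branch with a slash: keeps returning the data from its first fetch
salt-call pillar.items pillarenv=feature/testing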

donkopotamus avatar Feb 15 '24 01:02 donkopotamus

@donkopotamus, that is what I was about to confirm. Tested today with 3006.6 using pygit2 and we still have issues pulling pillar changes when the pillarenv branch contains /. @dmurphy18 any clues whether this can make it into 3006.7 or 3007?

Ikramromdhani avatar Feb 15 '24 13:02 Ikramromdhani

I observed that the cache directory organizes cached branches by foldername=branchname, and that branches with / in the branch name result in a subfolder being created.

For example, with three pillar branches:

  • foo
  • feature/bar
  • feature/boo

you might see a structure like this:
cache_dir/
    foo/top.sls
    feature/
        bar/top.sls
        boo/top.sls

If this turns out to be the actual issue, then the obvious solution would appear to be escaping the / in branch names or converting it to something like -, for example:

cache_dir/
    foo/top.sls
    feature\/bar/top.sls
    feature\/boo/top.sls

or

cache_dir/
    foo/top.sls
    feature-bar/top.sls
    feature-boo/top.sls

or

cache_dir/
    foo/top.sls
    feature-SLASH-bar/top.sls
    feature-SLASH-boo/top.sls
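
As a rough illustration of the last variant, flattening the branch name into a single directory component could be as simple as a string substitution. This is a sketch only, not Salt's current behaviour; a real fix would also have to avoid collisions with branches that already contain the token and handle existing caches.

# Replace every "/" in the ref name with a token so each branch maps to one folder
branch="feature/bar"
cache_name="${branch//\//-SLASH-}"
echo "$cache_name"    # prints: feature-SLASH-bar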

HL-SaltBae avatar Feb 15 '24 15:02 HL-SaltBae

@Ikramromdhani It won't make it into Salt 3006.7; that release is close, waiting on clean tests in the pipeline. Possibly for 3007, but it is on my backlog: I am dealing with https://github.com/saltstack/salt/issues/65816, which may be related (similar code changes for pillarenv went in at the same time for gitfs locks), and am writing tests for that fix now. So it is on the block, but there are limited hands since the buyout. I will get to it when I can.

dmurphy18 avatar Feb 15 '24 16:02 dmurphy18

Not sure if it helps, but it looks like the CVE fix in 3005.2, and then https://github.com/saltstack/salt/pull/65017 in 3005.3 which mitigated a negative effect of that fix, probably contributed to this issue.

max-arnold avatar Feb 15 '24 17:02 max-arnold

Yup, the set of changes that Chad made, and that later had to be adjusted, applied both to the gitfs changes he did and to the pillarenv and cache changes, etc. That is why I expect some linkage to the gitfs changes. Those changes are also in the 3006.x branches.

dmurphy18 avatar Feb 15 '24 17:02 dmurphy18

Just a heads up that we tested 3006.7 and the problem is still there. Any chance of getting a fix for this in Salt 3007? This problem is holding us back from performing Salt upgrades.

Ikramromdhani avatar Mar 06 '24 10:03 Ikramromdhani

@dmurphy18 since the fix did not make it into 3007, I would just like to ask if there is a specific version to target? Thank you

Ikramromdhani avatar Apr 03 '24 12:04 Ikramromdhani

@Ikramromdhani Working on the fix for Git FS, and Git Pillar is right after it. The fix for Git FS is done, but I have to refactor a new test after PR review (https://github.com/saltstack/salt/pull/65937); after that I will work on Git Pillar, since it will possibly be affected by the GitFS changes.

dmurphy18 avatar Apr 03 '24 15:04 dmurphy18

Do we have a timeline for when this fix will be released? We have been trying to upgrade since last year and this is the only issue holding us back. I checked the release notes for 3006.9 and did not see it included there, so I was wondering if this is going to be included in 3006.10 and, if yes, do we know when it is going to be released? This way we can plan our upgrade accordingly. Thank you!

Ikramromdhani avatar Jul 31 '24 07:07 Ikramromdhani

@Ikramromdhani Actually working on this at the moment; it would help to just check whether it still occurs on 3006.9. I am attempting to set up a similar environment.

dmurphy18 avatar Jul 31 '24 15:07 dmurphy18