
[BUG] Critical bug in salt [3001] with git remote backend and `gitpython`

Open remyd1 opened this issue 5 years ago • 6 comments

Description

Following this bug, I have noticed many other bugs while refreshing or synchronizing data in SaltStack 3000+ with a git backend.

I have custom modules, and they were not synchronized this morning. I had to force the synchronization with `salt '*' saltutil.sync_modules`.

Then I hit a really serious bug with my pillars: I was no longer able to synchronize pillar data at all. I wondered whether this was related to the way I synchronize those pillars through the git backend using GitPython, so I switched to pygit2.

The problem is gone on this master, but it is still present on masters using GitPython.

Setup

Create a pillar sls file and try to synchronize it through the gitfs backend with gitpython, using at least salt-master 3000.3 or 3001. I am not sure, but the older 2019.X releases do not seem to be affected.
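Before comparing the two providers, it can help to confirm which of them the master's Python interpreter can actually import. The following is a minimal sketch, not part of Salt: the function name `available_providers` is hypothetical, and it assumes GitPython is imported as `git` and pygit2 as `pygit2`. It has to be run with the same interpreter that runs salt-master.

```python
import importlib.util

# Import names of the two gitfs providers discussed in this issue.
# GitPython is imported as "git"; pygit2 is imported as "pygit2".
PROVIDER_MODULES = {"gitpython": "git", "pygit2": "pygit2"}


def available_providers(modules=PROVIDER_MODULES):
    """Return {provider_name: bool}, True when the provider's Python
    module can be located by the current interpreter."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in modules.items()
    }


for name, found in available_providers().items():
    print(f"{name}: {'importable' if found else 'not found'}")
```

This only tells whether the module can be found; it does not prove that the provider works with a given remote or authentication setup.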

Versions Report

salt --versions-report
Salt Version:
           Salt: 3000.3

Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: 2.6.1
      docker-py: Not Installed
          gitdb: 2.0.3
      gitpython: 2.1.8
         Jinja2: 2.10
        libgit2: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.5.6
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 3.6.9 (default, Apr 18 2020, 01:56:04)
   python-gnupg: 0.4.1
         PyYAML: 3.12
          PyZMQ: 16.0.2
          smmap: 2.0.3
        timelib: Not Installed
        Tornado: 4.5.3
            ZMQ: 4.2.5

System Versions:
           dist: Ubuntu 18.04 bionic
         locale: ANSI_X3.4-1968
        machine: x86_64
        release: 5.4.41-1-pve
         system: Linux
        version: Ubuntu 18.04 bionic

As already said, this bug also affects salt-master 3001.

Upgrading configuration and software in order to use pygit2 solves this issue:

salt --versions-report
Salt Version:
           Salt: 3001

Dependency Versions:
           cffi: 1.14.0
       cherrypy: Not Installed
       dateutil: 2.6.1
      docker-py: Not Installed
          gitdb: 2.0.3
      gitpython: 2.1.8
         Jinja2: 2.10
        libgit2: 1.0.0
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.5.6
   mysql-python: Not Installed
      pycparser: 2.20
       pycrypto: 2.6.1
   pycryptodome: 3.4.7
         pygit2: 1.2.1
         Python: 3.6.9 (default, Apr 18 2020, 01:56:04)
   python-gnupg: 0.4.1
         PyYAML: 3.12
          PyZMQ: 17.1.2
          smmap: 2.0.3
        timelib: Not Installed
        Tornado: 4.5.3
            ZMQ: 4.2.5

System Versions:
           dist: ubuntu 18.04 Bionic Beaver
         locale: UTF-8
        machine: x86_64
        release: 4.15.0-50-generic
         system: Linux
        version: Ubuntu 18.04 Bionic Beaver

Best regards, Rémy

Edit: it only affects Salt 3001, not 3000.3.

remyd1 avatar Jun 19 '20 12:06 remyd1

I am not sure about what is going on here, because I switched back to gitpython after removing pygit2 and it works fine now...

Here is the list of what I have done so far:

Installing libgit2:

apt remove -y libgit2-dev
apt install -y libssh2-1 libssh2-1-dev python3-pip cmake pkg-config libssl-dev \
  python3-openssl libpcre3 libpcre3-dev libgssapi3-heimdal

ldconfig -v |grep ssh

git clone -b master git://github.com/libgit2/libgit2.git
cd libgit2
export LIBGIT2=/usr/local
mkdir build && cd $_
cmake .. -DCMAKE_INSTALL_PREFIX=$LIBGIT2
make -j
make install

pip3 install pygit2

ldconfig -v |grep git

Adding an ssh/config file, or appending this to the existing one:

cat /root/.ssh/config
Host gitlab.<tld.domain.name>
    User remy
    PreferredAuthentications publickey
    IdentityFile /root/.ssh/id_rsa
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_rsa

# checking if I can clone repositories without user/password
git clone ssh://git@<tld.domain.name>
...
cat /etc/salt/master

# changing git provider:
gitfs_provider: pygit2


# adding rsa keys for each repository :
    - pubkey: /root/.ssh/id_rsa.pub
    - privkey: /root/.ssh/id_rsa

Starting salt-master in debug mode:

salt-master -l debug

Ok, it works with pygit2.

Switching back to gitpython:

killall salt-master
pip3 uninstall pygit2
## edited /etc/salt/master to change back gitfs_provider to gitpython
## and then removed pubkey and privkey by repository
service salt-master start && tail -f /var/log/salt/master

It works normally again with gitpython...

I guess it is related to my /etc/ssh/config file; or, well (...), this behaviour is quite weird...

remyd1 avatar Jun 19 '20 14:06 remyd1

@remyd1 Thanks for all this information. There often seem to be problems with git backends, and the only person I know who has worked on that is @cmcmarrow, so I'll see if he can provide any insight.

xeacott avatar Jun 19 '20 17:06 xeacott

Can we get more of the actual git configs, such as the git remotes and gitfs_* configs (except for passwords, if those exist)?

whytewolf avatar Jun 19 '20 18:06 whytewolf

Sure @whytewolf :

fileserver_backend:
  - git
# ...
gitfs_provider: gitpython
#gitfs_provider: pygit2
gitfs_base: master
gitfs_remotes:
  - ssh://git@<private-repo-not-reachable-from-outside-here.git>:
#    - mountpoint: salt://prod
#    - base: master
    - name: states
    - root: 'salt_states'
#    - pubkey: /root/.ssh/id_rsa.pub
#    - privkey: /root/.ssh/id_rsa
  - ssh://git@<private-repo-not-reachable-from-outside-here.git>:
    - name: reactor
    - root: 'salt_reactor'
#    - pubkey: /root/.ssh/id_rsa.pub
#    - privkey: /root/.ssh/id_rsa
  - ssh://git@<private-repo-not-reachable-from-outside-here.git>:
    - name: modulefiles
#    - pubkey: /root/.ssh/id_rsa.pub
#    - privkey: /root/.ssh/id_rsa
# ...
ext_pillar:
  - git: 
    - master ssh://git@<private-repo-not-reachable-from-outside-here.git>:
      - env: base
      - name: pillar
      - root: salt_pillar
#      - pubkey: /root/.ssh/id_rsa.pub
#      - privkey: /root/.ssh/id_rsa
#  - mongo: {collection: pillar}

The other options are commented out. I could give you the full paths of my git repositories, but that won't help you as they are not reachable from the Internet. It is just a basic private GitLab server.

That being said, I am now checking the logs on this salt master (yes, I should have done that in the first place), and after grepping for ERROR and the date, I get some interesting information:

grep -E -A20 2020-06-1[89].+ERROR /var/salt/log/master |less
2020-06-18 03:32:18,492 [salt.utils.gitfs :2361][ERROR   ][14172] Exception caught while fetching git_pillar remote 'master ssh://git@<private-repo-not-reachable-from-outside-here.git>': Cmd('git') failed due to: exit code(128)
  cmdline: git fetch -v origin
  stderr: 'fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.'
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/salt/utils/gitfs.py", line 2349, in fetch_remotes
    if repo.fetch():
  File "/usr/lib/python3/dist-packages/salt/utils/gitfs.py", line 755, in fetch
    return self._fetch()
  File "/usr/lib/python3/dist-packages/salt/utils/gitfs.py", line 1240, in _fetch
    fetch_results = origin.fetch()
  File "/usr/lib/python3/dist-packages/git/remote.py", line 789, in fetch
    res = self._get_fetch_info_from_stderr(proc, progress)
  File "/usr/lib/python3/dist-packages/git/remote.py", line 675, in _get_fetch_info_from_stderr
    proc.wait(stderr=stderr_text)
  File "/usr/lib/python3/dist-packages/git/cmd.py", line 418, in wait
    raise GitCommandError(self.args, status, errstr)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
  cmdline: git fetch -v origin

...

--
2020-06-18 11:27:57,268 [salt.loader      :1791][ERROR   ][21487] Failed to load function git.envs because its module (git) is not in the whitelist: ['gitfs']


...
2020-06-18 12:29:14,469 [salt.utils.reactor:94  ][ERROR   ][28739] Can not render SLS  for tag salt/auth. File missing or not found.
2020-06-18 12:29:24,491 [salt.loaded.int.module.cp:502 ][ERROR   ][28739] Unable to cache file 'salt://post-install-vm.sls' from saltenv 'base'.

...
2020-06-18 12:29:25,681 [salt.utils.gitfs :867 ][WARNING ][28752] gitfs_global_lock is enabled and update lockfile /var/cache/salt/master/gitfs/reactor/.git/update.lk is present for gitfs remote 'ssh://git@<private-repo-not-reachable-from-outside-here.git>''. Process 30445 obtained the lock
...

FileNotFoundError: [Errno 2] No such file or directory: '/var/cache/salt/master/gitfs/states/.git/update.lk'
2020-06-18 12:31:29,378 [salt.utils.gitfs :902 ][ERROR   ][28752] Unable to set update lock for ssh://git@<private-repo-not-reachable-from-outside-here.git> (/var/cache/salt/master/gitfs/reactor/.git/update.lk): [Errno 2] No such file or directory: '/var/cache/salt/master/gitfs/reactor/.git/update.lk' 
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/salt/utils/gitfs.py", line 830, in _lock
    self._get_lock_file(lock_type), os.O_CREAT | os.O_EXCL | os.O_WRONLY
...


2020-06-19 11:16:15,041 [salt.utils.templates:179 ][ERROR   ][28529] Rendering exception occurred
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/salt/utils/templates.py", line 400, in render_jinja_tmpl
    output = template.render(**decoded_context)
  File "/usr/lib/python3/dist-packages/jinja2/asyncsupport.py", line 76, in render
    return original_render(self, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 1008, in render
    return self.environment.handle_exception(exc_info, True)
  File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 780, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/lib/python3/dist-packages/jinja2/_compat.py", line 37, in reraise
    raise value.with_traceback(tb)
  File "<template>", line 1271, in top-level template code
jinja2.exceptions.UndefinedError: 'domainname' is undefined

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/salt/utils/templates.py", line 166, in render_tmpl
    output = render_str(tmplstr, context, tmplpath)
  File "/usr/lib/python3/dist-packages/salt/utils/templates.py", line 407, in render_jinja_tmpl
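The grep filter used to produce the excerpt above can also be sketched in Python, which makes it easier to select by logger or PID afterwards. This is only a sketch: the line layout in the regular expression is inferred from the entries quoted above, and the names `LOG_LINE` and `filter_entries` are hypothetical. Unlike `grep -A20`, it returns only the first line of each entry, not the traceback that follows.

```python
import re

# Layout assumed from the entries quoted above:
# "YYYY-MM-DD HH:MM:SS,mmm [logger :line][LEVEL   ][pid] message"
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2},\d+) "
    r"\[(?P<logger>[^\]]*?)\s*:(?P<lineno>\d+)\s*\]"
    r"\[(?P<level>\w+)\s*\]\[(?P<pid>\d+)\] (?P<message>.*)$"
)


def filter_entries(lines, dates, level="ERROR"):
    """Yield parsed entries whose date is in `dates` and whose level matches.

    Lines that do not start a log entry (tracebacks, git stderr) are skipped.
    """
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and match["date"] in dates and match["level"] == level:
            yield match.groupdict()
```

A possible use would be `filter_entries(open(path), {"2020-06-18", "2020-06-19"})`, where `path` is the master log file.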

Please note that I removed the errors related to gitpython and pygit2, because they happened after my initial problem: they appeared once I could no longer refresh pillars and was trying to switch from gitpython to pygit2, and were due to mistakes in the configuration and the pygit2 installation.

The first and the last errors above are the most interesting ones. The last one was due to a mistake in a big pillar file, where I set a variable and then used a wrong variable name. As I tried to display other pillars and that did not work either, I think it is not related to this issue.

However, the first error above might be what triggered all the modifications I have made today. While I was getting this error, I was able to clone that repository over ssh outside of SaltStack.

Maybe all these bugs are just a mistake on my side, somewhere, at some point, but all these syncing issues with a git backend since I upgraded to 3001 do not look like a coincidence to me.
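One way to narrow down a fetch that fails inside Salt but works from a shell is to repeat the check in an environment closer to that of a service. The sketch below is an assumption about the cause, not a diagnosis: it drops `SSH_AUTH_SOCK` (an ssh-agent was started in the interactive session earlier in this thread) and forbids any prompt. The function names are hypothetical; `GIT_SSH_COMMAND`, `GIT_TERMINAL_PROMPT`, `BatchMode` and `IdentitiesOnly` are standard git and OpenSSH settings.

```python
import os
import shlex
import subprocess


def build_remote_check(remote, identity_file=None):
    """Build a `git ls-remote` command and a stripped-down environment that
    never prompts, roughly like a service without a terminal or ssh-agent."""
    ssh_cmd = ["ssh", "-o", "BatchMode=yes"]
    if identity_file:
        ssh_cmd += ["-i", identity_file, "-o", "IdentitiesOnly=yes"]
    env = {
        "PATH": os.environ.get("PATH", "/usr/bin:/bin"),
        "HOME": os.environ.get("HOME", "/root"),
        "GIT_TERMINAL_PROMPT": "0",  # never ask for credentials
        "GIT_SSH_COMMAND": " ".join(shlex.quote(part) for part in ssh_cmd),
    }  # SSH_AUTH_SOCK is deliberately not passed on
    return ["git", "ls-remote", "--heads", remote], env


def check_remote(remote, identity_file=None, timeout=30):
    """Run the check and return (exit_code, stderr)."""
    cmd, env = build_remote_check(remote, identity_file)
    proc = subprocess.run(
        cmd,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr
```

If `check_remote(...)` returns exit code 128 with the same "Could not read from remote repository" message while a plain `git clone` in the shell succeeds, the difference is likely in the environment rather than in the repository.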
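Regarding the "Unable to set update lock" entries in the log above: the traceback shows the lock file being opened with `os.O_CREAT | os.O_EXCL | os.O_WRONLY`. With those flags, Errno 2 means the parent directory of the lock file does not exist, whereas a lock held by another process would give "file exists" instead. The sketch below only illustrates that behaviour with a temporary directory; the helper `try_lock` is hypothetical, and whether the gitfs cache directory was really missing on this master is an assumption.

```python
import os
import tempfile


def try_lock(path):
    """Try to create `path` exclusively, using the flags shown in the
    traceback. Return 'acquired', 'held' or 'missing-dir'."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return "held"          # the lock file already exists
    except FileNotFoundError:
        return "missing-dir"   # parent directory does not exist (Errno 2)
    os.close(fd)
    return "acquired"


with tempfile.TemporaryDirectory() as cache:
    lock = os.path.join(cache, "update.lk")
    print(try_lock(lock))                                      # acquired
    print(try_lock(lock))                                      # held
    print(try_lock(os.path.join(cache, ".git", "update.lk")))  # missing-dir
```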

remyd1 avatar Jun 19 '20 19:06 remyd1

Hi,

Just an update for some salt versions, including 3002.

I downgraded to version 3000 with gitpython in order to get (custom) module syncing back, as it was not working anymore. I had to do this because I have some critical orchestrators launched through cron, and modules were not syncing anymore in version 3001.

Recently (yesterday), I upgraded to version 3002. No state was able to compile. The pillars were OK, but for all the states, I get:

<minion>:
    Data failed to compile:
----------
    No matching sls found for 'nfs' in env 'base'

It works with pygit2.
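To check whether the file is really absent from what the master serves, it helps to know which paths can satisfy an SLS reference: Salt resolves `nfs` to either `nfs.sls` or `nfs/init.sls`, with dots in the name acting as directory separators. The sketch below lists those candidates relative to a repository; the helper name `sls_candidates` is hypothetical, and the `salt_states` root is taken from the `root:` option of the gitfs_remotes configuration posted earlier in this thread.

```python
def sls_candidates(sls_name, root=""):
    """Return the repository-relative paths that can satisfy an SLS
    reference: '<name>.sls' or '<name>/init.sls', below an optional root."""
    rel = sls_name.replace(".", "/")
    prefix = root.strip("/")
    paths = [f"{rel}.sls", f"{rel}/init.sls"]
    return [f"{prefix}/{path}" if prefix else path for path in paths]


print(sls_candidates("nfs", root="salt_states"))
```

The resulting paths can then be compared with the output of `git ls-tree -r --name-only master` in a clone of the states repository.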

versions-report:

Salt Version:
           Salt: 3002
 
Dependency Versions:
           cffi: 1.14.3
       cherrypy: Not Installed
       dateutil: 2.7.3
      docker-py: Not Installed
          gitdb: 2.0.6
      gitpython: 3.0.7
         Jinja2: 2.10.1
        libgit2: 1.0.1
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.6.2
   mysql-python: Not Installed
      pycparser: 2.20
       pycrypto: 2.6.1
   pycryptodome: 3.6.1
         pygit2: 1.3.0
         Python: 3.8.5 (default, Jul 28 2020, 12:59:40)
   python-gnupg: 0.4.5
         PyYAML: 5.3.1
          PyZMQ: 18.1.1
          smmap: 2.0.5
        timelib: Not Installed
        Tornado: 4.5.3
            ZMQ: 4.3.2
 
System Versions:
           dist: ubuntu 20.04 focal
         locale: utf-8
        machine: x86_64
        release: 5.4.0-52-generic
         system: Linux
        version: Ubuntu 20.04 focal

remyd1 avatar Oct 28 '20 10:10 remyd1

We were not able to get to this work in the Aluminium release cycle, so moving it to Silicon.

sagetherage avatar Mar 22 '21 21:03 sagetherage