[BUG] Critical bug in salt [3001] with git remote backend and `gitpython`
Description
Following this bug, I noticed several other problems when refreshing or synchronizing data in SaltStack 3000+ with a git backend.
I have custom modules, and they were not synchronized this morning. I had to force the synchronization with salt '*' saltutil.sync_modules.
After that, I hit a really serious bug with my pillars: I could no longer synchronize that data at all. I wondered whether this was related to the way I synchronize those pillars from the git backend with GitPython, so I switched to pygit2.
The problem is gone on that master, but it is still present on masters using GitPython.
Setup
Create a pillar sls file and try to synchronize it through the gitfs backend with gitpython, on salt-master 3000.3 or 3001 at least. I am not sure, but older 2019.x releases do not seem to be affected.
Versions Report
salt --versions-report
Salt Version:
Salt: 3000.3
Dependency Versions:
cffi: Not Installed
cherrypy: Not Installed
dateutil: 2.6.1
docker-py: Not Installed
gitdb: 2.0.3
gitpython: 2.1.8
Jinja2: 2.10
libgit2: Not Installed
M2Crypto: Not Installed
Mako: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.5.6
mysql-python: Not Installed
pycparser: Not Installed
pycrypto: 2.6.1
pycryptodome: Not Installed
pygit2: Not Installed
Python: 3.6.9 (default, Apr 18 2020, 01:56:04)
python-gnupg: 0.4.1
PyYAML: 3.12
PyZMQ: 16.0.2
smmap: 2.0.3
timelib: Not Installed
Tornado: 4.5.3
ZMQ: 4.2.5
System Versions:
dist: Ubuntu 18.04 bionic
locale: ANSI_X3.4-1968
machine: x86_64
release: 5.4.41-1-pve
system: Linux
version: Ubuntu 18.04 bionic
As already said, this bug also affects salt-master 3001.
Upgrading configuration and software in order to use pygit2 solves this issue:
salt --versions-report
Salt Version:
Salt: 3001
Dependency Versions:
cffi: 1.14.0
cherrypy: Not Installed
dateutil: 2.6.1
docker-py: Not Installed
gitdb: 2.0.3
gitpython: 2.1.8
Jinja2: 2.10
libgit2: 1.0.0
M2Crypto: Not Installed
Mako: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.5.6
mysql-python: Not Installed
pycparser: 2.20
pycrypto: 2.6.1
pycryptodome: 3.4.7
pygit2: 1.2.1
Python: 3.6.9 (default, Apr 18 2020, 01:56:04)
python-gnupg: 0.4.1
PyYAML: 3.12
PyZMQ: 17.1.2
smmap: 2.0.3
timelib: Not Installed
Tornado: 4.5.3
ZMQ: 4.2.5
System Versions:
dist: ubuntu 18.04 Bionic Beaver
locale: UTF-8
machine: x86_64
release: 4.15.0-50-generic
system: Linux
version: Ubuntu 18.04 Bionic Beaver
Best regards, Rémy
Edit: it only affects salt 3001, not 3000.3.
I am not sure about what is going on here, because I switched back to gitpython after removing pygit2 and it works fine now...
Here is the list of what I have done so far:
Installing libgit2:
apt remove -y libgit2-dev
apt install -y libssh2-1 libssh2-1-dev python3-pip cmake pkg-config libssl-dev \
python3-openssl libpcre3 libpcre3-dev libgssapi3-heimdal
ldconfig -v |grep ssh
git clone -b master git://github.com/libgit2/libgit2.git
cd libgit2
export LIBGIT2=/usr/local
mkdir build && cd $_
cmake .. -DCMAKE_INSTALL_PREFIX=$LIBGIT2
make -j
make install
pip3 install pygit2
ldconfig -v |grep git
Adding an ssh config file (/root/.ssh/config), or appending this to the existing one:
cat /root/.ssh/config
Host gitlab.<tld.domain.name>
User remy
PreferredAuthentications publickey
IdentityFile /root/.ssh/id_rsa
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_rsa
# checking if I can clone repositories without user/password
git clone ssh://git@<tld.domain.name>
...
cat /etc/salt/master
# changing git provider:
gitfs_provider: pygit2
# adding rsa keys for each repository :
- pubkey: /root/.ssh/id_rsa.pub
- privkey: /root/.ssh/id_rsa
Starting salt-master in debug mode:
salt-master -l debug
Ok, it works with pygit2.
Switching back to gitpython:
killall salt-master
pip3 uninstall pygit2
## edited /etc/salt/master to change back gitfs_provider to gitpython
## and then removed pubkey and privkey by repository
service salt-master start && tail -f /var/log/salt/master
It works normally again with gitpython...
I guess it is related to my /etc/ssh/config file; or, well (...), this behaviour is quite weird...
@remyd1 Thanks for all this information. There often seem to be problems with git backends, and the only person I know who has worked on that is @cmcmarrow, so I'll see if he can provide any insight.
Can we get more of the actual git configs, such as the git remotes and the gitfs_* settings (minus any passwords, if those exist)?
Sure @whytewolf :
fileserver_backend:
  - git
# ...
gitfs_provider: gitpython
#gitfs_provider: pygit2
gitfs_base: master
gitfs_remotes:
  - ssh://git@<private-repo-not-reachable-from-outside-here.git>:
    # - mountpoint: salt://prod
    # - base: master
    - name: states
    - root: 'salt_states'
    # - pubkey: /root/.ssh/id_rsa.pub
    # - privkey: /root/.ssh/id_rsa
  - ssh://git@<private-repo-not-reachable-from-outside-here.git>:
    - name: reactor
    - root: 'salt_reactor'
    # - pubkey: /root/.ssh/id_rsa.pub
    # - privkey: /root/.ssh/id_rsa
  - ssh://git@<private-repo-not-reachable-from-outside-here.git>:
    - name: modulefiles
    # - pubkey: /root/.ssh/id_rsa.pub
    # - privkey: /root/.ssh/id_rsa
# ...
ext_pillar:
  - git:
    - master ssh://git@<private-repo-not-reachable-from-outside-here.git>:
      - env: base
      - name: pillar
      - root: salt_pillar
      # - pubkey: /root/.ssh/id_rsa.pub
      # - privkey: /root/.ssh/id_rsa
  # - mongo: {collection: pillar}
Other options are commented out. I can give you the full path of my git repositories, but that won't help you as they are not reachable from the Internet. It is just a basic private gitlab server.
That being said, I am now checking the logs on this salt master (yes, I should have done that in the first place), and after grepping for ERROR and the date, I get some interesting information:
grep -E -A20 2020-06-1[89].+ERROR /var/log/salt/master |less
2020-06-18 03:32:18,492 [salt.utils.gitfs :2361][ERROR ][14172] Exception caught while fetching git_pillar remote 'master ssh://git@<private-repo-not-reachable-from-outside-here.git>': Cmd('git') failed due to: exit code(128)
cmdline: git fetch -v origin
stderr: 'fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.'
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/salt/utils/gitfs.py", line 2349, in fetch_remotes
if repo.fetch():
File "/usr/lib/python3/dist-packages/salt/utils/gitfs.py", line 755, in fetch
return self._fetch()
File "/usr/lib/python3/dist-packages/salt/utils/gitfs.py", line 1240, in _fetch
fetch_results = origin.fetch()
File "/usr/lib/python3/dist-packages/git/remote.py", line 789, in fetch
res = self._get_fetch_info_from_stderr(proc, progress)
File "/usr/lib/python3/dist-packages/git/remote.py", line 675, in _get_fetch_info_from_stderr
proc.wait(stderr=stderr_text)
File "/usr/lib/python3/dist-packages/git/cmd.py", line 418, in wait
raise GitCommandError(self.args, status, errstr)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
cmdline: git fetch -v origin
...
--
2020-06-18 11:27:57,268 [salt.loader :1791][ERROR ][21487] Failed to load function git.envs because its module (git) is not in the whitelist: ['gitfs']
...
2020-06-18 12:29:14,469 [salt.utils.reactor:94 ][ERROR ][28739] Can not render SLS for tag salt/auth. File missing or not found.
2020-06-18 12:29:24,491 [salt.loaded.int.module.cp:502 ][ERROR ][28739] Unable to cache file 'salt://post-install-vm.sls' from saltenv 'base'.
...
2020-06-18 12:29:25,681 [salt.utils.gitfs :867 ][WARNING ][28752] gitfs_global_lock is enabled and update lockfile /var/cache/salt/master/gitfs/reactor/.git/update.lk is present for gitfs remote 'ssh://git@<private-repo-not-reachable-from-outside-here.git>''. Process 30445 obtained the lock
...
FileNotFoundError: [Errno 2] No such file or directory: '/var/cache/salt/master/gitfs/states/.git/update.lk'
2020-06-18 12:31:29,378 [salt.utils.gitfs :902 ][ERROR ][28752] Unable to set update lock for ssh://git@<private-repo-not-reachable-from-outside-here.git> (/var/cache/salt/master/gitfs/reactor/.git/update.lk): [Errno 2] No such file or directory: '/var/cache/salt/master/gitfs/reactor/.git/update.lk'
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/salt/utils/gitfs.py", line 830, in _lock
self._get_lock_file(lock_type), os.O_CREAT | os.O_EXCL | os.O_WRONLY
...
2020-06-19 11:16:15,041 [salt.utils.templates:179 ][ERROR ][28529] Rendering exception occurred
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/salt/utils/templates.py", line 400, in render_jinja_tmpl
output = template.render(**decoded_context)
File "/usr/lib/python3/dist-packages/jinja2/asyncsupport.py", line 76, in render
return original_render(self, *args, **kwargs)
File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 1008, in render
return self.environment.handle_exception(exc_info, True)
File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 780, in handle_exception
reraise(exc_type, exc_value, tb)
File "/usr/lib/python3/dist-packages/jinja2/_compat.py", line 37, in reraise
raise value.with_traceback(tb)
File "<template>", line 1271, in top-level template code
jinja2.exceptions.UndefinedError: 'domainname' is undefined
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/salt/utils/templates.py", line 166, in render_tmpl
output = render_str(tmplstr, context, tmplpath)
File "/usr/lib/python3/dist-packages/salt/utils/templates.py", line 407, in render_jinja_tmpl
Please note that I removed the errors related to gitpython and pygit2, because they happened after my initial problem: once I could no longer refresh pillars, I tried to switch from gitpython to pygit2 and made mistakes in the configuration and in the pygit2 installation.
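For reference, the grep-based filtering used for the excerpt above can also be scripted, which makes it easier to see which loggers produce the errors. This is only a sketch: the regular expression is inferred from the log lines quoted in this comment, and the names `LOG_LINE`, `count_errors` and `SAMPLE` are made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the quoted lines, e.g.
# "2020-06-18 03:32:18,492 [salt.utils.gitfs :2361][ERROR ][14172] message"
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) [\d:,]+ "
    r"\[(?P<logger>[\w.]+)\s*:\d+\s*\]"
    r"\[(?P<level>\w+)\s*\]\[\d+\] (?P<msg>.*)$"
)


def count_errors(lines, dates):
    """Count ERROR entries per logger name, restricted to the given dates."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        # Traceback and continuation lines do not match and are skipped.
        if match and match["level"] == "ERROR" and match["date"] in dates:
            counts[match["logger"]] += 1
    return counts


# Shortened copies of lines from the excerpt above.
SAMPLE = [
    "2020-06-18 03:32:18,492 [salt.utils.gitfs :2361][ERROR ][14172] Exception caught while fetching git_pillar remote",
    "Traceback (most recent call last):",
    "2020-06-18 11:27:57,268 [salt.loader :1791][ERROR ][21487] Failed to load function git.envs",
    "2020-06-18 12:29:25,681 [salt.utils.gitfs :867 ][WARNING ][28752] gitfs_global_lock is enabled",
    "2020-06-19 11:16:15,041 [salt.utils.templates:179 ][ERROR ][28529] Rendering exception occurred",
]

summary = count_errors(SAMPLE, {"2020-06-18", "2020-06-19"})
for logger, count in summary.most_common():
    print(logger, count)
```

To run it on a real log, pass the opened master log file instead of `SAMPLE`; a file object yields lines the same way the list does.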
The first error above and the last one are the most interesting ones. The last one was caused by a mistake in a large pillar file, where a variable was set under one name and then used under a wrong one. Since displaying other pillars also failed, I do not think that error is related to this issue.
However, the first error above might be what triggered all the changes I made today. Even while this error was occurring, I was able to clone that repository over ssh outside SaltStack.
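A manual clone from an interactive shell is not necessarily the same test as the fetch done by the master, since the shell may have an ssh-agent and a terminal for prompts. The following sketch builds a stricter check. It rests on assumptions: that the master process has no agent socket and cannot prompt, that the key is /root/.ssh/id_rsa as in the config above, and the helper names `build_fetch_check` and `check_remote` are hypothetical. It relies only on git's `GIT_SSH_COMMAND` and `GIT_TERMINAL_PROMPT` environment variables and on the ssh options `BatchMode` and `IdentitiesOnly`.

```python
import os
import shlex
import subprocess


def build_fetch_check(remote_url, identity_file):
    """Return (argv, env) for a non-interactive `git ls-remote` over ssh."""
    env = dict(os.environ)
    # Assumption: the daemon has no ssh-agent, so hide the agent socket
    # and let only the key file on disk be tried.
    env.pop("SSH_AUTH_SOCK", None)
    env["GIT_SSH_COMMAND"] = (
        f"ssh -i {shlex.quote(identity_file)} "
        "-o BatchMode=yes -o IdentitiesOnly=yes"
    )
    # Never fall back to an interactive username/password prompt.
    env["GIT_TERMINAL_PROMPT"] = "0"
    return ["git", "ls-remote", "--heads", remote_url], env


def check_remote(remote_url, identity_file, timeout=30):
    """Run the check and return (exit code, stderr text)."""
    argv, env = build_fetch_check(remote_url, identity_file)
    proc = subprocess.run(
        argv, env=env, capture_output=True, text=True, timeout=timeout
    )
    return proc.returncode, proc.stderr.strip()


# Placeholder URL for illustration only; nothing is contacted here.
argv, env = build_fetch_check(
    "ssh://git@example.invalid/repo.git", "/root/.ssh/id_rsa"
)
```

Calling `check_remote(...)` with the real remote URL, as the same user that runs salt-master, would then show whether the fetch also fails outside Salt once the agent and prompts are taken away. An exit code of 128 with "Could not read from remote repository" would match the log entry above.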
Maybe all these bugs are just a mistake on my part, somewhere, at some point, but all these syncing issues with a git backend since I upgraded to 3001 do not look like a coincidence to me.
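Some of the entries in the log excerpt concern update.lk lock files under /var/cache/salt/master/gitfs. A small sketch to list any lock files left behind is given here. The directory layout (`<remote name>/.git/update.lk`) is taken from the paths in those log lines, and the function name `find_update_locks` is made up. The demo runs against a throwaway directory, not the real cache.

```python
import tempfile
from pathlib import Path


def find_update_locks(cache_root):
    """Return the update.lk files found one level below cache_root."""
    root = Path(cache_root)
    # Layout assumed from the log: <cache_root>/<remote>/.git/update.lk
    return sorted(str(path) for path in root.glob("*/.git/update.lk"))


# Demo on a temporary tree that mimics two cached remotes,
# only one of which still has a lock file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("states", "reactor"):
        (Path(tmp) / name / ".git").mkdir(parents=True)
    (Path(tmp) / "reactor" / ".git" / "update.lk").touch()
    found = find_update_locks(tmp)
    locked_remotes = [Path(p).parent.parent.name for p in found]

print(locked_remotes)
```

On the master, the call would be `find_update_locks("/var/cache/salt/master/gitfs")`. As far as I know, Salt also documents a runner for removing such locks (`salt-run cache.clear_git_lock gitfs type=update`), which is probably preferable to deleting the files by hand.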
Hi,
Just an update for some salt versions, including 3002.
I downgraded to version 3000, using gitpython, to get (custom) module syncing back, since it was no longer working. I had to do this because I have some critical orchestrations launched from cron jobs, and modules were no longer syncing in version 3001.
Recently (yesterday), I upgraded to version 3002. No state could be compiled. The pillars were OK, but for every state I get:
<minion>:
Data failed to compile:
----------
No matching sls found for 'nfs' in env 'base'
It works with pygit2.
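The message means the fileserver did not return the state for the base environment. As a rough aid for checking the repository by hand, the sketch below lists the two paths a state name conventionally maps to (`<name>.sls` or `<name>/init.sls`), relative to the `root` configured for the remote (`salt_states` in the config posted earlier). The function name `sls_candidates` is made up for illustration, and nothing here calls Salt itself.

```python
from pathlib import PurePosixPath


def sls_candidates(sls_name, root=""):
    """Return the repository paths where a state such as 'nfs' may live."""
    # Dots in a state name separate directories: 'a.b' -> a/b
    rel = PurePosixPath(*sls_name.split("."))
    base = PurePosixPath(root)
    return [
        str(base / rel.with_name(rel.name + ".sls")),
        str(base / rel / "init.sls"),
    ]


candidates = sls_candidates("nfs", root="salt_states")
for path in candidates:
    print(path)
```

If one of these paths exists on the branch mapped to base (master here) and the state is still not found with gitpython, the file is in the repository but is not being served, which would point at the fileserver backend rather than at the state tree.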
versions-report:
Salt Version:
Salt: 3002
Dependency Versions:
cffi: 1.14.3
cherrypy: Not Installed
dateutil: 2.7.3
docker-py: Not Installed
gitdb: 2.0.6
gitpython: 3.0.7
Jinja2: 2.10.1
libgit2: 1.0.1
M2Crypto: Not Installed
Mako: Not Installed
msgpack-pure: Not Installed
msgpack-python: 0.6.2
mysql-python: Not Installed
pycparser: 2.20
pycrypto: 2.6.1
pycryptodome: 3.6.1
pygit2: 1.3.0
Python: 3.8.5 (default, Jul 28 2020, 12:59:40)
python-gnupg: 0.4.5
PyYAML: 5.3.1
PyZMQ: 18.1.1
smmap: 2.0.5
timelib: Not Installed
Tornado: 4.5.3
ZMQ: 4.3.2
System Versions:
dist: ubuntu 20.04 focal
locale: utf-8
machine: x86_64
release: 5.4.0-52-generic
system: Linux
version: Ubuntu 20.04 focal
We were not able to get to this work in the Aluminium release cycle, so moving it to Silicon.