puppet-python
puppet-python copied to clipboard
Don't call pip freeze to check package version (performance cost in general, permission problems in specific case)
Summary
I faced permission problems which in the end turned out to be caused by python::pip calling pip freeze calling hg showconfig on modules unrelated to the installation in charge. While problems as such are specific to my somewhat specific way of using puppet, they can be easily resolved, and fixes make things working much faster in the general case.
Appetizer
$ time pip freeze | grep -e '^SQLAlchemy'
SQLAlchemy==0.8.4
(…)
real 0m5.488s
time pip show SQLAlchemy
(…)
Version: 0.8.4
(…)
real 0m0.198s
Permission problem
I use some Puppet manifest to support managing my development environment. It's light case, without any puppet master, with Puppet serving as a helper just to ensure various packages are installed. So I do things like
sudo puppet apply manifests/devel.pp --modulepath=./modules
I agree that this is not perfectly safe way of doing things, but that's also not a production…
With python::pip installed packages it used to work ……… until I performed some manual pip install --user --edit (on completely unrelated modules). Since then, Puppet manifests started to ignore python::pip modules, and debugging output showed up info like:
Debug: Exec[pip_install_cram](provider=posix): Executing check 'pip freeze | grep -i -e ^cram=='
Debug: Executing 'pip freeze | grep -i -e ^cram=='
(…)
Debug: /Stage[main]/Python_local/Python::Pip[cram]/Exec[pip_install_cram]/unless: not trusting file /home/marcink/DEV_hg/mercurial/extension_utils/.hg/hgrc from untrusted user marcink, group marcink
(…)
Debug: /Stage[main]/Python_local/Python::Pip[cram]/Exec[pip_install_cram]/unless: Error when trying to get requirement for VCS system Command /usr/bin/hg showconfig paths.default failed with error code 1 in /home/marcink/DEV_hg/mercurial/extension_utils, falling back to uneditable format
Debug: /Stage[main]/Python_local/Python::Pip[cram]/Exec[pip_install_cram]/unless: Could not determine repository location of /home/marcink/DEV_hg/mercurial/extension_utils
(… similar things repeated for all locally installed repositories …)
Debug: /Stage[main]/Python_local/Python::Pip[cram]/Exec[pip_install_cram]/unless: cram==0.6
(here I am python::pip-ing cram which is in no way related to locally installed extension_utils)
In the end it turns out that things works so:
python::pipusespip freeze | grep Package(or| grep Package==X.Y.Z) to check whether package is installedpip freezescans all locally installed packages (and that's my fault that it scans packages installed on my local account, assudodoes not clear environment well enough, and I wouldn't open this ticket unless there were other reasons to avoid calling it)- as
pip freezeintends to emit exact specification of how to reinstall any and all of those packages, for every one of them it looks for best way of how to specify the dependency - in particular for packages installed „for edit" it applies some heuristics by looking at whether source directory is some repository, and if so, where it was pulled from, in the end emitting specs like
-e hg+ssh://master/mercurial/extension_utils@f90a7710d6b2cba2cce9dbe61e5af9f7a33e52c3#egg=mercurial_extension_utils-default - to get the latter info it spawns
hg(orgitif it were git repo) to ask it for various details (aforementionedhg showconfig paths.defaultprints default repository remote, which in typical case is the url repo was cloned from), and this commands fails due to my sudo - leaving apart my permission issues, this whole bunch of calls is simply completely unnecessary, costly, and unrelated to the task, after all the only thing which needs to be checked is whether installed module (here:
cram) is installed and in which version.
Suggested fix
-
Simpler but less effective version: swapping
pip freezetopip listand patching$grep_regexpto accomodate different syntax (pip listemits things likecram (0.6)instead ofcram==0.6emitted bypip freezeresolves my permission issue but does not offer noticeable performance gains (btw, for locally installed dirs the output is much more useful, for examplemercurial-extension-utils (1.0.0, /home/marcink/DEV_hg/mercurial/extension_utils)). To get that fix I simply replacedfreezewithlistand edited the regexp this wayif $ensure =~ /^((19|20)[0-9][0-9]-(0[1-9]|1[1-2])-([0-2][1-9]|3[0-1])|[0-9]+\.[0-9]+(\.[0-9]+)?)$/ { $grep_regex = "^" + $pkgname + " [\\(]" + $ensure + '[\\)]$' } else { $grep_regex = "^${pkgname} " } -
Much better idea would be to wrap
pip show PKGNAMEinstead, see appetizer above for explanation why. -
I am not 100% sure whether whole
unlessis needed at all. Blindly callingpip install PkgName==Versionmay be faster than checking whether this call is needed. For example, when packagecramis alredy installed at version 0.6, I have:
$ time pip freeze | grep cram
cram==0.6
real 0m5.535s
$ time pip show cram
---
Name: cram
Version: 0.6
(…)
real 0m0.199s
(…)
$ time pip install cram==0.6
Requirement already satisfied (use --upgrade to upgrade): cram==0.6 in
(…)
real 0m0.197s
so, as we can see the actual cost of checking whether pip install is needed is exactly equal to just calling it.
On performance; We were facing performance issues on one of our weaker nodes with a fair amount of Python packages which essentially was caused solely because of pip freeze being called repeatedly to check for present installations and so on for each package to install.
In order to resolve this I wrote a small wrapper around the stankevich-python and a custom fact.
Important to note is that this was for a masterless Puppet installation (pretty much like this I guess https://www.digitalocean.com/community/tutorials/how-to-set-up-a-masterless-puppet-environment-on-ubuntu-14-04), I have no clue if this has any impact on whether or not the solution is worthwhile for other installations.
Fact:
# Extracts python modules and puts them into a hash
# containing the name and version
modules = Hash.new
if Facter::Util::Resolution.which('pip')
pip_output = `pip freeze`
python_modules = pip_output.split("\n")
python_modules.each { |package|
parts = package.split("==")
modules[parts[0].downcase] = parts[1]
}
Facter.add('python_modules') do
setcode do
modules
end
end
end
Note that this doesn't execute if it can't find pip.
Here's a small sample from the resulting manifest, showing how the fact was used:
define install_python_module ($ensure) {
if $python_modules {
case $ensure {
'present': {
unless $title in $python_modules {
python::pip { $title: ensure => 'present' }
}
}
'latest': {
python::pip { $title: ensure => 'latest' }
}
'absent': {
if $title in $python_modules {
python::pip { $title: ensure => 'absent' }
}
}
default: {
# Deal with versions here
#
# I think the following snippet should work but I've not tested it
#
# if $title in $python_modules and $ensure != $python_modules[$title] {
# python::pip { $title: ensure => $ensure }
# }
notice("Version handling is not implemented yet")
}
}
}
else {
python::pip { $title: ensure => $ensure }
}
}
$present_packages = [ 'requests', 'pyyaml','validate-email', 'pycrypto', 'mock','iso8601']
install_python_module { $present_packages:
ensure => "present"
}
Note the check for whether or not the python_modules variable is not undef due to the above mentioned dependency on pip to set up the fact, not the most glorious solution I've come up with but it works.
Maybe this is something the module could continue building upon. Perhaps it works the current way because of cross-compatibility issues or something similar. In my specific use case on Ubuntu it did however lower the execution time of repeated puppet apply down to a few seconds down from one or two minutes. Obviously first time setups are still slow but that doesn't affect our production environment.
+1 I'm running puppet masterless on low-performance embedded systems and these pip freezes take up well over 50% of a run where nothing needs to change.
To follow up on @Deus-Vult workaround above, here's a python external structured fact that you can put in a module (eg. <module>/facts.d/pip.py):
#!/usr/bin/env python
import subprocess
modules = {}
pipoutput = subprocess.Popen(["pip", "freeze"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
pips = pipoutput.communicate()[0].split()
for pip in pips:
[package,version] = pip.split("==")
modules[package)] = version
print "python_modules="+str(modules)
Note older versions of puppet need to set this in puppet.conf [main]:
stringify_facts = false
+1