Performance issues with larger setups
Expected Behavior
A Puppet run on a config master server should be fast for larger setups.
Current Behavior
We're currently importing about 2500 icinga2::object::host objects by leveraging the exported resources feature of Puppet. Additionally, we're generating another ~1000 icinga2::object::* resources for various other configuration (hostgroups, services, commands, ...). Currently this leads to a Puppet run taking about 5-6min. This is quite good, as we're coming down from >45min. This improvement was possible by upgrading from the validate_* functions to Puppet 4 data types (#350) and using the current master of puppetlabs/puppetlabs-concat. Still, 5-6min is too long.
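For reference, this is the usual export/collect pattern in question, shown as a minimal sketch (parameter values are illustrative):

# On every monitored node: export the host object without applying it locally
@@icinga2::object::host { $trusted['certname']:
  address => $facts['networking']['ip'],
  target  => '/etc/icinga2/zones.d/master/hosts.conf',
}

# On the config master: collect and realize all exported host objects
Icinga2::Object::Host <<| |>>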
We dug into this by running Puppet with --evaltrace and --profile, which left us with two major performance bugs:
- The catalog compilation for the config master generates way too many objects and does a lot of function calls.
2017-10-20 11:31:03,329 INFO [qtp1927132449-138223] [puppetserver] Puppet functions: 72.29300000000387 s (39874 calls)
2017-10-20 11:31:03,329 INFO [qtp1927132449-138223] [puppetserver] Puppet functions -> template: 29.85000000000093 s (3726 calls)
2017-10-20 11:31:03,329 INFO [qtp1927132449-138223] [puppetserver] Puppet functions -> icinga2_attributes: 23.236999999999362 s (3695 calls)
2017-10-20 11:31:03,329 INFO [qtp1927132449-138223] [puppetserver] Puppet functions -> include: 11.91599999999979 s (3783 calls)
- The concat resource is awfully slow as it iterates through the whole catalog to find concat_fragments which match it.
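To illustrate the second point, this is roughly the pattern underneath (a sketch with illustrative names): every concat resource has to match its fragments against all resources in the catalog, so with thousands of fragments each lookup walks a very large catalog.

concat { '/etc/icinga2/zones.d/master/hosts.conf':
  owner => 'icinga',
  group => 'icinga',
  mode  => '0640',
}

# One fragment per exported object; ~3500 of these end up in the catalog
concat::fragment { 'icinga2-host-example.com':
  target  => '/etc/icinga2/zones.d/master/hosts.conf',
  content => "object Host \"example.com\" {\n  address = \"192.0.2.10\"\n}\n",
}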
Possible Solution
We currently see a couple of options which could lead to better performance:
- Remove include ::icinga2::params from icinga2::object, as the class is already defined as a private class.
- Generate the final Icinga2 object before exporting it into PuppetDB, meaning we would export a concat::fragment/file resource instead of icinga2::object::*. This reduces the time spent on function calls drastically while compiling the catalog for the Icinga2 config master (see the sketch after this list).
- Use file resources instead of concat/concat::fragment. The explanation for this is a bit longer: the final file with its content is generated directly on the client before the client applies the catalog. Each concat file searches through the whole catalog and tries to find concat_fragments which match itself. After it has gathered all of them, it orders the content and puts everything into a normal file resource. This costs a lot of time, especially if there are a lot of resources in the catalog.
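A minimal sketch of the second option, assuming a hypothetical profile on the monitored nodes: the object text is rendered once at export time, so the master's compiler only realizes plain fragments and never calls template() or icinga2_attributes().

# On the monitored node: render the object text here, export only the result
$host_config = "object Host \"${trusted['certname']}\" {\n  address = \"${facts['networking']['ip']}\"\n}\n"

@@concat::fragment { "icinga2-host-${trusted['certname']}":
  target  => '/etc/icinga2/zones.d/master/hosts.conf',
  content => $host_config,
  tag     => 'exported-icinga2-host',
}

# On the config master: collecting pre-rendered fragments is cheap
Concat::Fragment <<| tag == 'exported-icinga2-host' |>>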
Steps to Reproduce (for bugs)
- Generate 3500 icinga2::object::* resources
- Build the catalog
- Apply the catalog
- Wait
Context
We're trying to, at least, cut the time needed for a Puppet run in half, which for us currently means something below 2.5-3min.
Your Environment
- Module version (puppet module list): Current master with #350 applied
- Puppet version (puppet -V): 4.10.1
- Operating System and version: Ubuntu 16.04.2
I can verify the behaviour you are describing, I've seen this in the wild a couple of times. Using exported resources to create objects in bigger environments doesn't scale. I think the actual problem runs deeper than this module. Even if we implement some improvements and somehow manage to create 3500 objects in under 3 minutes, that will only work for a certain amount of time. Most likely your environment will grow over time, so it's just a matter of time until you have Puppet runs taking even more than 10 minutes. I agree that this isn't fun to use.
We decided to go with concat::fragment because it allows us to group certain configurations in one single file. The intention was to create "human friendly" configurations, because we want to handle not only large environments but also small ones. Anyway, there's probably no easy solution for this.
Have you considered using the Director with the PuppetDB module to import directly from the database? This is known to be much faster than exported resources and works with even more objects.
Using exported resources to create objects in bigger environments doesn't scale. I think the actual problem is deeper than this module.
I'm not sure about this. I think it heavily depends on what you are trying to import/export. If importing means that the compiler has to additionally generate 4 objects and a complex template, yes this doesn't scale. If you're importing finalized objects (e.g. a file resource) it should be way faster as the compiler would have to only generate a resource with some parameters.
Have you considered using the Director with the PuppetDB module to import directly from the database?
We have considered using the Director with the PuppetDB module, but, if I recall correctly, it just wasn't flexible enough for our huge vars structure.
Do you have any other ideas on how to make this faster? I've looked into the core nagios_* types, but that's just an incredible mess and it doesn't look like a good solution. For us the two most important things are:
- It should (obviously) be fast
- It should be possible to purge objects which aren't in the catalog anymore because the corresponding server got deactivated in PuppetDB
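For the second requirement, a common approach is to manage the whole zone directory and let Puppet purge files it doesn't know about; a sketch, assuming all generated object files live in one directory:

# Files in this directory that are no longer in the catalog get removed,
# which covers hosts that were deactivated or expired in PuppetDB.
file { '/etc/icinga2/zones.d/master':
  ensure  => directory,
  purge   => true,
  recurse => true,
  notify  => Class['::icinga2::service'],
}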
After a lot of testing we decided to go with a custom import. We basically replaced the import of exported resources (Icinga2::Object::Host<<| |>>) with a puppetdb_query() call which iterates through the objects and puts the result into one template. It looks like this:
# This is a PQL query which does the same as Icinga2::Object::Host<<| |>>
$exported_host_objects = puppetdb_query("resources { type = \"Icinga2::Object::Host\" and exported = true and ! certname = \"${::trusted['certname']}\" and nodes { deactivated is null and expired is null } }")
# Get a list of all files we want to create
$filelist = $exported_host_objects.map |$host| {
  $host['parameters']['target']
}
# Transform the PuppetDB data into something we can work with in the template
$icinga2_object_hosts = $exported_host_objects.map |$host| {
  {
    'target'      => $host['parameters']['target'],
    'object_name' => $host['title'],
    '_attrs'      => {
      'address'       => $host['parameters']['address'],
      'vars'          => $host['parameters']['vars'],
      'groups'        => $host['parameters']['groups'],
      'check_command' => $host['parameters']['check_command'],
    },
  }
}
unique($filelist).each |$file| {
  $_icinga2_object_hosts = $icinga2_object_hosts.filter |$item| { $file == $item['target'] }
  file { $file:
    owner   => $::icinga2::params::user,
    group   => $::icinga2::params::group,
    mode    => '0640',
    content => template('sys11icinga2/hosts.conf.erb'),
    notify  => Class['::icinga2::service'],
  }
}
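And the hosts.conf.erb template referenced above (the leading ERB comment is added here for clarity; the second argument to icinga2_attributes is presumably the indentation level):

<%# Render one Host object per exported resource -%>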
<% @_icinga2_object_hosts.each do |host| -%>
object Host "<%= host['object_name'] %>" {
<%= scope.function_icinga2_attributes([host['_attrs'],2,[]]) -%>
}
<% end %>
This brought us down from about 6min to 2.5min.
I'm aware that this solution is way too specific to be implemented in this module, but I still wanted to share our results.
Quite nice. But we'd need a more generic function to import all attributes... Maybe I'll have time at the upcoming Hackathon on Friday at OSMC.
FYI: Upgrading from 4.10.1 to 5.3.3 brought us down to ~2min.
@baurmatt Have you implemented some of your ideas to increase the speed?
Yes, we're using the example above in our profile for the production setup and it works really well.
We're now down to ~2 min for:
- ~3200 hosts
- ~5800 other resources in the catalog
Our profiling showed that one should avoid the following things for better performance (from highest to lowest impact):
- validate_* functions
- concat, especially with a lot of exported objects
- include/require statements (especially for params.pp)
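To put the first point in perspective, this is roughly what the switch in #350 looks like (hypothetical define names, the two variants shown side by side): every validate_* call is one function invocation per parameter per resource, while a Puppet 4 data type is checked natively by the compiler.

# Before: stdlib validation function, one extra call per parameter per resource
define profile::object_with_validate (
  $address = undef,
) {
  validate_string($address)
}

# After: native Puppet 4 data type, no function call at compile time
define profile::object_with_types (
  Optional[String] $address = undef,
) {
}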
I'm still not quite sure how we can implement a generic solution which could be implemented in this module.
@baurmatt Do you still use this solution to speed up your environment?
@koraz0815 Some optimizations (e.g. the switch to native Puppet data types) have been included in newer versions of this module, but especially the part I described in https://github.com/Icinga/puppet-icinga2/issues/392#issuecomment-345690163 is still used in our production setup.
@baurmatt Thank you for the script.
One other question: why did you exclude the monitoring server from the PuppetDB query?
! certname = \"${::trusted['certname']}\"
@koraz0815 tbh, I have no idea anymore. It was probably because we had a separate icinga2::object::host with some special magic. As of today this seems to be refactored and we currently run with:
# This is a PQL query which does the same as Icinga2::Object::Host<<| |>>
$exported_host_objects = puppetdb_query("resources { type = \"Icinga2::Object::Host\" and exported = true and nodes { deactivated is null and expired is null } order by certname }")
Maybe this is still interesting for you, @baurmatt and @koraz0815. Have a look at https://github.com/Icinga/puppet-icinga2/tree/puppetdb-query
For hosts, this method will undoubtedly work just fine, since hosts should be unique objects at all times. Has anyone tried using the export parameter and query_objects to create hostgroups? We are currently testing this in development, because we want to create hostgroups based on Puppet role, network zone and datacenter, amongst others. We find that the icinga2 master (running top-down config sync) creates a hostgroup in the target file(s) once for every host that exports it. Does anyone else encounter this?
Our solution was to slightly alter the puppetdb query to sort by title alone (since the certificate doesn't really matter at this stage):
$pql_query = puppetdb_query("resources[parameters] { ${_environments} type = 'Icinga2::Object' and exported = true and tag = 'icinga2::instance::${destination}' and nodes { deactivated is null and expired is null } order by title }")
Then, to ensure we get a unique list of objects to create, we added a unique() call around $pql_query:
unique($file_list).each |$target| {
  $objects = unique($pql_query).filter |$object| { $target == $object['parameters']['target'] }
  # ... render $objects into a file resource for $target, analogous to the earlier example
}