chef-icinga2
[dev.icinga.com #11072] LWRP icinga2_service is very slow
This issue has been migrated from Redmine: https://dev.icinga.com/issues/11072
Created by ascopenco on 2016-02-02 20:27:38 +00:00
Assignee: vkhatri Status: Assigned Target Version: (none) Last Update: 2016-02-16 09:15:18 +00:00 (in Redmine)
Hello,
I have an issue with generating icinga2 configs with the cookbook LWRPs: it takes a huge amount of time. For example, generating ~2k services takes more than 30 minutes on a VM with 4 CPU cores and 16 GB of memory. My tests show that the main problem is icinga2_service. It is the slowest LWRP, taking 1-3 seconds per execution.
Updated by ascopenco on 2016-02-02 20:50:34 +00:00
Let me explain how the execution time grows. At the beginning, the icinga2_service LWRP executes fast, but by the end each execution takes 10-15 seconds.
Updated by vkhatri on 2016-02-13 07:19:44 +00:00
- Status changed from New to Assigned
- Assigned to set to vkhatri
@ascopenco Sorry for the delay, I will take a look at it. A 4-core machine should indeed not take 30 minutes for ~2k services.
Could you please share the Chef Server and Client details? It would be great to have a sample LWRP resource.
Updated by ascopenco on 2016-02-15 08:13:44 +00:00
Chef Server: 12.3.1 Chef Client: 12.6.0
Example of the LWRP:

```ruby
icinga2_service "#{server['fqdn']}_ssh" do
  import 'check-service-tmpl-30s'
  display_name 'ssh'
  host_name server['fqdn']
  max_check_attempts 4
  check_command 'ssh'
  custom_vars :notification => notify_list
end
```
Updated by vkhatri on 2016-02-15 13:14:06 +00:00
@ascopenco thanks!
Updated by ascopenco on 2016-02-16 09:15:18 +00:00
In addition:
The VM runs CentOS release 6.7 (Final).
strace shows lags on the brk syscall:

```
[pid 9791] 1.919851 brk(0x15000e000) = 0x15000e000
[pid 9791] 0.001501 brk(0x150032000) = 0x150032000
[pid 9791] 0.001288 brk(0x150056000) = 0x150056000
[pid 9791] 0.001319 brk(0x15024a000) = 0x15024a000
[pid 9791] 0.001286 brk(0x15026e000) = 0x15026e000
...
[pid 9791] 8.383971 brk(0x150421000) = 0x150421000
```
@scopenco Are you still facing the slow startup issue? I never got a chance to get back to you.
Yes, and I know the reason: the algorithm has O(n*n) complexity. Each icinga2_service declaration causes chef-client to parse service.conf, build a hash of the services in it, and search that hash for the declared service. So with 10 services you get on the order of 10*10 parsing work on service.conf, and with 3-5k services it turns into 4-6 hours of computation.
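A toy Ruby cost model makes the O(n*n) argument concrete. It is a sketch under stated assumptions, not the cookbook's real code: the hypothetical `ServicesFile` class stores one `name=definition` line per service instead of real Icinga 2 object syntax, and `declare_with_reparse` re-parses the whole file before every declaration, as described above.

```ruby
# Toy cost model (an assumption, not the cookbook's actual implementation):
# every declaration re-parses the services file, which already holds all
# previously declared services, builds a hash and looks up the new service.
class ServicesFile
  attr_reader :lines_parsed

  def initialize
    @lines = []          # one "name=definition" line per declared service
    @lines_parsed = 0    # total lines processed by all parse calls
  end

  # Parse the whole file into a name => definition hash: O(current file size).
  def parse
    @lines.each_with_object({}) do |line, services|
      @lines_parsed += 1
      name, definition = line.split('=', 2)
      services[name] = definition
    end
  end

  def append(name, definition)
    @lines << "#{name}=#{definition}"
  end
end

# Declare n services, re-parsing the file before each declaration.
def declare_with_reparse(n)
  file = ServicesFile.new
  n.times do |i|
    name = "host#{i}_ssh"
    existing = file.parse            # full re-parse on every declaration
    file.append(name, 'check_command=ssh') unless existing.key?(name)
  end
  file.lines_parsed
end

[10, 100, 1000].each do |n|
  puts "#{n} services -> #{declare_with_reparse(n)} lines parsed"
end
```

The i-th declaration parses i lines, so the per-call cost keeps rising as the run goes on, and the total is n(n-1)/2: 45, 4950 and 499500 lines for 10, 100 and 1000 services.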
So how did I resolve this problem for installations with >3k services? I created a separate resource for declaring icinga2 services. It creates one file per host for that host's services, so each time my custom icinga2_service runs it reads only a file for one host with 5-7 services. Of course, it works much faster.
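The per-host workaround can be modeled the same way. In this hypothetical sketch, `HostServicesFiles` and `declare_per_host` are illustrative names, not the commenter's actual resource; each host gets its own file, so a declaration re-parses only the few services already declared for its own host.

```ruby
# Hypothetical model of the one-file-per-host layout: a declaration only
# re-parses the file of the host it belongs to.
class HostServicesFiles
  attr_reader :lines_parsed

  def initialize
    # host => lines of that host's services file
    @files = Hash.new { |files, host| files[host] = [] }
    @lines_parsed = 0
  end

  # Parse only the given host's file into a service_name => definition hash.
  def parse(host)
    @files[host].each_with_object({}) do |line, services|
      @lines_parsed += 1
      name, definition = line.split('=', 2)
      services[name] = definition
    end
  end

  def append(host, name, definition)
    @files[host] << "#{name}=#{definition}"
  end
end

# Declare `services_per_host` services on each of `hosts` hosts.
def declare_per_host(hosts, services_per_host)
  files = HostServicesFiles.new
  hosts.times do |h|
    host = "host#{h}.example.com"
    services_per_host.times do |s|
      name = "#{host}_svc#{s}"
      existing = files.parse(host)   # only this host's handful of services
      files.append(host, name, 'check_command=ssh') unless existing.key?(name)
    end
  end
  files.lines_parsed
end

puts declare_per_host(400, 5)   # 400 hosts x 5 services = 2,000 services
```

For 2,000 services spread as 400 hosts with 5 services each, this counts 400 × (0+1+2+3+4) = 4,000 parsed lines, versus 2,000 × 1,999 / 2 = 1,999,000 if all services lived in one file that is re-parsed per declaration. The total now grows with the number of services per host rather than with the total number of services.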
I got the same problem in a configuration with 2k hosts and 20k services. It simply does not work, for the reason described above.
After some experiments, I ended up with the following approach:
- use the cookbook only for Icinga2 packages/main config/endpoints/apilisteners.
- monkey-patched the zone resource to get it working, and use the zone resource from the cookbook.
- keep all Icinga configuration items in cookbook files, in native Icinga2 syntax.
Reason: Icinga2 configuration has its own syntax, which is already "monitoring as code". There is no reason to introduce one more translation layer (Chef resources > Icinga2 objects).
I've created a new cookbook that creates hosts and services using the API, https://supermarket.chef.io/cookbooks/icinga2_api, and with this cookbook I now get 10 times better performance.
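For reference, the Icinga 2 REST API creates one object per request with `PUT /v1/objects/services/<host>!<service>` and a JSON body holding `templates` and `attrs`. The Ruby sketch below only builds such a request for the ssh service from the LWRP example; it is not the icinga2_api cookbook's code, and the endpoint URL, credentials and the attribute subset are placeholder assumptions. Since each service is its own request, nothing on the Chef side has to re-parse a growing service.conf per declaration.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build (but do not send) a PUT request that creates one service object via
# the Icinga 2 REST API. URL, credentials and attributes are placeholders.
def build_service_put(api_base, host, service, user, password)
  uri = URI.join(api_base, "/v1/objects/services/#{host}!#{service}")
  request = Net::HTTP::Put.new(uri)
  request.basic_auth(user, password)
  request['Accept'] = 'application/json'
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(
    'templates' => ['check-service-tmpl-30s'],
    'attrs' => {
      'display_name' => 'ssh',
      'check_command' => 'ssh',
      'max_check_attempts' => 4
    }
  )
  [uri, request]
end

# Send a built request. Not called here; in a real setup you would also
# point Net::HTTP at the Icinga CA (e.g. via its ca_file setting).
def send_request(uri, request)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
end
```

A typical call would be `uri, req = build_service_put('https://icinga.example.com:5665', 'web01.example.com', 'ssh', 'api-user', 'secret')` followed by `send_request(uri, req)` against a reachable Icinga 2 API endpoint.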