
Integrate jemalloc

Open · Al2Klimov opened this pull request 4 years ago • 27 comments

... to speed up malloc(3) and thereby also the config loading:

fixes #8110

Also seems to fix a memory leak for some people:

fixes #8737

Reason: https://www.softwareverify.com/blog/memory-fragmentation-your-worst-nightmare/
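
For anyone who wants to try this before packaged builds exist: jemalloc can also be swapped in at runtime via LD_PRELOAD, as discussed further down in this thread. A minimal sketch; the library paths are distribution-specific assumptions, so adjust them to wherever your distro installs libjemalloc:

# One-off foreground run with jemalloc preloaded (Debian/Ubuntu-style path):
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 icinga2 daemon -C

# Persistent variant via the service's environment file
# (RHEL/CentOS-style path, cf. the /etc/sysconfig/icinga2 report further down):
echo 'LD_PRELOAD=/usr/lib64/libjemalloc.so.1' >>/etc/sysconfig/icinga2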

Al2Klimov avatar Aug 05 '20 16:08 Al2Klimov

After merge

  • [ ] https://git.icinga.com/packaging/deb-icinga2/-/merge_requests/5
  • [ ] https://git.icinga.com/packaging/rpm-icinga2/-/merge_requests/5
  • [x] https://git.icinga.com/packaging/raspbian-icinga2/-/merge_requests/1
  • [ ] https://github.com/Icinga/docker-icinga2/pull/22

Al2Klimov avatar Sep 15 '20 10:09 Al2Klimov

Numbers: https://github.com/Icinga/docker-icinga2/pull/22#issuecomment-669802604

Al2Klimov avatar Sep 15 '20 15:09 Al2Klimov

@lippserd FYI: Independent of this PR, an index of compiled regexes doesn't make things much faster:

diff --git a/lib/base/scriptutils.cpp b/lib/base/scriptutils.cpp
index 838f20edd..001dc86ed 100644
--- a/lib/base/scriptutils.cpp
+++ b/lib/base/scriptutils.cpp
@@ -16,8 +16,12 @@
 #include "base/namespace.hpp"
 #include "config/configitem.hpp"
 #include <boost/regex.hpp>
+#include <boost/thread/locks.hpp>
+#include <boost/thread/shared_mutex.hpp>
 #include <algorithm>
 #include <set>
+#include <string>
+#include <unordered_map>
 #ifdef _WIN32
 #include <msi.h>
 #endif /* _WIN32 */
@@ -93,6 +97,11 @@ bool ScriptUtils::CastBool(const Value& value)
        return value.ToBool();
 }

+static struct {
+       std::unordered_map<std::string, boost::regex> Index;
+       boost::shared_mutex Mutex;
+} l_Regexes;
+
 bool ScriptUtils::Regex(const std::vector<Value>& args)
 {
        if (args.size() < 2)
@@ -111,7 +120,25 @@ bool ScriptUtils::Regex(const std::vector<Value>& args)
        else
                mode = MatchAll;

-       boost::regex expr(pattern.GetData());
+       const boost::regex* expr = nullptr;
+
+       {
+               auto key (pattern.GetData());
+               boost::upgrade_lock<boost::shared_mutex> shared (l_Regexes.Mutex);
+               auto pos (l_Regexes.Index.find(key));
+
+               if (pos == l_Regexes.Index.end()) {
+                       boost::upgrade_to_unique_lock<boost::shared_mutex> unique (shared);
+
+                       pos = l_Regexes.Index.find(key);
+
+                       if (pos == l_Regexes.Index.end()) {
+                               pos = l_Regexes.Index.emplace(key, key).first;
+                       }
+               }
+
+               expr = &pos->second;
+       }

        Array::Ptr texts;

@@ -128,7 +155,7 @@ bool ScriptUtils::Regex(const std::vector<Value>& args)
                        bool res = false;
                        try {
                                boost::smatch what;
-                               res = boost::regex_search(text.GetData(), what, expr);
+                               res = boost::regex_search(text.GetData(), what, *expr);
                        } catch (boost::exception&) {
                                res = false; /* exception means something went terribly wrong */
                        }
@@ -144,7 +171,7 @@ bool ScriptUtils::Regex(const std::vector<Value>& args)
        } else {
                String text = argTexts;
                boost::smatch what;
-               return boost::regex_search(text.GetData(), what, expr);
+               return boost::regex_search(text.GetData(), what, *expr);
        }
 }
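
For context: the diff implements a double-checked cache. The fast path looks the pattern up under a boost::upgrade_lock, and only on a miss upgrades to a unique lock and re-checks the map before compiling. One caveat: Boost permits only a single upgrade_lock holder at a time, so even cache hits serialize against each other, which may be part of why the index "doesn't make it much faster". Below is a standalone sketch of the same idiom with a genuinely concurrent read path, using std::shared_mutex (C++17) and hypothetical names; it is an illustration, not part of the PR:

#include <shared_mutex>
#include <string>
#include <unordered_map>

#include <boost/regex.hpp>

static std::unordered_map<std::string, boost::regex> l_Index;
static std::shared_mutex l_Mutex;

// Return a cached compiled regex, compiling and inserting it on first use.
// References into std::unordered_map survive later insertions (only erasure
// invalidates them), so handing one out after dropping the lock is safe here
// because nothing ever erases from the index.
static const boost::regex& GetCompiledRegex(const std::string& pattern)
{
	{
		std::shared_lock<std::shared_mutex> shared (l_Mutex);
		auto pos (l_Index.find(pattern));

		if (pos != l_Index.end())
			return pos->second; // fast path: readers don't block each other
	}

	std::unique_lock<std::shared_mutex> unique (l_Mutex);

	// Re-check: another thread may have compiled the pattern in between.
	auto pos (l_Index.find(pattern));

	if (pos == l_Index.end())
		pos = l_Index.emplace(pattern, boost::regex(pattern)).first;

	return pos->second;
}

Note that, like the diff, this caches every pattern ever seen, so with arbitrary user-supplied patterns the index grows without bound.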

Al2Klimov avatar Sep 17 '20 14:09 Al2Klimov

They'd have to install jemalloc themselves anyways

No, see https://github.com/Icinga/icinga2/pull/8152#issuecomment-692632352.

Al2Klimov avatar Mar 26 '21 14:03 Al2Klimov

~While you’re reviewing, I'll re-evaluate https://github.com/Icinga/icinga2/pull/8152#issuecomment-692632352 , so please don’t merge, yet.~

Al2Klimov avatar Apr 12 '21 11:04 Al2Klimov

Why not? We've already added Boost Coroutine in the recent past.

Al2Klimov avatar Apr 12 '21 12:04 Al2Klimov

Do we have enough data to conclude that jemalloc is an improvement in all or at least most situations? For me, this is more like a tunable that you could try: maybe it's an improvement with your system and config, maybe not; we don't know for sure (yet).

julianbrost avatar Apr 12 '21 12:04 julianbrost

In big envs: I have no doubt.

In small ones: even if we make it worse, the users won’t even notice.

Al2Klimov avatar Apr 12 '21 13:04 Al2Klimov

Do you have any data to back this? Do we have any experience with running production setups with jemalloc? Take a look at this Stack Overflow answer. Do we know that Icinga won't use twice as much RAM when running with jemalloc in the long run?

Of course, a random Stack Overflow answer doesn't mean that this will happen to Icinga, but I think the conclusion in that answer is very valid: you just have to test it with your application. And as far as I can tell, the only data we have so far is that you've spun up Icinga 2 with jemalloc in Docker briefly.

I don't claim that using jemalloc will lead to problems, but we just don't know yet. With the current version of the PR, our builds are linked against jemalloc and you can't use them without it. Should there be a problem, we'd have to push out new builds. With the previous version (i.e. adding jemalloc.so to LD_PRELOAD), you'd at least have the option to uninstall jemalloc or patch the startup script to get back the old behavior.

I'd feel much more confident about this if, to begin with, we treated jemalloc as a potential optimization that you can try to speed up your installation, and only enforced it as the default once we've gathered enough experience to conclude that jemalloc is an improvement in general.

julianbrost avatar Apr 13 '21 14:04 julianbrost

I'll spin up some test boxes w/ Icinga and Grafana and report the graphs in about one week.

Al2Klimov avatar Apr 13 '21 14:04 Al2Klimov

aklimov8152xyz

  • h = helper
  • a = w/o jemalloc
  • b = w/ jemalloc
  • m = master
  • s = satellite

deploy.tf

resource "openstack_compute_instance_v2" "aklimov8152h" {
	name = "aklimov8152h"
	region = "HetznerNBG4"
	flavor_name = "s1.medium"
	block_device {
		uuid = "${var.openstack_image}"
		source_type = "image"
		boot_index = 0
		destination_type = "volume"
		volume_size = 50
		delete_on_termination = true
	}
	network {
		name = "${var.tenant_network}"
	}
	security_groups = [ "default2" ]
	key_pair = "${var.openstack_keypair}"
}

resource "openstack_compute_instance_v2" "aklimov8152am1" {
	name = "aklimov8152am1"
	region = "HetznerNBG4"
	flavor_name = "s1.xxlarge"
	block_device {
		uuid = "${var.openstack_image}"
		source_type = "image"
		boot_index = 0
		destination_type = "volume"
		volume_size = 50
		delete_on_termination = true
	}
	network {
		name = "${var.tenant_network}"
	}
	security_groups = [ "default2" ]
	key_pair = "${var.openstack_keypair}"
}

resource "openstack_compute_instance_v2" "aklimov8152am2" {
	name = "aklimov8152am2"
	region = "HetznerNBG4"
	flavor_name = "s1.xxlarge"
	block_device {
		uuid = "${var.openstack_image}"
		source_type = "image"
		boot_index = 0
		destination_type = "volume"
		volume_size = 50
		delete_on_termination = true
	}
	network {
		name = "${var.tenant_network}"
	}
	security_groups = [ "default2" ]
	key_pair = "${var.openstack_keypair}"
}

resource "openstack_compute_instance_v2" "aklimov8152as1" {
	name = "aklimov8152as1"
	region = "HetznerNBG4"
	flavor_name = "s1.xxlarge"
	block_device {
		uuid = "${var.openstack_image}"
		source_type = "image"
		boot_index = 0
		destination_type = "volume"
		volume_size = 50
		delete_on_termination = true
	}
	network {
		name = "${var.tenant_network}"
	}
	security_groups = [ "default2" ]
	key_pair = "${var.openstack_keypair}"
}

resource "openstack_compute_instance_v2" "aklimov8152as2" {
	name = "aklimov8152as2"
	region = "HetznerNBG4"
	flavor_name = "s1.xxlarge"
	block_device {
		uuid = "${var.openstack_image}"
		source_type = "image"
		boot_index = 0
		destination_type = "volume"
		volume_size = 50
		delete_on_termination = true
	}
	network {
		name = "${var.tenant_network}"
	}
	security_groups = [ "default2" ]
	key_pair = "${var.openstack_keypair}"
}

resource "openstack_compute_instance_v2" "aklimov8152bm1" {
	name = "aklimov8152bm1"
	region = "HetznerNBG4"
	flavor_name = "s1.xxlarge"
	block_device {
		uuid = "${var.openstack_image}"
		source_type = "image"
		boot_index = 0
		destination_type = "volume"
		volume_size = 50
		delete_on_termination = true
	}
	network {
		name = "${var.tenant_network}"
	}
	security_groups = [ "default2" ]
	key_pair = "${var.openstack_keypair}"
}

resource "openstack_compute_instance_v2" "aklimov8152bm2" {
	name = "aklimov8152bm2"
	region = "HetznerNBG4"
	flavor_name = "s1.xxlarge"
	block_device {
		uuid = "${var.openstack_image}"
		source_type = "image"
		boot_index = 0
		destination_type = "volume"
		volume_size = 50
		delete_on_termination = true
	}
	network {
		name = "${var.tenant_network}"
	}
	security_groups = [ "default2" ]
	key_pair = "${var.openstack_keypair}"
}

resource "openstack_compute_instance_v2" "aklimov8152bs1" {
	name = "aklimov8152bs1"
	region = "HetznerNBG4"
	flavor_name = "s1.xxlarge"
	block_device {
		uuid = "${var.openstack_image}"
		source_type = "image"
		boot_index = 0
		destination_type = "volume"
		volume_size = 50
		delete_on_termination = true
	}
	network {
		name = "${var.tenant_network}"
	}
	security_groups = [ "default2" ]
	key_pair = "${var.openstack_keypair}"
}

resource "openstack_compute_instance_v2" "aklimov8152bs2" {
	name = "aklimov8152bs2"
	region = "HetznerNBG4"
	flavor_name = "s1.xxlarge"
	block_device {
		uuid = "${var.openstack_image}"
		source_type = "image"
		boot_index = 0
		destination_type = "volume"
		volume_size = 50
		delete_on_termination = true
	}
	network {
		name = "${var.tenant_network}"
	}
	security_groups = [ "default2" ]
	key_pair = "${var.openstack_keypair}"
}

pl.yml

---
- import_playbook: prepare.yml
- import_playbook: dns.yml
- import_playbook: squid.yml
- import_playbook: pkg.yml
- import_playbook: influx.yml
- import_playbook: grafana.yml
- import_playbook: i2mon.yml
- import_playbook: i2.yml

prepare.yml

---
- hosts: aklimov8152h
  become: yes
  become_method: sudo
  tasks:
  - name: /etc/resolv.conf
    copy:
      dest: /etc/resolv.conf
      content: |
        nameserver 9.9.9.9
  - name: apt update
    apt:
      update_cache: yes

dns.yml

---
- hosts: all
  become: yes
  become_method: sudo
  tasks: []
- hosts: aklimov8152h
  become: yes
  become_method: sudo
  tasks:
  - name: dnsmasq
    apt:
      name: dnsmasq
  - name: /etc/dnsmasq.conf
    blockinfile:
      path: /etc/dnsmasq.conf
      marker: '# {mark} general'
      block: |
        no-resolv
        no-hosts
        interface=eth0
        server=9.9.9.9
    notify: Restart dnsmasq
  - name: /etc/dnsmasq.conf
    with_inventory_hostnames: all
    blockinfile:
      path: /etc/dnsmasq.conf
      marker: '# {mark} {{ item }}'
      block: |
        address=/{{ item }}/{{ hostvars[item].ansible_default_ipv4.address }}
    notify: Restart dnsmasq
  handlers:
  - name: Restart dnsmasq
    service:
      name: dnsmasq
      state: restarted
- hosts: 'all:!aklimov8152h'
  become: yes
  become_method: sudo
  tasks:
  - name: /etc/resolv.conf
    copy:
      dest: /etc/resolv.conf
      content: |
        nameserver {{ hostvars['aklimov8152h'].ansible_default_ipv4.address }}

squid.yml

---
- hosts: all
  become: yes
  become_method: sudo
  tasks: []
- hosts: aklimov8152h
  become: yes
  become_method: sudo
  tasks:
  - name: squid-deb-proxy
    apt:
      name: squid-deb-proxy
  - name: /etc/squid-deb-proxy/allowed-networks-src.acl.d/99-*
    with_inventory_hostnames: all
    copy:
      dest: '/etc/squid-deb-proxy/allowed-networks-src.acl.d/99-{{ item }}'
      owner: root
      group: root
      mode: '0644'
      content: |
        {{ hostvars[item].ansible_default_ipv4.address }}/32
    notify: Restart squid-deb-proxy
  - name: /etc/squid-deb-proxy/mirror-dstdomain.acl.d/99-*
    loop:
    - aklimov8152h
    - packages.grafana.com
    copy:
      dest: '/etc/squid-deb-proxy/mirror-dstdomain.acl.d/99-{{ item }}'
      owner: root
      group: root
      mode: '0644'
      content: |
        {{ item }}
    notify: Restart squid-deb-proxy
  handlers:
  - name: Restart squid-deb-proxy
    service:
      name: squid-deb-proxy
      state: restarted
- hosts: all
  become: yes
  become_method: sudo
  tasks:
  - name: /etc/apt/apt.conf.d/01proxy
    copy:
      dest: /etc/apt/apt.conf.d/01proxy
      owner: root
      group: root
      mode: '0644'
      content: |
        Acquire::http { Proxy "http://aklimov8152h:8000"; };

pkg.yml

---
- hosts: aklimov8152h
  become: yes
  become_method: sudo
  tasks:
  - name: nginx
    apt:
      name: nginx
  - name: /var/www/html/*/
    loop:
    - wojm
    - wjm
    copy:
      dest: '/var/www/html/{{ item }}/'
      owner: root
      group: root
      mode: '0755'
      src: '{{ item }}/'
  - name: /var/www/html/*/
    loop:
    - influx
    file:
      path: '/var/www/html/{{ item }}'
      owner: root
      group: root
      mode: '0755'
      state: directory
  - name: /var/www/html/influx/influx.deb
    get_url:
      dest: /var/www/html/influx/influx.deb
      owner: root
      group: root
      mode: '0644'
      url: 'https://dl.influxdata.com/influxdb/releases/influxdb_1.8.4_amd64.deb'
      checksum: sha256:ad4058db83f424dad21337f3d7135de921498b652d67e2fcd2e2e070d2997a2d
  - name: dpkg-dev
    apt:
      name: dpkg-dev
  - name: /var/www/html/*/Packages
    loop:
    - wojm
    - wjm
    - influx
    shell: dpkg-scanpackages . /dev/null >Packages
    args:
      chdir: '/var/www/html/{{ item }}'
      creates: Packages
  - name: Influx repo
    copy:
      dest: /etc/apt/sources.list.d/influx.list
      owner: root
      group: root
      mode: '0644'
      content: |
        deb [trusted=yes] file:///var/www/html/influx/ ./
  - name: apt update
    apt:
      update_cache: yes
- hosts: 'aklimov8152a*'
  become: yes
  become_method: sudo
  tasks:
  - name: aklimov8152h repo
    copy:
      dest: /etc/apt/sources.list.d/wojm.list
      owner: root
      group: root
      mode: '0644'
      content: |
        deb [trusted=yes] http://aklimov8152h/wojm/ ./
  - name: apt update
    apt:
      update_cache: yes
- hosts: 'aklimov8152b*'
  become: yes
  become_method: sudo
  tasks:
  - name: aklimov8152h repo
    copy:
      dest: /etc/apt/sources.list.d/wjm.list
      owner: root
      group: root
      mode: '0644'
      content: |
        deb [trusted=yes] http://aklimov8152h/wjm/ ./
  - name: apt update
    apt:
      update_cache: yes

influx.yml

---
- hosts: aklimov8152h
  become: yes
  become_method: sudo
  tasks:
  - name: influxdb
    apt:
      name: influxdb
  - name: InfluxDB service
    service:
      name: influxdb
      state: started
      enabled: yes
  - name: python-influxdb
    apt:
      name: python-influxdb
  - influxdb_database:
      database_name: icinga2

grafana.yml

---
- hosts: aklimov8152h
  become: yes
  become_method: sudo
  tasks:
  - name: apt-transport-https
    apt:
      name: apt-transport-https
  - name: gpg
    apt:
      name: gpg
  - name: Grafana repo key
    apt_key:
      url: https://packages.grafana.com/gpg.key
  - name: Grafana repo
    apt_repository:
      repo: deb https://packages.grafana.com/oss/deb stable main
  - name: Grafana
    apt:
      name: grafana
  - name: Grafana service
    service:
      name: grafana-server
      state: started
      enabled: yes

i2mon.yml

---
- hosts: aklimov8152h
  become: yes
  become_method: sudo
  tasks:
  - name: icinga2-bin
    apt:
      name: icinga2-bin
  - name: Icinga SSH key
    community.crypto.openssh_keypair:
      path: /var/lib/icinga2/id_rsa
      owner: nagios
  - name: Fetch Icinga SSH key
    fetch:
      dest: .tempfiles
      src: /var/lib/icinga2/id_rsa.pub
- hosts: 'all:!aklimov8152h'
  become: yes
  become_method: sudo
  tasks:
  - name: User chkbyssh
    user:
      name: chkbyssh
      system: yes
  - name: Icinga SSH key
    authorized_key:
      user: chkbyssh
      key: |-
        {{ lookup('file', '.tempfiles/aklimov8152h/var/lib/icinga2/id_rsa.pub') }}
  - name: monitoring-plugins
    apt:
      name: monitoring-plugins
  - name: git
    apt:
      name: git
  - name: check_mem repo
    git:
      dest: /opt/justintime-plugins
      repo: 'https://github.com/justintime/nagios-plugins.git'
      version: 91c4dc366ba9c132194eda62e5358e514d3aae36
  - name: check_mem
    file:
      path: /usr/lib/nagios/plugins/check_mem.pl
      state: link
      src: /opt/justintime-plugins/check_mem/check_mem.pl
- hosts: aklimov8152h
  become: yes
  become_method: sudo
  tasks:
  - name: monitoring-plugins
    apt:
      name: monitoring-plugins
  - name: icinga2 node setup
    shell: >-
      icinga2 node setup
      --zone master
      --listen 0.0.0.0,5665
      --cn {{ inventory_hostname }}
      --master
      --disable-confd
    args:
      creates: /var/lib/icinga2/certs/ca.crt
    notify: Restart Icinga 2
  - name: /etc/icinga2/zones.d/master
    file:
      path: /etc/icinga2/zones.d/master
      owner: root
      group: root
      mode: '0755'
      state: directory
  - name: /etc/icinga2/zones.d/master/*.conf
    with_inventory_hostnames: 'all:!aklimov8152h'
    copy:
      dest: '/etc/icinga2/zones.d/master/{{ item }}.conf'
      owner: root
      group: root
      mode: '0644'
      content: |
        object Host "{{ item }}" {
          address = "{{ hostvars[item].ansible_default_ipv4.address }}"
        }
    notify: Restart Icinga 2
  - name: /etc/icinga2/zones.d/master/misc.conf
    copy:
      dest: /etc/icinga2/zones.d/master/misc.conf
      owner: root
      group: root
      mode: '0644'
      src: i2mon.conf
    notify: Restart Icinga 2
  - name: /etc/icinga2/features-available/influxdb.conf
    copy:
      dest: /etc/icinga2/features-available/influxdb.conf
      owner: root
      group: root
      mode: '0644'
      src: influxdb.conf
    notify: Restart Icinga 2
  - name: /etc/icinga2/features-enabled/influxdb.conf
    file:
      path: /etc/icinga2/features-enabled/influxdb.conf
      state: link
      src: /etc/icinga2/features-available/influxdb.conf
    notify: Restart Icinga 2
  handlers:
  - name: Restart Icinga 2
    service:
      name: icinga2
      state: restarted

i2.yml

---
- hosts: 'all:!aklimov8152h'
  become: yes
  become_method: sudo
  tasks:
  - name: icinga2-bin
    apt:
      name: icinga2-bin
- hosts: 'aklimov8152*m1'
  become: yes
  become_method: sudo
  vars:
    i2sats:
      aklimov8152am1:
      - aklimov8152am2
      - aklimov8152as1
      - aklimov8152as2
      aklimov8152bm1:
      - aklimov8152bm2
      - aklimov8152bs1
      - aklimov8152bs2
  tasks:
  - name: icinga2 node setup
    shell: >-
      icinga2 node setup
      --zone master
      --listen 0.0.0.0,5665
      --cn {{ inventory_hostname }}
      --master
      --disable-confd;
      rm -f /var/cache/icinga2/icinga2.vars
    args:
      creates: /var/lib/icinga2/certs/ca.crt
    notify: Restart Icinga 2
  - name: /var/cache/icinga2/icinga2.vars
    shell: icinga2 daemon -C
    args:
      creates: /var/cache/icinga2/icinga2.vars
  - name: Icinga 2 ticket
    loop: '{{ i2sats[inventory_hostname] }}'
    shell: >-
      icinga2 pki ticket --cn {{ item }}
      >/var/cache/icinga2/{{ item }}.ticket
    args:
      creates: '/var/cache/icinga2/{{ item }}.ticket'
  - name: Fetch Icinga 2 ticket
    loop: '{{ i2sats[inventory_hostname] }}'
    fetch:
      dest: .tempfiles
      src: '/var/cache/icinga2/{{ item }}.ticket'
  - name: Fetch Icinga 2 CA cert
    fetch:
      dest: .tempfiles
      src: /var/lib/icinga2/certs/ca.crt
  - name: Fetch Icinga 2 CA key
    fetch:
      dest: .tempfiles
      src: /var/lib/icinga2/ca/ca.key
  - name: Zone dirs
    loop:
    - global
    - m
    - s1
    - s2
    file:
      path: '/etc/icinga2/zones.d/{{ item }}'
      owner: root
      group: root
      mode: '0755'
      state: directory
  - name: /etc/icinga2/zones.d/global/*.conf
    loop:
    - templates
    - applys
    copy:
      dest: '/etc/icinga2/zones.d/global/{{ item }}.conf'
      owner: root
      group: root
      mode: '0644'
      src: '{{ item }}.conf'
    notify: Restart Icinga 2
  - name: Hosts
    loop:
    - m
    - s1
    - s2
    copy:
      dest: '/etc/icinga2/zones.d/{{ item }}/hosts.conf'
      owner: root
      group: root
      mode: '0644'
      src: '{{ item }}.conf'
    notify: Restart Icinga 2
  handlers:
  - name: Restart Icinga 2
    service:
      name: icinga2
      state: restarted
- hosts: 'all:!aklimov8152h:!aklimov8152*m1'
  become: yes
  become_method: sudo
  vars:
    i2masters:
      aklimov8152am2: aklimov8152am1
      aklimov8152as1: aklimov8152am1
      aklimov8152as2: aklimov8152am1
      aklimov8152bm2: aklimov8152bm1
      aklimov8152bs1: aklimov8152bm1
      aklimov8152bs2: aklimov8152bm1
  tasks:
  - name: /var/cache/icinga2/my.ticket
    copy:
      dest: /var/cache/icinga2/my.ticket
      owner: nagios
      group: nagios
      mode: '0600'
      src: '.tempfiles/{{ i2masters[inventory_hostname] }}/var/cache/icinga2/{{ inventory_hostname }}.ticket'
  - name: icinga2 node setup
    shell: >
      icinga2 node setup
      --zone {{ inventory_hostname }}
      --endpoint {{ i2masters[inventory_hostname] }},{{ i2masters[inventory_hostname] }},5665
      --parent_zone master
      --listen 0.0.0.0,5665
      --ticket `cat /var/cache/icinga2/my.ticket`
      --cn {{ inventory_hostname }}
      --accept-config
      --accept-commands
      --disable-confd
    args:
      creates: /var/lib/icinga2/certs
    notify: Restart Icinga 2
  - name: /var/lib/icinga2/certs/ca.crt
    copy:
      dest: /var/lib/icinga2/certs/ca.crt
      owner: nagios
      group: nagios
      mode: '0644'
      src: .tempfiles/{{ i2masters[inventory_hostname] }}/var/lib/icinga2/certs/ca.crt
  handlers:
  - name: Restart Icinga 2
    service:
      name: icinga2
      state: restarted
- hosts: 'all:!aklimov8152h'
  become: yes
  become_method: sudo
  tasks:
  - name: /etc/icinga2/zones.conf
    copy:
      dest: /etc/icinga2/zones.conf
      owner: root
      group: root
      mode: '0644'
      src: zones.conf
    notify: Restart Icinga 2
  handlers:
  - name: Restart Icinga 2
    service:
      name: icinga2
      state: restarted

applys.conf

for (i in range(150000)) {
	apply Service i {
		check_command = "dummy"

		assign where true
	}
}

apply ScheduledDowntime "sd" to Service {
	author = "me"
	comment = "mine"
	ranges = { "monday - sunday" = "02:00-22:00" }

	assign where true
}

i2mon.conf

template Host default {
	check_command = "passive"
	enable_active_checks = false
}

template Service default {
	check_interval = 1s
	retry_interval = check_interval
}

template Service "by_ssh" {
	vars.original_check_command = check_command
	check_command = "by_ssh"

	vars.by_ssh_command = {{ get_check_command(service.vars.original_check_command).command }}
	vars.by_ssh_arguments = {{ get_check_command(service.vars.original_check_command).arguments }}
	vars.by_ssh_logname = "chkbyssh"
	vars.by_ssh_identity = "/var/lib/icinga2/id_rsa"
	vars.by_ssh_options = "StrictHostKeyChecking=no"
}

apply Service "load" {
	check_command = "load"
	import "by_ssh"

	vars.load_percpu = true

	assign where true
}

apply Service "mem" {
	check_command = "mem"
	import "by_ssh"

	vars.mem_used = true
	vars.mem_cache = true
	vars.mem_warning = 80
	vars.mem_critical = 90

	assign where true
}

influxdb.conf

object InfluxdbWriter "influxdb" {
	host = "aklimov8152h"
	port = 8086
	database = "icinga2"
	flush_threshold = 1024
	flush_interval = 10s

	host_template = {
		measurement = "$host.check_command$"
		tags = {
			hostname = "$host.name$"
		}
	}

	service_template = {
		measurement = "$service.check_command$"
		tags = {
			hostname = "$host.name$"
			service = "$service.name$"
		}
	}
}

m.conf

var prefix = NodeName.substr(0, "aklimov8152x".len())

object Host prefix + "m1" { }
object Host prefix + "m2" { }

s1.conf

object Host NodeName.substr(0, "aklimov8152x".len()) + "s1" { }

s2.conf

object Host NodeName.substr(0, "aklimov8152x".len()) + "s2" { }

templates.conf

template Host default {
	check_command = "passive"
	enable_active_checks = false
}

zones.conf

var prefix = NodeName.substr(0, "aklimov8152x".len())

for (i in ["1", "2"]) {
	for (lvl in ["m", "s"]) {
		object Endpoint prefix + lvl + i {
			host = name
		}
	}

	object Zone "s" + i use(prefix) {
		parent = "m"
		endpoints = [ prefix + name ]
	}
}

object Zone "m" use(prefix) {
	endpoints = [ prefix + "m1", prefix + "m2" ]
}

object Zone "global" {
	global = true
}

Results

glibc

  • master1: screenshot "Bildschirmfoto 2021-04-22 um 14 38 19"
  • master2: screenshot "Bildschirmfoto 2021-04-22 um 14 38 36"
  • sat1: screenshot "Bildschirmfoto 2021-04-22 um 14 38 50"
  • sat2: screenshot "Bildschirmfoto 2021-04-22 um 14 38 58"

jemalloc

  • master1: screenshot "Bildschirmfoto 2021-04-22 um 14 39 08"
  • master2: screenshot "Bildschirmfoto 2021-04-22 um 14 39 31"
  • sat1: screenshot "Bildschirmfoto 2021-04-22 um 14 39 38"
  • sat2: screenshot "Bildschirmfoto 2021-04-22 um 14 39 49"

Al2Klimov avatar Apr 22 '21 12:04 Al2Klimov

See also https://github.com/Icinga/icinga2/issues/8737#issuecomment-828378514 .

Al2Klimov avatar Apr 28 '21 11:04 Al2Klimov

Which screenshots belong to which malloc? Please add headings. Also, a summary of what you found out after the test would be great.

lippserd avatar Apr 30 '21 07:04 lippserd

Conclusion: jemalloc even reduces memory usage.

Al2Klimov avatar Apr 30 '21 08:04 Al2Klimov

@cla-bot check

Al2Klimov avatar Aug 04 '21 12:08 Al2Klimov

@N-o-X Aren't you testing a real-world config at the moment? Could you please test this PR before/after with that config?

Al2Klimov avatar Nov 26 '21 09:11 Al2Klimov

Apropos large configs: do you all agree that we just don’t need this on Raspbian for obvious reasons?

Al2Klimov avatar Nov 26 '21 10:11 Al2Klimov

Apropos large configs: do you all agree that we just don’t need this on Raspbian for obvious reasons?

Yes.

lippserd avatar Nov 26 '21 10:11 lippserd

@N-o-X Aren't you testing a real-world config at the moment? Could you please test this PR before/after with that config?

The results are kinda bad.

Setup

VM:

  • 16GB RAM
  • 8 Cores

Config:

[2021-11-26 12:43:28 +0000] information/ConfigItem: Instantiated 4 NotificationCommands.
[2021-11-26 12:43:28 +0000] information/ConfigItem: Instantiated 72985 Notifications.
[2021-11-26 12:43:28 +0000] information/ConfigItem: Instantiated 24 Dependencies.
[2021-11-26 12:43:28 +0000] information/ConfigItem: Instantiated 1 IcingaApplication.
[2021-11-26 12:43:28 +0000] information/ConfigItem: Instantiated 24400 HostGroups.
[2021-11-26 12:43:28 +0000] information/ConfigItem: Instantiated 28007 Hosts.
[2021-11-26 12:43:28 +0000] information/ConfigItem: Instantiated 2 EventCommands.
[2021-11-26 12:43:28 +0000] information/ConfigItem: Instantiated 7036 Downtimes.
[2021-11-26 12:43:28 +0000] information/ConfigItem: Instantiated 116 Comments.
[2021-11-26 12:43:28 +0000] information/ConfigItem: Instantiated 1 FileLogger.
[2021-11-26 12:43:28 +0000] information/ConfigItem: Instantiated 1 IcingaDB.
[2021-11-26 12:43:28 +0000] information/ConfigItem: Instantiated 1 ApiListener.
[2021-11-26 12:43:28 +0000] information/ConfigItem: Instantiated 215 Zones.
[2021-11-26 12:43:28 +0000] information/ConfigItem: Instantiated 207 Endpoints.
[2021-11-26 12:43:28 +0000] information/ConfigItem: Instantiated 11 ApiUsers.
[2021-11-26 12:43:28 +0000] information/ConfigItem: Instantiated 507 CheckCommands.
[2021-11-26 12:43:28 +0000] information/ConfigItem: Instantiated 713 Users.
[2021-11-26 12:43:28 +0000] information/ConfigItem: Instantiated 26 TimePeriods.
[2021-11-26 12:43:28 +0000] information/ConfigItem: Instantiated 88 ServiceGroups.
[2021-11-26 12:43:28 +0000] information/ConfigItem: Instantiated 2599 ScheduledDowntimes.
[2021-11-26 12:43:28 +0000] information/ConfigItem: Instantiated 383318 Services.

Tests

Command: time icinga2 daemon -C

Results 2.13.2

Run 1

real	12m32.680s
user	62m25.203s
sys	17m40.012s

Run 2

real	12m50.119s
user	63m10.677s
sys	17m24.793s

Results 2.13.2 + This PR

Run 1

real	14m40.973s
user	68m20.610s
sys	24m0.484s

Run 2

real	14m35.091s
user	68m40.074s
sys	22m32.007s

N-o-X avatar Nov 26 '21 12:11 N-o-X


Good to know: https://github.com/Icinga/icinga2/issues/8737#issuecomment-1000551057

Al2Klimov avatar Jan 03 '22 09:01 Al2Klimov

One more reason: https://github.com/Icinga/icinga2/issues/8737#issuecomment-1046637022

Al2Klimov avatar Feb 21 '22 15:02 Al2Klimov

I did some playing with jemalloc 5.2.1 (using LD_PRELOAD) and a somewhat larger config modeled after some real world config (daemon -C takes around 20s on my laptop). With this, I got around a 12% performance benefit at 4% more memory usage (about 1GB peak, so about 40MB extra). Take the exact numbers with a grain of salt but they looked consistent enough across 10 runs that I think it's fair to say that in this scenario you get a nice performance benefit for a small penalty in memory use.

Full Config
const numHosts = 10000;
const numHostTemplates = 100;
const numHostGroups = 100;
const maxGroupsPerHost = 4;
const maxTemplatesPerHost = 2;
const hostTemplateApplyRules = 100;
const hostVarEqApplyRules = 10;
const hostNameMatchApplyRules = 90;
const hostNotMatchingApplyRules = 150;
const numExtraNotificationRules = 10;

object CheckCommand "dummy" {
	command = ["true"]
}

object NotificationCommand "dummy" {
	command = ["true"]
}

for (i in range(numHostGroups)) {
	object HostGroup String(i) {}
}

for (i in range(numHostTemplates)) {
	var t = "template_"+String(i)
	template Host t use (t) {
		vars[t] = 42
	}
}

for (i in range(numHosts)) {
	object Host String(i) use (i) {
		check_command = "dummy"
		for (j in range(1 + i % maxGroupsPerHost)) {
			groups += [String((7*i + 11*j) % numHostGroups)]
		}
		for (j in range(1 + i % maxTemplatesPerHost)) {
			import "template_" + String((11*i + 13*j) % numHostTemplates)
		}
		vars.v0 = "foo"
		vars.v1 = "foo"
		vars.v2 = "foo"
		vars.v3 = "foo"
		vars.v4 = "foo"
		vars.v5 = "foo"
		vars.v6 = "foo"
		vars.v7 = "foo"
		vars.v8 = "foo"
		vars.v9 = "foo"
	}

	object Endpoint String(i) {}

	object Zone String(i) use (i) {
		endpoints = [String(i)]
	}
}

for (i in range(hostTemplateApplyRules)) {
	var t = "template_" + String(i%numHostTemplates)
	apply Service t use (t) {
		check_command = "dummy"
		assign where t in host.templates
	}
}

for (i in range(hostVarEqApplyRules)) {
	var t = "var_eq_" + String(i)
	apply Service t use (i, t) {
		check_command = "dummy"
		assign where host.vars["template_" + String(i % numHostTemplates)] == 42
	}
}

for (i in range(hostNameMatchApplyRules)) {
	var t = "name_match+" + String(i)
	var p = String(i%10) + "*"
	apply Service t use (t, p) {
		check_command = "dummy"
		assign where match(p, host.name)
	}
}

for (i in range(hostNotMatchingApplyRules)) {
	var t = "no_match_" + String(i)
	var p = String(i%10) + "*"
	apply Service t use (t, p) {
		check_command = "dummy"
		assign where match("*", host.name) && host.vars.var1 == "value-never-set"
	}
}

object User "user" {}

apply Notification "all" to Host {
	command = "dummy"
	users = ["user"]
	assign where true
}

apply Notification "all" to Service {
	command = "dummy"
	users = ["user"]
	assign where true
}

for (i in range(numExtraNotificationRules)) {
	apply Notification "extra-" + String(i) to Host {
		command = "dummy"
		users = ["user"]
		assign where host.name.len() == 1
	}

	apply Notification "extra-" + String(i) to Service {
		command = "dummy"
		users = ["user"]
		assign where host.name.len() == 1
	}
}

julianbrost avatar May 11 '22 14:05 julianbrost

Something else to consider: looks like the glibc allocator is better at detecting bad stuff:

$ g++ -std=c++11 -o double-free double-free.cpp
$ g++ -std=c++11 -o heap-overflow heap-overflow.cpp 
$ ./double-free 
free(): double free detected in tcache 2
[1]    1194835 IOT instruction (core dumped)  ./double-free
$ ./heap-overflow
munmap_chunk(): invalid pointer
[1]    1194847 IOT instruction (core dumped)  ./heap-overflow
$ jemalloc.sh ./double-free
$ echo $?
0
$ jemalloc.sh ./heap-overflow
$ echo $?                    
0
double-free.cpp
int main() {
	auto x = new int[1];
	delete[] x;
	delete[] x; // double free: glibc aborts here, jemalloc silently exits 0
}
heap-overflow.cpp
#include <vector>

int main() {
	std::vector<int> x(1), y(1);
	// Intentionally writes far past both one-element buffers:
	for (int i = 0; i < 8192; i++) {
		x[i] = 42;
		y[i] = 42;
	}
}
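
Neither allocator is a memory-error detector by design, though; whether such bugs are caught at all is incidental, as the transcript shows. If detection is the goal, an AddressSanitizer build flags both programs deterministically, independent of which malloc is linked at runtime. A minimal sketch (GCC or Clang):

# Rebuild the two test programs with AddressSanitizer instrumentation;
# both then abort with a diagnostic instead of depending on allocator luck:
g++ -std=c++11 -fsanitize=address -g -o double-free double-free.cpp
g++ -std=c++11 -fsanitize=address -g -o heap-overflow heap-overflow.cpp
./double-free    # reports an attempted double-free
./heap-overflow  # reports a heap-buffer-overflow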

julianbrost avatar May 11 '22 14:05 julianbrost

I opt for just closing this one. We should invest the time it takes to test and verify if and how this affects performance into actually improving the code.

lippserd avatar Oct 04 '22 13:10 lippserd

I'm on 2.13.3:

icinga2 - The Icinga 2 network monitoring daemon (version: r2.13.3-1)

Copyright (c) 2012-2022 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later <https://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

System information:
  Platform: CentOS Linux
  Platform version: 7 (Core)
  Kernel: Linux
  Kernel version: 3.10.0-1127.8.2.el7.x86_64
  Architecture: x86_64

Build information:
  Compiler: GNU 4.8.5
  Build host: runner-hh8q3bz2-project-322-concurrent-0
  OpenSSL version: OpenSSL 1.0.2k-fips  26 Jan 2017

Application information:

General paths:
  Config directory: /etc/icinga2
  Data directory: /var/lib/icinga2
  Log directory: /var/log/icinga2
  Cache directory: /var/cache/icinga2
  Spool directory: /var/spool/icinga2
  Run directory: /run/icinga2

Old paths (deprecated):
  Installation root: /usr
  Sysconf directory: /etc
  Run directory (base): /run
  Local state directory: /var

Internal paths:
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

Fairly sizeable deployment and suffering from memory leaks which led me here.

Not sure if this is helpful, but I added LD_PRELOAD=/usr/lib64/libjemalloc.so.1 to /etc/sysconfig/icinga2

pre config change test for icinga2 daemon -C

real	1m16.896s
user	2m34.925s
sys	0m16.074s

post config change test for icinga2 daemon -C

real	1m5.757s
user	2m35.747s
sys	0m18.760s

Should have some solid memory graphs in the morning

steaksauce- avatar Oct 06 '22 04:10 steaksauce-

Added

fixes #8737

to OP.

Al2Klimov avatar May 02 '23 14:05 Al2Klimov