
InfluxDBWriter is not sending historic data from api-log

Open dgoetz opened this issue 5 years ago • 7 comments

Describe the bug

The scenario is an agent that is configured to schedule its checks itself and a master that is configured to write to InfluxDB. If we disable the network interface on the agent, we see the api-log in "%Programdata%/icinga2/var/lib/icinga2/api/log" growing and containing the expected check results with the perfdata. When we enable the network interface again, we see the reconnect and log entries about synchronization, but there is only one entry about sending data to InfluxDB, and it carries the current timestamp.

To Reproduce

  1. Configure the master to write to InfluxDB
  2. Configure a host to be an agent with configuration in its own zone
  3. Stop network communication and see entries in %Programdata%/icinga2/var/lib/icinga2/api/log/current
  4. Start network communication again and wait for entries in InfluxDB; you will see new datapoints but a gap in the graph, as historic data are not sent
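
To check step 4 without eyeballing the graph, a minimal sketch that flags gaps in a list of datapoint timestamps could look like the following. It assumes the timestamps have already been exported from InfluxDB as epoch seconds; `find_gaps` and its `tolerance` factor are illustrative names, not part of Icinga 2 or InfluxDB.

```python
from typing import List, Tuple


def find_gaps(
    timestamps: List[float],
    check_interval: float,
    tolerance: float = 1.5,
) -> List[Tuple[float, float]]:
    """Return (start, end) pairs where two consecutive datapoints are
    further apart than tolerance * check_interval."""
    ordered = sorted(timestamps)
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > tolerance * check_interval:
            gaps.append((previous, current))
    return gaps


# Hypothetical series: one point every 10 s, nothing between 30 and 100.
points = [0, 10, 20, 30, 100, 110, 120]
print(find_gaps(points, check_interval=10))  # [(30, 100)]
```

If the historic check results were replayed into InfluxDB, the returned list would be empty for the outage window.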

Expected behavior

Data collected on the agent are replayed on the master so we get a graph without gaps.

Your Environment


  • Version used (icinga2 --version): 2.11.4
  • Operating System and version: CentOS 7 (Master), Windows 2016 (Agent)
  • Enabled features (icinga2 feature list): influxdb (Master), checker (Agent)

dgoetz avatar Jul 21 '20 12:07 dgoetz

sat is checking:

Screenshot 2020-07-30 at 15 13 31

master is down:

Screenshot 2020-07-30 at 15 13 35

Influx shows a gap:

Screenshot 2020-07-30 at 15 13 41

master starts and gets all checks:

Screenshot 2020-07-30 at 15 14 04

Influx gap gets closed:

Screenshot 2020-07-30 at 15 14 11

Al2Klimov avatar Jul 30 '20 13:07 Al2Klimov

@Al2Klimov You are testing with Linux only, but in the reported case the agent is a Windows system. I did see this behaviour in the past too, but I don't have access to a system to verify this.

Please consider re-opening this, as it is not fully verified that this issue is resolved.

mcktr avatar Jul 30 '20 13:07 mcktr

I'm afraid you're right:

Screenshot 2020-07-30 at 16 51 57

Al2Klimov avatar Jul 30 '20 14:07 Al2Klimov

  • This has not been fixed in the snapshots
  • This is a Windows-only problem
  • Firewall or master restart => same result

Al2Klimov avatar Jul 30 '20 15:07 Al2Klimov

I can confirm this issue for Linux as well.

icinga2 version: r2.12.3-1 on master and agent

check_interval = 10s

Grafana

icinga2.log on master

[2021-03-24 15:42:36 +0100] information/JsonRpcConnection: No messages for identity 'iatl.em.lan' have been received in the last 60 seconds.
[2021-03-24 15:42:36 +0100] warning/JsonRpcConnection: API client disconnected for identity 'iatl.em.lan'
[2021-03-24 15:42:36 +0100] warning/ApiListener: Removing API client for endpoint 'iatl.em.lan'. 0 API clients left.
[2021-03-24 15:43:00 +0100] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2021-03-24 15:43:19 +0100] information/WorkQueue: #6 (ApiListener, RelayQueue) items: 0, rate: 3.1/s (186/min 1137/5min 1292/15min);
[2021-03-24 15:43:19 +0100] information/WorkQueue: #7 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2021-03-24 15:43:19 +0100] information/WorkQueue: #8 (InfluxdbWriter, influxdb) items: 0, rate: 0.766667/s (46/min 268/5min 284/15min);
[2021-03-24 15:43:20 +0100] information/IdoMysqlConnection: Pending queries: 10 (Input: 13/s; Output: 12/s)
[2021-03-24 15:43:46 +0100] information/ApiListener: New client connection for identity 'iatl.em.lan' from [192.168.1.223]:33152
[2021-03-24 15:43:46 +0100] information/ApiListener: Sending config updates for endpoint 'iatl.em.lan' in zone 'iatl.em.lan'.
[2021-03-24 15:43:46 +0100] information/ApiListener: Syncing configuration files for global zone 'director-global' to endpoint 'iatl.em.lan'.
[2021-03-24 15:43:46 +0100] information/ApiListener: Syncing configuration files for global zone 'global-templates' to endpoint 'iatl.em.lan'.
[2021-03-24 15:43:46 +0100] information/ApiListener: Syncing configuration files for global zone 'windows-commands' to endpoint 'iatl.em.lan'.
[2021-03-24 15:43:46 +0100] information/JsonRpcConnection: Received certificate request for CN 'iatl.em.lan' signed by our CA.
[2021-03-24 15:43:46 +0100] information/JsonRpcConnection: The certificate for CN 'iatl.em.lan' is valid and uptodate. Skipping automated renewal.
[2021-03-24 15:43:46 +0100] information/ApiListener: Syncing configuration files for zone 'iatl.em.lan' to endpoint 'iatl.em.lan'.
[2021-03-24 15:43:46 +0100] information/ApiListener: Finished sending config file updates for endpoint 'iatl.em.lan' in zone 'iatl.em.lan'.
[2021-03-24 15:43:46 +0100] information/ApiListener: Syncing runtime objects to endpoint 'iatl.em.lan'.
[2021-03-24 15:43:46 +0100] information/ApiListener: Finished syncing runtime objects to endpoint 'iatl.em.lan'.
[2021-03-24 15:43:46 +0100] information/ApiListener: Finished sending runtime config updates for endpoint 'iatl.em.lan' in zone 'iatl.em.lan'.
[2021-03-24 15:43:46 +0100] information/ApiListener: Sending replay log for endpoint 'iatl.em.lan' in zone 'iatl.em.lan'.
[2021-03-24 15:43:46 +0100] information/ApiListener: Finished sending replay log for endpoint 'iatl.em.lan' in zone 'iatl.em.lan'.
[2021-03-24 15:43:46 +0100] information/ApiListener: Finished syncing endpoint 'iatl.em.lan' in zone 'iatl.em.lan'.

icinga2.log on agent

[2021-03-24 15:42:36 +0100] information/JsonRpcConnection: No messages for identity 'main.em.lan' have been received in the last 60 seconds.
[2021-03-24 15:42:36 +0100] warning/JsonRpcConnection: API client disconnected for identity 'main.em.lan'
[2021-03-24 15:42:36 +0100] warning/ApiListener: Removing API client for endpoint 'main.em.lan'. 0 API clients left.
[2021-03-24 15:42:46 +0100] information/ApiListener: Reconnecting to endpoint 'main.em.lan' via host '192.168.1.210' and port '5665'
[2021-03-24 15:42:49 +0100] critical/ApiListener: Cannot connect to host '192.168.1.210' on port '5665': No route to host
[2021-03-24 15:42:56 +0100] information/ApiListener: Reconnecting to endpoint 'main.em.lan' via host '192.168.1.210' and port '5665'
[2021-03-24 15:42:59 +0100] critical/ApiListener: Cannot connect to host '192.168.1.210' on port '5665': No route to host
[2021-03-24 15:43:06 +0100] information/ApiListener: Reconnecting to endpoint 'main.em.lan' via host '192.168.1.210' and port '5665'
[2021-03-24 15:43:06 +0100] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2021-03-24 15:43:09 +0100] critical/ApiListener: Cannot connect to host '192.168.1.210' on port '5665': No route to host
[2021-03-24 15:43:16 +0100] information/ApiListener: Reconnecting to endpoint 'main.em.lan' via host '192.168.1.210' and port '5665'
[2021-03-24 15:43:19 +0100] critical/ApiListener: Cannot connect to host '192.168.1.210' on port '5665': No route to host
[2021-03-24 15:43:26 +0100] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2021-03-24 15:43:26 +0100] information/WorkQueue: #5 (ApiListener, RelayQueue) items: 0, rate: 1.53333/s (92/min 451/5min 486/15min);
[2021-03-24 15:43:26 +0100] information/ApiListener: Reconnecting to endpoint 'main.em.lan' via host '192.168.1.210' and port '5665'
[2021-03-24 15:43:29 +0100] critical/ApiListener: Cannot connect to host '192.168.1.210' on port '5665': No route to host
[2021-03-24 15:43:36 +0100] information/ApiListener: Reconnecting to endpoint 'main.em.lan' via host '192.168.1.210' and port '5665'
[2021-03-24 15:43:39 +0100] critical/ApiListener: Cannot connect to host '192.168.1.210' on port '5665': No route to host
[2021-03-24 15:43:46 +0100] information/ApiListener: Reconnecting to endpoint 'main.em.lan' via host '192.168.1.210' and port '5665'
[2021-03-24 15:43:46 +0100] information/ApiListener: New client connection for identity 'main.em.lan' to [192.168.1.210]:5665
[2021-03-24 15:43:46 +0100] information/ApiListener: Requesting new certificate for this Icinga instance from endpoint 'main.em.lan'.
[2021-03-24 15:43:46 +0100] information/ApiListener: Sending config updates for endpoint 'main.em.lan' in zone 'em.lan'.
[2021-03-24 15:43:46 +0100] information/ApiListener: Finished sending config file updates for endpoint 'main.em.lan' in zone 'em.lan'.
[2021-03-24 15:43:46 +0100] information/ApiListener: Syncing runtime objects to endpoint 'main.em.lan'.
[2021-03-24 15:43:46 +0100] information/ApiListener: Finished reconnecting to endpoint 'main.em.lan' via host '192.168.1.210' and port '5665'
[2021-03-24 15:43:46 +0100] information/ApiListener: Finished syncing runtime objects to endpoint 'main.em.lan'.
[2021-03-24 15:43:46 +0100] information/ApiListener: Finished sending runtime config updates for endpoint 'main.em.lan' in zone 'em.lan'.
[2021-03-24 15:43:46 +0100] information/ApiListener: Sending replay log for endpoint 'main.em.lan' in zone 'em.lan'.
[2021-03-24 15:43:46 +0100] information/ApiListener: Replayed 110 messages.
[2021-03-24 15:43:46 +0100] information/ApiListener: Finished sending replay log for endpoint 'main.em.lan' in zone 'em.lan'.
[2021-03-24 15:43:46 +0100] information/ApiListener: Finished syncing endpoint 'main.em.lan' in zone 'em.lan'.
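
The outage window and the replay count can be read off the agent log mechanically. The following is a minimal sketch, assuming the log line format shown above; `summarize` and the two regular expressions are illustrative helpers, not part of Icinga 2.

```python
import re
from datetime import datetime
from typing import Optional, Tuple

# Matches lines such as:
# [2021-03-24 15:43:46 +0100] information/ApiListener: Replayed 110 messages.
LINE = re.compile(r"^\[(?P<ts>[^\]]+)\] \w+/\w+: (?P<message>.*)$")
REPLAYED = re.compile(r"^Replayed (\d+) messages")


def summarize(log_text: str) -> Tuple[Optional[float], int]:
    """Return (outage in seconds, or None if not found; replayed messages)."""
    disconnected = None
    reconnected = None
    replayed = 0
    for raw in log_text.splitlines():
        match = LINE.match(raw.strip())
        if match is None:
            continue  # skip lines that do not look like icinga2.log entries
        when = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S %z")
        message = match.group("message")
        if message.startswith("API client disconnected") and disconnected is None:
            disconnected = when
        elif (
            message.startswith("New client connection")
            and disconnected is not None
            and reconnected is None
        ):
            reconnected = when
        replay = REPLAYED.match(message)
        if replay:
            replayed += int(replay.group(1))
    outage = None
    if disconnected is not None and reconnected is not None:
        outage = (reconnected - disconnected).total_seconds()
    return outage, replayed


# Three lines taken from the agent log above.
sample_log = """\
[2021-03-24 15:42:36 +0100] warning/JsonRpcConnection: API client disconnected for identity 'main.em.lan'
[2021-03-24 15:43:46 +0100] information/ApiListener: New client connection for identity 'main.em.lan' to [192.168.1.210]:5665
[2021-03-24 15:43:46 +0100] information/ApiListener: Replayed 110 messages.
"""

print(summarize(sample_log))  # (70.0, 110)
```

For the three lines quoted from the agent log this yields a 70 second outage and 110 replayed messages, so the agent did hand the buffered messages to the master on reconnect.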

R-Sommer avatar Mar 25 '21 12:03 R-Sommer

Confirmed on an OpenBSD cluster. During master downtimes, the satellite's graphs show gaps even though the checks ran on the satellite.

Al2Klimov avatar Jun 18 '23 12:06 Al2Klimov

Duplicate of #7704

Al2Klimov avatar May 14 '24 14:05 Al2Klimov