icinga2
icinga2 copied to clipboard
warning/JsonRpcConnection: Error while sending JSON-RPC message for identity (satellite) [2.13.7]
Describe the bug
Our master-satellite sync is breaking with this error/stacktrace (from the master icinga2.log):
[2023-06-12 16:58:32 +0200] information/ApiListener: Reconnecting to endpoint 'xxx-satellite.domain.cz' via host 'xxx.yyy.zzz' and port '5665'
[2023-06-12 16:58:32 +0200] information/ApiListener: New client connection for identity 'xxx-satellite.domain.cz' to [xxx.yyy.zzz]:5665
[2023-06-12 16:58:32 +0200] information/ApiListener: Finished reconnecting to endpoint 'xxx-satellite.domain.cz' via host 'xxx.yyy.zzz' and port '5665'
[2023-06-12 16:58:32 +0200] information/ApiListener: Sending config updates for endpoint 'xxx-satellite.domain.cz' in zone 'xxx'.
[2023-06-12 16:58:32 +0200] information/ApiListener: Syncing configuration files for zone 'xxx' to endpoint 'xxx-satellite.domain.cz'.
[2023-06-12 16:58:32 +0200] information/ApiListener: Syncing configuration files for global zone 'director-global' to endpoint 'xxx-satellite.domain.cz'.
[2023-06-12 16:58:32 +0200] information/ApiListener: Finished sending config file updates for endpoint 'xxx-satellite.domain.cz' in zone 'xxx'.
[2023-06-12 16:58:32 +0200] information/ApiListener: Syncing runtime objects to endpoint 'xxx-satellite.domain.cz'.
[2023-06-12 16:58:32 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'xxx-satellite.domain.cz'
Error: Connection reset by peer
Stacktrace:
0# __cxa_throw in /usr/lib64/icinga2/sbin/icinga2
1# 0x000000000086B7BA in /usr/lib64/icinga2/sbin/icinga2
2# icinga::JsonRpcConnection::WriteOutgoingMessages(boost::asio::basic_yield_context<boost::asio::executor_binder<void (*)(), boost::asio::executor> >) in /usr/lib64/icinga2/sbin/icinga2
3# 0x0000000000AE4456 in /usr/lib64/icinga2/sbin/icinga2
4# 0x0000000000AE5910 in /usr/lib64/icinga2/sbin/icinga2
5# make_fcontext in /lib64/libboost_context.so.1.69.0
[2023-06-12 16:58:32 +0200] warning/JsonRpcConnection: API client disconnected for identity 'xxx-satellite.domain.cz'
[2023-06-12 16:58:32 +0200] warning/ApiListener: Removing API client for endpoint 'xxx-satellite.domain.cz'. 0 API clients left.
[2023-06-12 16:58:32 +0200] information/ApiListener: Finished syncing runtime objects to endpoint 'xxx-satellite.domain.cz'.
[2023-06-12 16:58:32 +0200] information/ApiListener: Finished sending runtime config updates for endpoint 'xxx-satellite.domain.cz' in zone 'xxx'.
[2023-06-12 16:58:32 +0200] information/ApiListener: Sending replay log for endpoint 'xxx-satellite.domain.cz' in zone 'xxx'.
[2023-06-12 16:58:32 +0200] information/ApiListener: Finished sending replay log for endpoint 'xxx-satellite.domain.cz' in zone 'xxx'.
[2023-06-12 16:58:32 +0200] information/ApiListener: Finished syncing endpoint 'xxx-satellite.domain.cz' in zone 'xxx'.
Facts:
- error in the log occurs on the master
- 'xxx-satellite.domain.cz' is the satellite
- 'xxx' zone is the zone for hosts and checks, that are running on the xxx-satellite.domain.cz satellite
- both master and satellite are on the latest icinga 2.13.7 version
- after satellite restart (
systemctl reload icinga2
), error disappears from the master icinga2.log for a moment, the next several synchronisations are OK, but within about an hour the error starts to reappear in the log - when this error occurs in the master log, our services from zone 'xxx' are starting to switch to the UNKNOWN (purple) state in the icinga web or they seems like stopped executing
- we have also another, not yet updated satellite still on version 2.10.7, and there is no error messages for this old satellite in the master
To Reproduce
I don't know how to reproduce it. If you need our exact configuration of master, satellites and agents, we can anonymize it and send, but it's probably not necessary.
icinga2 --version
icinga2 - The Icinga 2 network monitoring daemon (version: r2.13.7-1)
Copyright (c) 2012-2023 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later <https://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
System information:
Platform: CentOS Linux
Platform version: 7 (Core)
Kernel: Linux
Kernel version: 3.10.0-1160.83.1.el7.x86_64
Architecture: x86_64
Build information:
Compiler: GNU 11.2.1
Build host: runner-hh8q3bz2-project-575-concurrent-0
OpenSSL version: OpenSSL 1.0.2k-fips 26 Jan 2017
Application information:
General paths:
Config directory: /etc/icinga2
Data directory: /var/lib/icinga2
Log directory: /var/log/icinga2
Cache directory: /var/cache/icinga2
Spool directory: /var/spool/icinga2
Run directory: /run/icinga2
Old paths (deprecated):
Installation root: /usr
Sysconf directory: /etc
Run directory (base): /run
Local state directory: /var
Internal paths:
Package data directory: /usr/share/icinga2
State path: /var/lib/icinga2/icinga2.state
Modified attributes path: /var/lib/icinga2/modified-attributes.conf
Objects path: /var/cache/icinga2/icinga2.debug
Vars path: /var/cache/icinga2/icinga2.vars
PID path: /run/icinga2/icinga2.pid
cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
icinga2 feature list
Disabled features: compatlog debuglog elasticsearch gelf graphite icingadb influxdb2 livestatus opentsdb perfdata statusdata syslog
Enabled features: api checker command ido-pgsql influxdb mainlog notification
Icinga Web 2 version and modules
Icinga Web 2 Version 2.11.4
Git commit 11453bfa92a70a44efbf7f966f5e7f27e9300a28
PHP Version 7.3.33
Git commit date 2023-01-26
Loaded Libraries
icinga/icinga-php-library 0.11.0
icinga/icinga-php-thirdparty 0.11.0
Loaded Modules
director 1.10.2 Configure
fileshipper 1.1.0 Configure
grafana 1.3.6 Configure
incubator 0.20.0 Configure
monitoring 2.11.4 Configure
icinga2 object list --type Endpoint
Output is over 6000 lines, too long.
icinga2 object list --type Zone
Output is over 6000 lines, too long.
yum list installed | grep boost
(boost libs)
boost-program-options.x86_64
boost-regex.x86_64 1.53.0-28.el7 @CentOS7_os_x86_64
boost-system.x86_64 1.53.0-28.el7 @CentOS7_os_x86_64
boost-thread.x86_64 1.53.0-28.el7 @CentOS7_os_x86_64
boost169-chrono.x86_64 1.69.0-2.el7 @epel
boost169-context.x86_64 1.69.0-2.el7 @epel
boost169-coroutine.x86_64 1.69.0-2.el7 @epel
boost169-date-time.x86_64 1.69.0-2.el7 @epel
boost169-filesystem.x86_64 1.69.0-2.el7 @epel
boost169-iostreams.x86_64 1.69.0-2.el7 @DataSpring_EPEL_7Server_x86_64
boost169-program-options.x86_64
boost169-regex.x86_64 1.69.0-2.el7 @epel
boost169-system.x86_64 1.69.0-2.el7 @epel
boost169-thread.x86_64 1.69.0-2.el7 @epel
Additional context
There are similar bugs here, like #9153, but the error is different (broken pipe).
What does xxx-satellite.domain.cz log when this happens?
With the stacktrace, the error probably looks more severe than it actually is. In itself, it just says that a connection got closed. Question is why that happened and why it doesn't get reestablished properly.
We are planning to upgrade to 2.14 - if the error persists on 2.14, I will send the logs.
ref/NC/792281
Are there any updates regarding this issue? We also experience this issue from week to week and after icinga2 logs the error, the amount of overdue checks on the satellite grows immediately. Only a restart of the instance helps.
ref/NC/804054
Since Friday, April 26., we have the same problem. We have 2 Master and 16 Checker (8 Checker Zones + Master). One master and two checker of one zone now didn't syncronise with the config-master. The other seven checker-zones are working fine.
[2024-04-30 14:10:00 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker1' Error: Connection reset by peer Stacktrace: 0# __cxa_throw in /usr/lib64/icinga2/sbin/icinga2 1# 0x00000000008C3B8C in /usr/lib64/icinga2/sbin/icinga2 2# icinga::JsonRpcConnection::WriteOutgoingMessages(boost::asio::basic_yield_context<boost::asio::executor_binder<void (*)(), boost::asio::executor> >) in /usr/lib64/icinga2/sbin/icinga2 3# 0x0000000000B3DC27 in /usr/lib64/icinga2/sbin/icinga2 4# 0x0000000000B3EBCF in /usr/lib64/icinga2/sbin/icinga2 5# make_fcontext in /usr/lib64/icinga-boost/libboost_context.so.1.69.0 [2024-04-30 14:10:01 +0200] warning/JsonRpcConnection: API client disconnected for identity 'checker1' [2024-04-30 14:10:01 +0200] warning/ApiListener: Removing API client for endpoint 'checker1'. 0 API clients left.
[2024-04-30 12:11:59 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity '2nd-master' Error: Broken pipe Stacktrace: 0# __cxa_throw in /usr/lib64/icinga2/sbin/icinga2 1# 0x00000000008C3B8C in /usr/lib64/icinga2/sbin/icinga2 2# icinga::JsonRpcConnection::WriteOutgoingMessages(boost::asio::basic_yield_context<boost::asio::executor_binder<void (*)(), boost::asio::executor> >) in /usr/lib64/icinga2/sbin/icinga2 3# 0x0000000000B3DC27 in /usr/lib64/icinga2/sbin/icinga2 4# 0x0000000000B3EBCF in /usr/lib64/icinga2/sbin/icinga2 5# make_fcontext in /usr/lib64/icinga-boost/libboost_context.so.1.69.0 [2024-04-30 12:11:59 +0200] warning/JsonRpcConnection: API client disconnected for identity '2nd-master' [2024-04-30 12:11:59 +0200] warning/ApiListener: Removing API client for endpoint '2nd-master'. 0 API clients left.`
This happens every minute:
[2024-04-30 14:25:00 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker1' [2024-04-30 14:25:17 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker2' [2024-04-30 14:25:18 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity '2nd-master' [2024-04-30 14:25:27 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker1' [2024-04-30 14:25:56 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker1' [2024-04-30 14:26:02 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity '2nd-master' [2024-04-30 14:26:27 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker1' [2024-04-30 14:26:48 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity '2nd-master' [2024-04-30 14:26:57 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker1' [2024-04-30 14:27:27 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker1' [2024-04-30 14:27:33 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity '2nd-master' [2024-04-30 14:27:47 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker2' [2024-04-30 14:27:57 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker1' [2024-04-30 14:28:19 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity '2nd-master' [2024-04-30 14:28:27 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker1' [2024-04-30 14:28:46 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker2' [2024-04-30 14:28:57 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker1' [2024-04-30 14:29:03 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity '2nd-master' [2024-04-30 14:29:27 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker1' [2024-04-30 14:29:48 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity '2nd-master' [2024-04-30 14:29:57 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker1' [2024-04-30 14:30:28 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity 'checker1' [2024-04-30 14:30:34 +0200] warning/JsonRpcConnection: Error while sending JSON-RPC message for identity '2nd-master'
I updated both master an the affected zone to version: r2.14.2-1. The other zones are running at r2.14.0-1, 2.13.2-1 and r2.10.5-1
I found a error in the debug.log:
[2024-05-07 12:43:48 +0200] notice/JsonRpcConnection: Received 'log::SetLogPosition' message from identity 'master'.
[2024-05-07 12:43:48 +0200] notice/JsonRpcConnection: Error while reading JSON-RPC message for identity 'master': Error: Length specifier must not exceed 9 characters
[2024-05-07 12:43:48 +0200] warning/JsonRpcConnection: API client disconnected for identity 'master'
[2024-05-07 12:43:48 +0200] warning/ApiListener: Removing API client for endpoint 'master'. 0 API clients left.
What´s that: "Length specifier must not exceed 9 characters"
@Skap81 Afaik, the message stream between two Icinga2 instances is Netstring encoded and the message would suggest, that message is rejected if the length specifier exceeds 9 characters, meaning more than 1000000000 Bytes (1GiB). Not sure, how it gets that big though.