retry deadline exceeded
Describe the bug
We have two separate Icinga instances running identical configurations, and on both of them Icinga DB randomly crashes with a 'retry deadline exceeded' error.
Both of these installations are single master.
To Reproduce
Appears random
Expected behavior
That it doesn't happen
Your Environment
- Icinga DB version: 1.2.1-1+ubuntu20.04
- Icinga 2 version: 2.14.5-1+ubuntu20.04
- Operating System and version: Ubuntu 20.04
Additional context
● icingadb.service - Icinga DB
Loaded: loaded (/lib/systemd/system/icingadb.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Tue 2025-03-11 01:50:01 AEDT; 10h ago
Process: 1112676 ExecStart=/usr/sbin/icingadb --config /etc/icingadb/config.yml (code=exited, status=1/FAILURE)
Main PID: 1112676 (code=exited, status=1/FAILURE)
Mar 11 01:49:01 master1 icingadb[1112676]: heartbeat: Waiting for Icinga heartbeat
Mar 11 01:49:20 master1 icingadb[1112676]: history-sync: Synced 5 notification history items
Mar 11 01:49:20 master1 icingadb[1112676]: history-sync: Synced 36 state history items
Mar 11 01:49:40 master1 icingadb[1112676]: history-sync: Synced 33 state history items
Mar 11 01:49:40 master1 icingadb[1112676]: history-sync: Synced 4 notification history items
Mar 11 01:50:00 master1 icingadb[1112676]: history-sync: Synced 4 notification history items
Mar 11 01:50:00 master1 icingadb[1112676]: history-sync: Synced 32 state history items
Mar 11 01:50:01 master1 icingadb[1112676]: retry deadline exceeded
github.com/icinga/icingadb/pkg/icingadb.(*HA).controller
github.com/icinga/icingadb/pkg/icingadb/ha.go:166
runtime.goexit
runtime/asm_amd64.s:1700
HA aborted
github.com/icinga/icingadb/pkg/icingadb.(*HA).abort.func1
github.com/icinga/icingadb/pkg/icingadb/ha.go:134
sync.(*Once).doSlow
sync/once.go:76
sync.(*Once).Do
sync/once.go:67
github.com/icinga/icingadb/pkg/icingadb.(*HA).abort
github.com/icinga/icingadb/pkg/icingadb/ha.go:132
github.com/icinga/icingadb/pkg/icingadb.(*HA).controller
github.com/icinga/icingadb/pkg/icingadb/ha.go:166
runtime.goexit
runtime/asm_amd64.s:1700
HA exited with an error
main.run
github.com/icinga/icingadb/cmd/icingadb/main.go:336
main.main
github.com/icinga/icingadb/cmd/icingadb/main.go:37
runtime.main
runtime/proc.go:272
runtime.goexit
runtime/asm_amd64.s:1700
Mar 11 01:50:01 master1 systemd[1]: icingadb.service: Main process exited, code=exited, status=1/FAILURE
Mar 11 01:50:01 master1 systemd[1]: icingadb.service: Failed with result 'exit-code'.
Thanks for posting this issue.
Could you please provide the complete Icinga DB log from program start to crash with extended systemd journald fields? Please use either --output verbose or --output json as described here, https://icinga.com/docs/icinga-db/latest/doc/03-Configuration/#systemd-journald-fields.
Furthermore, could you please post a redacted version of your Icinga DB configuration and tell us which SQL database server you are using, version included.
The logs are starting with the following line:
Mar 11 01:49:01 master1 icingadb[1112676]: heartbeat: Waiting for Icinga heartbeat
Is your Icinga 2 healthy? And how about your Redis?
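For readers collecting the requested output: `journalctl --output json` emits one JSON object per line, with the extra Icinga DB details stored in fields whose names start with `ICINGADB_`. The following is a minimal sketch, not part of Icinga DB, that pulls the message and those fields out of such lines. The function name and the sample entries are assumptions made for illustration; the sample values are taken from logs in this thread.

```python
import json


def extract_icingadb_fields(lines):
    """Yield (message, extra_fields) for each JSON journal entry.

    `lines` is any iterable of strings, e.g. the lines produced by
    `journalctl -u icingadb --output json`.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        # Keep only the additional fields that Icinga DB attaches.
        extra = {k: v for k, v in entry.items() if k.startswith("ICINGADB_")}
        yield entry.get("MESSAGE"), extra


# Hypothetical sample input, shaped like journalctl's JSON output.
sample = [
    '{"PRIORITY": "4", "MESSAGE": "heartbeat: Waiting for Icinga heartbeat"}',
    '{"PRIORITY": "4", "MESSAGE": "high-availability: Can\'t update or insert instance. Retrying",'
    ' "ICINGADB_ERROR": "Error 1205 (HY000): Lock wait timeout exceeded"}',
]

for message, extra in extract_icingadb_fields(sample):
    print(message)
    for key, value in sorted(extra.items()):
        print(f"    {key}={value}")
```

Entries without `ICINGADB_` fields simply yield an empty dictionary, so ordinary log lines pass through unchanged.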
Tue 2025-03-11 01:49:01.583741 AEDT [s=c37c97a5b0b04c05a06ab5e1eabff2a7;i=103ff6ad;b=f52e39aa9f7542ed859a9e8f612e52c2;m=10197455a31;t=62f>
_SELINUX_CONTEXT=unconfined
_BOOT_ID=f52e39aa9f7542ed859a9e8f612e52c2
_MACHINE_ID=ed883a1210f14cbbae74c1b3fde55dc9
_HOSTNAME=master1
_TRANSPORT=journal
_SYSTEMD_SLICE=system.slice
_CAP_EFFECTIVE=0
SYSLOG_IDENTIFIER=icingadb
_PID=1112676
_UID=116
_GID=120
_COMM=icingadb
_EXE=/usr/sbin/icingadb
_CMDLINE=/usr/sbin/icingadb --config /etc/icingadb/config.yml
_SYSTEMD_CGROUP=/system.slice/icingadb.service
_SYSTEMD_UNIT=icingadb.service
_SYSTEMD_INVOCATION_ID=507ea6ed389d4e6d9f92ccce7ee098da
PRIORITY=4
MESSAGE=heartbeat: Waiting for Icinga heartbeat
_SOURCE_REALTIME_TIMESTAMP=1741618141583741
Tue 2025-03-11 01:49:20.584464 AEDT [s=c37c97a5b0b04c05a06ab5e1eabff2a7;i=103ff6ae;b=f52e39aa9f7542ed859a9e8f612e52c2;m=101986747a8;t=62f>
PRIORITY=6
_SELINUX_CONTEXT=unconfined
_BOOT_ID=f52e39aa9f7542ed859a9e8f612e52c2
_MACHINE_ID=ed883a1210f14cbbae74c1b3fde55dc9
_HOSTNAME=master1
_TRANSPORT=journal
_SYSTEMD_SLICE=system.slice
_CAP_EFFECTIVE=0
SYSLOG_IDENTIFIER=icingadb
_PID=1112676
_UID=116
_GID=120
_COMM=icingadb
_EXE=/usr/sbin/icingadb
_CMDLINE=/usr/sbin/icingadb --config /etc/icingadb/config.yml
_SYSTEMD_CGROUP=/system.slice/icingadb.service
_SYSTEMD_UNIT=icingadb.service
_SYSTEMD_INVOCATION_ID=507ea6ed389d4e6d9f92ccce7ee098da
MESSAGE=history-sync: Synced 5 notification history items
_SOURCE_REALTIME_TIMESTAMP=1741618160584464
Tue 2025-03-11 01:49:20.584481 AEDT [s=c37c97a5b0b04c05a06ab5e1eabff2a7;i=103ff6af;b=f52e39aa9f7542ed859a9e8f612e52c2;m=10198674b7e;t=62f>
PRIORITY=6
_SELINUX_CONTEXT=unconfined
_BOOT_ID=f52e39aa9f7542ed859a9e8f612e52c2
_MACHINE_ID=ed883a1210f14cbbae74c1b3fde55dc9
_HOSTNAME=master1
_TRANSPORT=journal
_SYSTEMD_SLICE=system.slice
_CAP_EFFECTIVE=0
SYSLOG_IDENTIFIER=icingadb
_PID=1112676
_UID=116
_GID=120
_COMM=icingadb
_EXE=/usr/sbin/icingadb
_CMDLINE=/usr/sbin/icingadb --config /etc/icingadb/config.yml
_SYSTEMD_CGROUP=/system.slice/icingadb.service
_SYSTEMD_UNIT=icingadb.service
_SYSTEMD_INVOCATION_ID=507ea6ed389d4e6d9f92ccce7ee098da
MESSAGE=history-sync: Synced 36 state history items
_SOURCE_REALTIME_TIMESTAMP=1741618160584481
Tue 2025-03-11 01:49:40.585094 AEDT [s=c37c97a5b0b04c05a06ab5e1eabff2a7;i=103ff6b0;b=f52e39aa9f7542ed859a9e8f612e52c2;m=10199987701;t=62f>
PRIORITY=6
_SELINUX_CONTEXT=unconfined
_BOOT_ID=f52e39aa9f7542ed859a9e8f612e52c2
_MACHINE_ID=ed883a1210f14cbbae74c1b3fde55dc9
_HOSTNAME=master1
_TRANSPORT=journal
_SYSTEMD_SLICE=system.slice
_CAP_EFFECTIVE=0
SYSLOG_IDENTIFIER=icingadb
_PID=1112676
_UID=116
_GID=120
_COMM=icingadb
_EXE=/usr/sbin/icingadb
_CMDLINE=/usr/sbin/icingadb --config /etc/icingadb/config.yml
_SYSTEMD_CGROUP=/system.slice/icingadb.service
_SYSTEMD_UNIT=icingadb.service
_SYSTEMD_INVOCATION_ID=507ea6ed389d4e6d9f92ccce7ee098da
MESSAGE=history-sync: Synced 33 state history items
_SOURCE_REALTIME_TIMESTAMP=1741618180585094
Tue 2025-03-11 01:49:40.585790 AEDT [s=c37c97a5b0b04c05a06ab5e1eabff2a7;i=103ff6b1;b=f52e39aa9f7542ed859a9e8f612e52c2;m=101999879ac;t=62f>
PRIORITY=6
_SELINUX_CONTEXT=unconfined
_BOOT_ID=f52e39aa9f7542ed859a9e8f612e52c2
_MACHINE_ID=ed883a1210f14cbbae74c1b3fde55dc9
_HOSTNAME=master1
_TRANSPORT=journal
_SYSTEMD_SLICE=system.slice
_CAP_EFFECTIVE=0
SYSLOG_IDENTIFIER=icingadb
_PID=1112676
_UID=116
_GID=120
_COMM=icingadb
_EXE=/usr/sbin/icingadb
_CMDLINE=/usr/sbin/icingadb --config /etc/icingadb/config.yml
_SYSTEMD_CGROUP=/system.slice/icingadb.service
_SYSTEMD_UNIT=icingadb.service
_SYSTEMD_INVOCATION_ID=507ea6ed389d4e6d9f92ccce7ee098da
MESSAGE=history-sync: Synced 4 notification history items
_SOURCE_REALTIME_TIMESTAMP=1741618180585790
Tue 2025-03-11 01:50:00.584382 AEDT [s=c37c97a5b0b04c05a06ab5e1eabff2a7;i=103ff6b7;b=f52e39aa9f7542ed859a9e8f612e52c2;m=1019ac9a142;t=62f>
PRIORITY=6
_SELINUX_CONTEXT=unconfined
_BOOT_ID=f52e39aa9f7542ed859a9e8f612e52c2
_MACHINE_ID=ed883a1210f14cbbae74c1b3fde55dc9
_HOSTNAME=master1
_TRANSPORT=journal
_SYSTEMD_SLICE=system.slice
_CAP_EFFECTIVE=0
SYSLOG_IDENTIFIER=icingadb
_PID=1112676
_UID=116
_GID=120
_COMM=icingadb
_EXE=/usr/sbin/icingadb
_CMDLINE=/usr/sbin/icingadb --config /etc/icingadb/config.yml
_SYSTEMD_CGROUP=/system.slice/icingadb.service
_SYSTEMD_UNIT=icingadb.service
_SYSTEMD_INVOCATION_ID=507ea6ed389d4e6d9f92ccce7ee098da
MESSAGE=history-sync: Synced 4 notification history items
_SOURCE_REALTIME_TIMESTAMP=1741618200584382
Tue 2025-03-11 01:50:00.584424 AEDT [s=c37c97a5b0b04c05a06ab5e1eabff2a7;i=103ff6b8;b=f52e39aa9f7542ed859a9e8f612e52c2;m=1019ac9a2d0;t=62f>
PRIORITY=6
_SELINUX_CONTEXT=unconfined
_BOOT_ID=f52e39aa9f7542ed859a9e8f612e52c2
_MACHINE_ID=ed883a1210f14cbbae74c1b3fde55dc9
_HOSTNAME=master1
_TRANSPORT=journal
_SYSTEMD_SLICE=system.slice
_CAP_EFFECTIVE=0
SYSLOG_IDENTIFIER=icingadb
_PID=1112676
_UID=116
_GID=120
_COMM=icingadb
_EXE=/usr/sbin/icingadb
_CMDLINE=/usr/sbin/icingadb --config /etc/icingadb/config.yml
_SYSTEMD_CGROUP=/system.slice/icingadb.service
_SYSTEMD_UNIT=icingadb.service
_SYSTEMD_INVOCATION_ID=507ea6ed389d4e6d9f92ccce7ee098da
MESSAGE=history-sync: Synced 32 state history items
_SOURCE_REALTIME_TIMESTAMP=1741618200584424
Tue 2025-03-11 01:50:01.583004 AEDT [s=c37c97a5b0b04c05a06ab5e1eabff2a7;i=103ff6b9;b=f52e39aa9f7542ed859a9e8f612e52c2;m=1019ad8de32;t=62f>
_SELINUX_CONTEXT=unconfined
_BOOT_ID=f52e39aa9f7542ed859a9e8f612e52c2
_MACHINE_ID=ed883a1210f14cbbae74c1b3fde55dc9
_HOSTNAME=master1
_TRANSPORT=journal
_SYSTEMD_SLICE=system.slice
_CAP_EFFECTIVE=0
SYSLOG_IDENTIFIER=icingadb
_PID=1112676
_UID=116
_GID=120
_COMM=icingadb
_EXE=/usr/sbin/icingadb
_CMDLINE=/usr/sbin/icingadb --config /etc/icingadb/config.yml
_SYSTEMD_CGROUP=/system.slice/icingadb.service
_SYSTEMD_UNIT=icingadb.service
_SYSTEMD_INVOCATION_ID=507ea6ed389d4e6d9f92ccce7ee098da
PRIORITY=2
MESSAGE=retry deadline exceeded
github.com/icinga/icingadb/pkg/icingadb.(*HA).controller
github.com/icinga/icingadb/pkg/icingadb/ha.go:166
runtime.goexit
runtime/asm_amd64.s:1700
HA aborted
github.com/icinga/icingadb/pkg/icingadb.(*HA).abort.func1
github.com/icinga/icingadb/pkg/icingadb/ha.go:134
sync.(*Once).doSlow
sync/once.go:76
sync.(*Once).Do
sync/once.go:67
github.com/icinga/icingadb/pkg/icingadb.(*HA).abort
github.com/icinga/icingadb/pkg/icingadb/ha.go:132
github.com/icinga/icingadb/pkg/icingadb.(*HA).controller
github.com/icinga/icingadb/pkg/icingadb/ha.go:166
runtime.goexit
runtime/asm_amd64.s:1700
HA exited with an error
main.run
github.com/icinga/icingadb/cmd/icingadb/main.go:336
main.main
github.com/icinga/icingadb/cmd/icingadb/main.go:37
runtime.main
runtime/proc.go:272
runtime.goexit
runtime/asm_amd64.s:1700
_SOURCE_REALTIME_TIMESTAMP=1741618201583004
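The `_SOURCE_REALTIME_TIMESTAMP` values above are microseconds since the Unix epoch, so the spacing between entries can be computed exactly. The sketch below uses the first and last timestamps from this dump; the helper name is an assumption made for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_journal_us(value):
    """Convert a journald microsecond timestamp (string or int) to a UTC datetime."""
    # Integer arithmetic via timedelta avoids float rounding of microseconds.
    return EPOCH + timedelta(microseconds=int(value))


waiting = from_journal_us("1741618141583741")  # "Waiting for Icinga heartbeat"
crashed = from_journal_us("1741618201583004")  # "retry deadline exceeded"

print(waiting.isoformat())  # 2025-03-10T14:49:01.583741+00:00
print((crashed - waiting).total_seconds())  # 59.999263
```

The UTC value corresponds to 01:49:01 AEDT on 11 March, matching the journal header, and the crash entry follows the heartbeat warning by just under 60 seconds.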
database:
host: xxx
port: 3306
database: icingadb
user: icingadb
password: xxx
tls: False
ca: /usr/local/share/ca-certificates/xxx.crt
redis:
host: localhost
port: 6379
password: xxx
tls: true
insecure: true
logging:
level: info
retention:
history-days: 10
sla-days: 10
options:
acknowledgement: 90
comment: 365
downtime: 90
flapping: 10
notification: 10
state: 10
mariadb-server: 1:10.3.39-0ubuntu0.20.04.2
When this happens, Icinga and Redis are still running on both instances with nothing of note in their logs. The fix has been to restart the icingadb service, but within a few days to a week the error occurs again. The three services all run on the same host, and the host itself is not heavily loaded.
Hello,
I have the same problem in my environment. I am running an Icinga cluster consisting of 2 master nodes with an HA database connection.
Environment
- Icinga DB version: 1.3.0 (previous 1.2.0)
- Icinga 2 version: 2.14.5-1 (previous 2.14.3)
- Operating System and version: Ubuntu 22.04.5 LTS (GNU/Linux 5.15.0-136-generic x86_64)
- Database: 10.5.27-MariaDB MariaDB Server, Icinga DB Schema v6 (previous v5)
The icingadb debug logs show that the heartbeat fails and the opposite instance can no longer be reached. Subsequently, the icingadb service crashes with the previously mentioned error message.
master instance A (icingadb debug logs)
[...]
XXX: high-availability: Can't update or insert instance. Retrying
XXX: heartbeat: Previous heartbeat not read from channel
XXX: heartbeat: Previous heartbeat not read from channel
XXX: heartbeat: Previous heartbeat not read from channel
XXX: Handing over
[...]
XXX: heartbeat: Previous heartbeat not read from channel
XXX: heartbeat: Previous heartbeat not read from channel
XXX: retry deadline exceeded
github.com/icinga/icingadb/pkg/icingadb.(*HA).controller
github.com/icinga/icingadb/pkg/icingadb/ha.go:166
runtime.goexit
runtime/asm_amd64.s:1700
HA aborted
[...]
master instance B (icingadb debug logs)
[...]
XXX: high-availability: Can't update or insert instance. Retrying
XXX: heartbeat: Previous heartbeat not read from channel
XXX: heartbeat: Previous heartbeat not read from channel
XXX: heartbeat: Previous heartbeat not read from channel
[...]
Although the two instances are running in a cluster and the icingadb service remains active on one host, the icinga web GUI shows a failure of the icingadb service and the loss of the database connection.
The behavior occurs randomly. The network connection between the two hosts is stable at all times. The problem existed both with v1.2.0 and with the current v1.3.0.
@saiiman: Thank you for sharing that this error happens on your system as well.
The "high-availability: Can't update or insert instance. Retrying" log message implies that the HA realization logic failed, most likely due to some failing SQL query. Could you please share the logs - especially this line - including the journald fields as described in our "Systemd Journald Fields" docs.
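To illustrate how a repeatedly failing query can end in a "retry deadline exceeded" error, here is a generic retry-until-deadline sketch. It is not Icinga DB's actual implementation (which is written in Go); the function, exception, and parameter names are all assumptions made for illustration.

```python
import time


class RetryDeadlineExceeded(Exception):
    """Raised when an operation kept failing until the deadline passed."""


def retry_with_deadline(operation, timeout, backoff=0.1,
                        clock=time.monotonic, sleep=time.sleep):
    """Call `operation` until it succeeds or `timeout` seconds have elapsed."""
    deadline = clock() + timeout
    attempts = 0
    while True:
        try:
            return operation()
        except Exception as err:
            attempts += 1
            if clock() >= deadline:
                # Give up and keep the last error as the cause.
                raise RetryDeadlineExceeded(
                    f"retry deadline exceeded after {attempts} attempts"
                ) from err
            sleep(backoff)  # wait before the next attempt
```

In a loop like this, an operation that fails on every attempt (for example a query that always hits a lock wait timeout) never returns, so the caller eventually sees only the deadline error, with the underlying failure attached as its cause.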
Hi, I hope the logs help to narrow down the problem.
instance A
Wed 2025-04-16 11:23:18.998176 CEST [xxx]
MESSAGE=database: Executed "INSERT INTO \"history\" [...]
[...]
Wed 2025-04-16 11:24:57.638663 CEST [XXX]
MESSAGE=high-availability: Can't update or insert instance. Retrying
ICINGADB_ERROR=can't perform "SELECT id, heartbeat FROM icingadb_instance WHERE environment_id = ? AND responsible = ? AND id <> ? FOR UPDATE": Error 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
Wed 2025-04-16 11:24:57.707474 CEST [XXX]
MESSAGE=heartbeat: Previous heartbeat not read from channel
ICINGADB_PREVIOUS=2025-04-16 11:24:54.7082618 +0200 CEST
ICINGADB_CURRENT=2025-04-16 11:24:57.70739022 +0200 CEST
[...]
Wed 2025-04-16 11:25:38.998488 CEST [XXX]
MESSAGE=database: Executed "INSERT INTO \"history\" [...]
[...]
Wed 2025-04-16 11:28:57.708586 CEST [XXX]
MESSAGE=heartbeat: Previous heartbeat not read from channel
ICINGADB_PREVIOUS=2025-04-16 11:28:54.708037993 +0200 CEST
ICINGADB_CURRENT=2025-04-16 11:28:57.708451543 +0200 CEST
Wed 2025-04-16 11:29:00.708297 CEST [XXX]
MESSAGE=heartbeat: Previous heartbeat not read from channel
ICINGADB_PREVIOUS=2025-04-16 11:28:57.708451543 +0200 CEST
ICINGADB_CURRENT=2025-04-16 11:29:00.708199054 +0200 CEST
Wed 2025-04-16 11:29:00.708346 CEST [XXX]
MESSAGE=retry deadline exceeded
github.com/icinga/icingadb/pkg/icingadb.(*HA).controller
github.com/icinga/icingadb/pkg/icingadb/ha.go:166
runtime.goexit
runtime/asm_amd64.s:1700
HA aborted
github.com/icinga/icingadb/pkg/icingadb.(*HA).abort.func1
github.com/icinga/icingadb/pkg/icingadb/ha.go:134
sync.(*Once).doSlow
sync/once.go:78
sync.(*Once).Do
sync/once.go:69
github.com/icinga/icingadb/pkg/icingadb.(*HA).abort
github.com/icinga/icingadb/pkg/icingadb/ha.go:132
github.com/icinga/icingadb/pkg/icingadb.(*HA).controller
github.com/icinga/icingadb/pkg/icingadb/ha.go:166
runtime.goexit
runtime/asm_amd64.s:1700
HA exited with an error
main.run
github.com/icinga/icingadb/cmd/icingadb/main.go:347
main.main
github.com/icinga/icingadb/cmd/icingadb/main.go:37
runtime.main
runtime/proc.go:283
runtime.goexit
runtime/asm_amd64.s:1700
#
# manual service Restart
#
[...]
Wed 2025-04-16 11:39:15.871471 CEST [XXX]
ICINGADB_ENVIRONMENT=1bed87bd19e8ebdf1a9510152bbfe2b7b978c0fa
MESSAGE=high-availability: Another instance is active
ICINGADB_INSTANCE_ID=95b7bfadcac6478ba86e5faa50dd7cfa
ICINGADB_HEARTBEAT=2025-04-16 11:39:14.062 +0200 CEST
ICINGADB_HEARTBEAT_AGE=1.809399542s
instance B
Wed 2025-04-16 11:32:47.859228 CEST [XXX]
MESSAGE=heartbeat: Previous heartbeat not read from channel
ICINGADB_PREVIOUS=2025-04-16 11:32:44.859798016 +0200 CEST
ICINGADB_CURRENT=2025-04-16 11:32:47.859152456 +0200 CEST
Wed 2025-04-16 11:32:47.863733 CEST [XXX]
MESSAGE=history-sync: Synced 1 state history items
Wed 2025-04-16 11:32:48.259701 CEST [XXX]
MESSAGE=database: Executed "INSERT INTO \"history\" [...]
[...]
#
# manual service restart
#
[...]
Wed 2025-04-16 11:39:11.789801 CEST [XXX]
MESSAGE=heartbeat: Previous heartbeat not read from channel
ICINGADB_CURRENT=2025-04-16 11:39:11.789729945 +0200 CEST
ICINGADB_PREVIOUS=2025-04-16 11:39:08.790338267 +0200 CEST
Wed 2025-04-16 11:39:15.753296 CEST [XXX]
ICINGADB_ENVIRONMENT=1bed87bd19e8ebdf1a9510152bbfe2b7b978c0fa
MESSAGE=high-availability: Preparing to take over HA as other instance's heartbeat has expired
ICINGADB_INSTANCE_ID=eea7f5b48cb6465692006c3c926c1db2
ICINGADB_HEARTBEAT=2025-04-16 11:23:03.155 +0200 CEST
ICINGADB_HEARTBEAT_AGE=16m12.598267283s
Wed 2025-04-16 11:39:15.879097 CEST [XXX]
MESSAGE=Taking over
ICINGADB_REASON=other instance's heartbeat has expired
I also noticed in the logs that the handover to instance B only took place after a manual restart of the icingadb service there.
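The `ICINGADB_HEARTBEAT_AGE` field in the takeover entry can be cross-checked from the other values in the same entry. The sketch below subtracts the reported heartbeat time from the journal timestamp of the log line; the helper name is an assumption, and time zones are omitted because both values are in CEST.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"


def heartbeat_age(logged_at, heartbeat):
    """Return the time between a heartbeat and the log entry reporting it."""
    return datetime.strptime(logged_at, FMT) - datetime.strptime(heartbeat, FMT)


# Values from instance B's "Preparing to take over HA" entry.
age = heartbeat_age("2025-04-16 11:39:15.753296", "2025-04-16 11:23:03.155")
print(age)  # 0:16:12.598296
```

This agrees with the logged age of 16m12.598267283s to within the few microseconds between computing the age and writing the journal entry, and confirms that instance A's last recorded heartbeat was more than 16 minutes old when instance B took over.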
Hi, I’m experiencing the same issue. Even though I run a single instance of Icinga, I’ve noticed it usually happens after backing up the database. This wasn’t an issue in previous versions.
@joachim162: This issue is related to the HA code. However, since I have not been able to reproduce it yet, it is still open.
Could you please post your Icinga DB version, the relational database you are using (including version), and your Icinga DB logs prior to the crash? Thanks, this might help!
Since you mentioned backups: Are you using MySQL or MariaDB? If so, please take a look at the new Operations docs regarding backups.
I am using MariaDB and yeah, the mysqldump was missing the --single-transaction flag. There has been no Icinga DB crash since adding it. I am sorry, thank you.
@joachim162: This issue is not your fault. I am glad that your Icinga DB now works and no longer crashes.
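For anyone scripting their backups, the following sketch builds a `mysqldump` invocation that includes `--single-transaction`, which dumps InnoDB tables from a consistent snapshot instead of locking them for the duration of the dump. The function names and all paths are assumptions made for illustration; the database name `icingadb` is the one from the configuration posted earlier in this thread.

```python
import subprocess


def build_dump_command(database, result_file, defaults_file):
    """Return the argument list for a mysqldump run without table locks."""
    return [
        "mysqldump",
        # Credentials come from an option file; this must be the first option.
        f"--defaults-extra-file={defaults_file}",
        "--single-transaction",
        f"--result-file={result_file}",
        database,
    ]


def run_dump(database, result_file, defaults_file):
    """Run the dump and raise CalledProcessError if mysqldump fails."""
    subprocess.run(
        build_dump_command(database, result_file, defaults_file), check=True
    )


# Hypothetical paths, shown only to make the resulting command visible.
print(" ".join(build_dump_command(
    "icingadb", "/var/backups/icingadb.sql", "/etc/mysql/backup.cnf")))
```

Keeping the command construction separate from `run_dump` makes it easy to check that the flag is present before wiring the script into a backup job.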
I too had this issue in my HA environment. The active instance suddenly wanted to hand over and then got the "retry deadline exceeded" error 5 min after that. That message is not shown with journalctl though. I could not find any more detailed information about missing heartbeats or something like that. We run MySQL 8.0.26 on a remote installation, and Icinga runs on Rocky Linux 8.10 with Icinga DB 1.3.0.
master1:
Jun 2 08:44:47 master1 icingadb[1701445]: runtime-updates: Upserted 2 Downtime items
Jun 2 08:44:47 master1 icingadb[1701445]: runtime-updates: Upserted 1 ServiceState items
Jun 2 08:45:04 master1 icingadb[1701445]: history-sync: Synced 2 state history items
Jun 2 08:45:07 master1 icingadb[1701445]: runtime-updates: Upserted 2 ServiceState items
Jun 2 08:46:24 master1 icingadb[1701445]: Handing over
Jun 2 08:51:18 master1 icingadb[1701445]: retry deadline exceeded#012github.com/icinga/icingadb/pkg/icingadb.(*HA).controller#012#011github.com/icinga/icingadb/pkg/icingadb/ha.go:166#012runtime.goexit#012#011runtime/asm_amd64.s:1700#012HA aborted#012github.com/icinga/icingadb/pkg/icingadb.(*HA).abort.func1#012#011github.com/icinga/icingadb/pkg/icingadb/ha.go:134#012sync.(*Once).doSlow#012#011sync/once.go:78#012sync.(*Once).Do#012#011sync/once.go:69#012github.com/icinga/icingadb/pkg/icingadb.(*HA).abort#012#011github.com/icinga/icingadb/pkg/icingadb/ha.go:132#012github.com/icinga/icingadb/pkg/icingadb.(*HA).controller#012#011github.com/icinga/icingadb/pkg/icingadb/ha.go:166#012runtime.goexit#012#011runtime/asm_amd64.s:1700#012HA exited with an error#012main.run#012#011github.com/icinga/icingadb/cmd/icingadb/main.go:347#012main.main#012#011github.com/icinga/icingadb/cmd/icingadb/main.go:37#012runtime.main#012#011runtime/proc.go:283#012runtime.goexit#012#011runtime/asm_amd64.s:1700
Jun 2 08:51:18 master1 systemd[1]: icingadb.service: Main process exited, code=exited, status=1/FAILURE
Jun 2 08:51:18 master1 systemd[1]: icingadb.service: Failed with result 'exit-code'.
master2:
Jun 2 08:43:44 master2 icingadb[3021718]: high-availability: Another instance is active
Jun 2 08:44:04 master2 icingadb[3021718]: history-sync: Synced 1 downtime history items
Jun 2 08:44:44 master2 icingadb[3021718]: history-sync: Synced 1 downtime history items
Jun 2 08:45:04 master2 icingadb[3021718]: history-sync: Synced 2 state history items
Jun 2 08:49:24 master2 icingadb[3021718]: history-sync: Synced 1 state history items
Jun 2 08:50:04 master2 icingadb[3021718]: history-sync: Synced 1 downtime history items
Jun 2 08:51:14 master2 icingadb[3021718]: retry deadline exceeded#012github.com/icinga/icingadb/pkg/icingadb.(*HA).controller#012#011github.com/icinga/icingadb/pkg/icingadb/ha.go:166#012runtime.goexit#012#011runtime/asm_amd64.s:1700#012HA aborted#012github.com/icinga/icingadb/pkg/icingadb.(*HA).abort.func1#012#011github.com/icinga/icingadb/pkg/icingadb/ha.go:134#012sync.(*Once).doSlow#012#011sync/once.go:78#012sync.(*Once).Do#012#011sync/once.go:69#012github.com/icinga/icingadb/pkg/icingadb.(*HA).abort#012#011github.com/icinga/icingadb/pkg/icingadb/ha.go:132#012github.com/icinga/icingadb/pkg/icingadb.(*HA).controller#012#011github.com/icinga/icingadb/pkg/icingadb/ha.go:166#012runtime.goexit#012#011runtime/asm_amd64.s:1700#012HA exited with an error#012main.run#012#011github.com/icinga/icingadb/cmd/icingadb/main.go:347#012main.main#012#011github.com/icinga/icingadb/cmd/icingadb/main.go:37#012runtime.main#012#011runtime/proc.go:283#012runtime.goexit#012#011runtime/asm_amd64.s:1700
Jun 2 08:51:14 master2 systemd[1]: icingadb.service: Main process exited, code=exited, status=1/FAILURE
Jun 2 08:51:14 master2 systemd[1]: icingadb.service: Failed with result 'exit-code'.
Perhaps a solution might be to configure Restart functionality in systemd? If the root cause is that the mysql has been unavailable for a while? Also this issue seems related: https://github.com/Icinga/icingadb/issues/794
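The `#012` and `#011` sequences in the master1 and master2 logs above are octal escapes for newline and tab, presumably written by a syslog daemon that escapes control characters. A minimal sketch for turning such a line back into a readable stack trace follows; the function name is an assumption made for illustration.

```python
import re

# "#" followed by exactly three octal digits, e.g. "#012" (newline).
_ESCAPE = re.compile(r"#([0-7]{3})")


def decode_syslog_escapes(line):
    """Replace #ooo octal escapes with the characters they stand for."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), line)


raw = ("retry deadline exceeded#012"
       "github.com/icinga/icingadb/pkg/icingadb.(*HA).controller#012"
       "#011github.com/icinga/icingadb/pkg/icingadb/ha.go:166")
print(decode_syslog_escapes(raw))
```

Note that this also rewrites any literal `#` followed by three octal digits that happens to appear in a message, so it is best applied only to lines known to contain escaped traces.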
Perhaps a solution might be to configure Restart functionality in systemd?
For automatic systemd unit restarts, there is #958. However, we are not sure if we want to advertise this, since this may hide real persisting issues.
If the root cause is that the mysql has been unavailable for a while? Also this issue seems related: #794
At least some of these reconnection timeouts were addressed with #960 and Icinga/icinga-go-library#131, resulting in Icinga DB trying to reestablish a database server connection if a prior connection was once established. Thus, if your database server was just unavailable for a few minutes, this should no longer lead to a crash with the next Icinga DB release, 1.4.0.
That message is not shown with journalctl though.
Here I am not quite sure what exactly you are referring to, but certain information is only logged as journald fields, as documented here. Please also note that, with the next release, at least the error messages are always logged as part of the main message, not only as journald fields.
ref/NC/866982 active in icingaDB 1.4.0