Satellite has problems with expired CRL

Open xschlef opened this issue 3 years ago • 10 comments

Describe the bug

A very simple icinga2 satellite configuration was unable to connect to our icinga2-master because the CRL had expired. The daemon had not been reloaded for 30 days, which is our maximum CRL age. This is basically the same issue we faced with #8501.

The main purpose of this satellite is to check master reachability and health. We update our CRL every 6 hours, and a restart/reload fixes the issue, so I think a running daemon is not correctly reloading the changed CRL. We only face this issue with this icinga2 instance! Other hosts are reloading correctly, but are running a more complex configuration.

Aug 06 08:39:39 icinga2-satellite icinga2[701]: API client disconnected for identity 'icinga2-master'
Aug 06 08:39:49 icinga2-satellite icinga2[701]: Certificate validation failed for endpoint 'icinga2-master': code 12: CRL has expired
Aug 06 08:39:59 icinga2-satellite icinga2[701]: Certificate validation failed for endpoint 'icinga2-master': code 12: CRL has expired
Aug 06 08:39:59 icinga2-satellite icinga2[701]: API client disconnected for identity 'icinga2-master'
Aug 06 08:39:59 icinga2-satellite icinga2[701]: API client disconnected for identity 'icinga2-master'
Aug 06 08:40:09 icinga2-satellite icinga2[701]: Certificate validation failed for endpoint 'icinga2-master': code 12: CRL has expired
Aug 06 08:40:19 icinga2-satellite icinga2[701]: API client disconnected for identity 'icinga2-master'
Aug 06 08:40:19 icinga2-satellite icinga2[701]: Certificate validation failed for endpoint 'icinga2-master': code 12: CRL has expired
Aug 06 08:40:19 icinga2-satellite icinga2[701]: API client disconnected for identity 'icinga2-master'
Aug 06 08:40:29 icinga2-satellite icinga2[701]: Certificate validation failed for endpoint 'icinga2-master': code 12: CRL has expired

To Reproduce

Start icinga2 and wait until the CRL expires. Connections start to fail once the master drops the connection (for example, because of a config reload) and the satellite has to reconnect.

# zones.conf
object Endpoint "icinga2-master" {
        host = "icinga2-master"
        port = "5665"
}
object Zone "master" {
        endpoints = [ "icinga2-master" ]
}
object Endpoint "icinga2-satellite" {

}

object Zone "icinga2-satellite" {
        endpoints = [ "icinga2-satellite" ]
        parent = "master"
}

Expected behavior

The daemon periodically reloads the CRL or monitors the CRL for changes.
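
To make the second option concrete, here is a minimal Python sketch of "monitor the CRL for changes". It is not Icinga code: `CrlWatcher`, `poll`, and the `reload` callback are hypothetical names used only for illustration. The idea is to remember the file's modification time and size, and to trigger a reload whenever either differs from what was last loaded.

```python
import os
from typing import Callable, Optional, Tuple


class CrlWatcher:
    """Re-run a reload callback whenever the CRL file on disk changes.

    Hypothetical helper for illustration; not part of Icinga 2.
    """

    def __init__(self, path: str, reload: Callable[[str], None]) -> None:
        self.path = path
        self.reload = reload
        # (mtime in ns, size in bytes) of the file at the last reload.
        self._last_seen: Optional[Tuple[int, int]] = None

    def poll(self) -> bool:
        """Check the file once; return True if a reload was triggered."""
        st = os.stat(self.path)
        fingerprint = (st.st_mtime_ns, st.st_size)
        if fingerprint == self._last_seen:
            return False  # unchanged since the last reload
        self.reload(self.path)
        self._last_seen = fingerprint
        return True
```

Calling `poll()` from a timer, or before every connection attempt in either direction, would pick up a CRL replaced by a cron job without needing a restart.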

Your Environment

icinga2 - The Icinga 2 network monitoring daemon (version: r2.13.4-1)

System information:
  Platform: Debian GNU/Linux
  Platform version: 10 (buster)
  Kernel: Linux
  Kernel version: 4.19.0-21-amd64
  Architecture: x86_64

Enabled features: api checker command mainlog notification syslog

Config validation:

[2022-08-15 14:48:59 +0200] information/cli: Icinga application loader (version: r2.13.4-1)
[2022-08-15 14:48:59 +0200] information/cli: Loading configuration file(s).
[2022-08-15 14:48:59 +0200] information/ConfigItem: Committing config item(s).
[2022-08-15 14:48:59 +0200] information/ApiListener: My API identity: icinga2-satellite
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 2 Notifications.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 1 IcingaApplication.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 1 Host.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 1 EventCommand.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 1 FileLogger.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 1 SyslogLogger.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 1 CheckerComponent.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 2 Zones.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 2 Endpoints.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 1 ApiListener.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 1 NotificationComponent.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 159 CheckCommands.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 9 UserGroups.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 7 TimePeriods.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 28 Users.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 1 Service.
[2022-08-15 14:48:59 +0200] information/ConfigItem: Instantiated 7 NotificationCommands.
[2022-08-15 14:48:59 +0200] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2022-08-15 14:48:59 +0200] information/cli: Finished validating the configuration file(s).

Additional context

We are rolling out our own certificate infrastructure and are not relying on icinga2 pki.

xschlef, Aug 15 '22 13:08

Hello @xschlef!

icinga2 satellite was unable to connect to our icinga2-master

Have you tried to configure both connection directions?

master -> sat
master <- sat

Best, A/K

Al2Klimov, Aug 16 '22 12:08

refs #8515

Al2Klimov, Aug 16 '22 12:08

Hi,

the problem is the connection sat -> master. Our master does not initiate any connections. To be honest, I have no idea why only this instance is having problems with CRL expiry. All other agents are working fine and are reloading their CRL.

Thanks!

xschlef, Aug 16 '22 12:08

Do all the other agents also go without a reload for 30 days?

Al2Klimov, Aug 16 '22 13:08

Exactly. We have agents that have been running for 90+ days without a reload and without any issues, only losing the connection to the master because of config reloads. This is the only server that is facing this issue.

But you made me check twice. We only set the CRL for this sat and the master. All other agents do not use the CRL (we really should fix that). My guess is that the issue will show up for all our agents if I set the CRL correctly...

The features-enabled/api.conf for sat and master is the following:

object ApiListener "api" {
  bind_host = "::"
  accept_commands = true
  accept_config = true
  crl_path = "/etc/ssl/crl/server-ca-2017.0.crl.pem"
}

xschlef, Aug 16 '22 14:08

Our master does not initiate any connections.

Just because it's configured so or due to network design?

Al2Klimov, Aug 16 '22 14:08

Our master does not initiate any connections.

Just because it's configured so or due to network design?

Network design. It makes firewall rules a lot easier...

xschlef, Aug 16 '22 14:08

If you replace the CRL file with a newer version, there is code to update it:

https://github.com/Icinga/icinga2/blob/7d64fbf8f6245d811df765674b2c9d876ca62597/lib/remote/apilistener.cpp#L485-L493

If that doesn't work, that's a bug for sure. However, keep in mind that Icinga doesn't attempt to download a CRL file if there's a URL specified in the certificate.

julianbrost, Aug 16 '22 21:08

The CRL is automatically downloaded to the local filesystem every 6 hours via a fetch-crl cron job on every server, including the sat and master.

Current state:

-rw-r--r-- 1 root root 18114 Aug 17 06:44 /etc/ssl/crl/server-ca-2017.0.crl.pem

xschlef, Aug 17 '22 09:08
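
The file timestamp shows that the cron job ran, but not whether the downloaded CRL is still valid. One way to confirm that the copy on disk is fresh (so that the stale copy must be the one held by the running daemon) is to read its nextUpdate field. The Python sketch below shells out to `openssl crl -noout -nextupdate` and parses the result. The function names are made up for this example, and it assumes the `openssl` binary is installed and that the process runs with an English or C locale for month names.

```python
import subprocess
from datetime import datetime, timezone
from typing import Optional


def parse_next_update(line: str) -> datetime:
    """Parse a line such as 'nextUpdate=Aug  6 08:39:39 2022 GMT'.

    Raises ValueError if the CRL has no nextUpdate field (openssl then
    prints 'nextUpdate=NONE').
    """
    value = line.strip().split("=", 1)[1]
    parsed = datetime.strptime(value, "%b %d %H:%M:%S %Y GMT")
    return parsed.replace(tzinfo=timezone.utc)


def crl_next_update(path: str) -> datetime:
    """Ask the openssl CLI for the nextUpdate time of a PEM CRL file."""
    result = subprocess.run(
        ["openssl", "crl", "-in", path, "-noout", "-nextupdate"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_next_update(result.stdout)


def is_expired(next_update: datetime, now: Optional[datetime] = None) -> bool:
    """Return True once the current time has reached nextUpdate."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= next_update
```

For instance, `is_expired(crl_next_update("/etc/ssl/crl/server-ca-2017.0.crl.pem"))` returning False while the daemon still logs "CRL has expired" would point at the in-memory copy rather than the file.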

So your satellite never receives any incoming connections? The issue probably is that the CRL is only updated once per accept(). Therefore, just requesting https://localhost:5665/ periodically should work as a workaround.

But yes, the CRL should also be updated when performing outgoing connections.

julianbrost, Aug 19 '22 09:08
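
A rough Python sketch of that workaround follows. It opens a TLS connection to the local API port at a fixed interval so that the daemon goes through accept(). Several things here are assumptions rather than confirmed behavior: that a bare TLS handshake (without a full HTTP request) is enough to trigger the CRL refresh, that skipping certificate verification is acceptable for a loopback-only probe that sends no credentials, and the names `poke_listener` and `poke_periodically`, which are invented for this example.

```python
import socket
import ssl
import time
from typing import Callable, Optional


def poke_listener(host: str = "localhost", port: int = 5665,
                  timeout: float = 5.0) -> bool:
    """Open one TLS connection to the API listener; True if it succeeded."""
    ctx = ssl.create_default_context()
    # Assumption: loopback-only probe, so the server certificate is not
    # verified. Do not reuse this context for anything that sends secrets.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=host):
                return True
    except OSError:  # includes ssl.SSLError, refused connections, timeouts
        return False


def poke_periodically(interval: float,
                      poke: Callable[[], bool] = poke_listener,
                      sleep: Callable[[float], None] = time.sleep,
                      rounds: Optional[int] = None) -> int:
    """Call poke() every `interval` seconds; return the number of successes.

    With rounds=None the loop runs until the process is stopped.
    """
    successes = 0
    done = 0
    while rounds is None or done < rounds:
        if poke():
            successes += 1
        done += 1
        sleep(interval)
    return successes
```

With a CRL that is refreshed every 6 hours, something like `poke_periodically(3600)` run as a small service on the satellite should be frequent enough. A cron job issuing the HTTPS request directly would serve the same purpose.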