icinga2 icon indicating copy to clipboard operation
icinga2 copied to clipboard

JsonRpcConnection#Send*(): discard messages ASAP once shutting down

Open Al2Klimov opened this issue 1 year ago • 10 comments

Especially ApiListener#ReplayLog() enqueued lots of messages into JsonRpcConnection#{m_IoStrand,m_OutgoingMessagesQueue} (RAM) even if the connection was shut(ting) down. Now #Disconnect() takes effect ASAP.

fixes #9985

Al2Klimov avatar Feb 07 '24 13:02 Al2Klimov

Now #Disconnect() takes effect ASAP.

Which - with the current state of the PR - means that the messages are discarded instead of enqueued.

Especially ApiListener#ReplayLog() enqueued lots of messages into JsonRpcConnection#{m_IoStrand,m_OutgoingMessagesQueue} (RAM) even if the connection was shut(ting) down.

However, ApiListener#ReplayLog() will still generate messages for a dead connection that will be discarded anyways as it has no way to tell that the connection is dead. How much effort would it take to improve this?

julianbrost avatar Feb 07 '24 15:02 julianbrost


[2024-02-09 17:55:57 +0100] information/ApiListener: Sending config updates for endpoint 'master2' in zone 'master'.
[2024-02-09 17:55:57 +0100] information/JsonRpcConnection: Received certificate request for CN 'master2' signed by our CA.
[2024-02-09 17:55:57 +0100] information/ApiListener: Syncing configuration files for global zone 'global-templates' to endpoint 'master2'.
[2024-02-09 17:55:57 +0100] information/JsonRpcConnection: The certificates for CN 'master2' and its root CA are valid and uptodate. Skipping automated renewal.
[2024-02-09 17:55:57 +0100] information/ApiListener: Finished sending config file updates for endpoint 'master2' in zone 'master'.
[2024-02-09 17:55:57 +0100] information/ApiListener: Syncing runtime objects to endpoint 'master2'.
[2024-02-09 17:55:57 +0100] warning/JsonRpcConnection: API client disconnected for identity 'master2'
[2024-02-09 17:55:57 +0100] information/ApiListener: Endpoint 'master2' disconnected while syncing runtime objects.
[2024-02-09 17:55:57 +0100] warning/ApiListener: Removing API client for endpoint 'master2'. 0 API clients left.
[2024-02-09 17:55:57 +0100] information/ApiListener: Finished sending runtime config updates for endpoint 'master2' in zone 'master'.

yhabteab avatar Feb 09 '24 17:02 yhabteab

@yhabteab When should those new log messages you added appear? At least I didn't see them when simply restarting individual nodes in the cluster. Judging just from the message, they should probably have severity warning, I'm wondering though when/how often they might appear (so if this is something that can regularly appear during a restart, the severity probably shouldn't be too high).

julianbrost avatar Feb 22 '24 13:02 julianbrost

@yhabteab When should those new log messages you added appear?

The first one is triggered by the timer handler, which gets scheduled every 5 seconds and is trying to send the SetLogPosition message. The second one actually applies to every communication attempt initiated with ApiListener::Relay* or directly with ApiListener::SyncSendMessage(), which can be quite a lot but also quite hard to trigger them manually.

yhabteab avatar Feb 22 '24 14:02 yhabteab

So it's possible that those messages appear during a normal reload depending on the timing, but it's rather unlikely?

julianbrost avatar Feb 22 '24 14:02 julianbrost

So it's possible that those messages appear during a normal reload depending on the timing, but it's rather unlikely?

Yes, depending on the number of disconnected endpoints and on how busy m_RelayQueue was, it could result into a bunch of warnings.

yhabteab avatar Feb 22 '24 14:02 yhabteab

So it's possible that those messages appear during a normal reload depending on the timing, but it's rather unlikely?

Yes, depending on the number of disconnected endpoints and on how busy m_RelayQueue was, it could result into a bunch of warnings.

Actually, the endpoint is dropped off the list as soon as it is disconnected, so there will be no attempts via the ApiListener::Relay*() methods after that.

yhabteab avatar Feb 22 '24 14:02 yhabteab

The implementation should be fine. Main remaining question for me is the log level of the new messages, like should they be downgraded to notice (if that's something that can just legitimately happen during a disconnect) or maybe promoted to warning (if that's actually something problematic).

@Al2Klimov: please also review the current state of this PR.

julianbrost avatar Feb 22 '24 16:02 julianbrost