
Implement connection inactivity timeout in C++

Open bernardnormier opened this issue 1 year ago • 8 comments

This PR implements the connection inactivity timeout in C++.

A connection is inactive (as opposed to idle) when it's active (confusing terminology, see below) and has no outstanding invocation or dispatch.

This PR adds a _dispatchCount field to the connection to count dispatches. It's reasonably easy to manage. I originally planned to include incoming heartbeats in this dispatch count when the heartbeat callback is not null, but did not. Sending or receiving a heartbeat has no effect on the inactivity timeout.

For "invocation count", I am only counting two-way invocations (the size of the _asyncRequests map). Sending any invocation cancels the inactivity timeout timer task. For oneway invocations, we consider the invocation complete as soon as it's queued--we don't wait until it's actually sent. It would be nicer to wait until it's sent but the code is too complicated.

active vs inactive

Rather than use the term "inactive", this PR uses "at rest". A connection is at rest when it's active and has no outstanding invocation or dispatch.

bernardnormier avatar Apr 29 '24 14:04 bernardnormier

It looks like a connection that started reading a message (but didn't fully read the payload) is considered inactive.

This is both accurate and the correct behavior. Merely receiving a protocol frame does not make a connection "not at rest". This frame must be an (incoming) request for a connection to transition from "at rest" to "not at rest" - and cancel the inactivity timer task.

I think having this check at the end of "case requestMsg:" and before we dispatch the request is the right spot. Where do you suggest to put this check for incoming requests?

And as you pointed out, a connection which is writing a oneway request will also be considered inactive.

This is not that bad. When we send the oneway request, we cancel the inactivity timer task. So the sending of this oneway request has to take longer than the inactivity timeout (5 minutes by default) for the connection to be shut down by the inactivity timeout.

Actually, I think it's worse: a connection which is writing a response is also considered inactive.

Same here: the worst case scenario is we schedule the inactivity timer task right after writing the response. So with the default config, the sending of the response has to take longer than 5 minutes for the connection to be shut down by the inactivity timeout.

As a result, the server will close the connection in these cases. I don't think it's a good idea and it doesn't match IceRPC's behavior.

I would also prefer to wait until after a oneway request / a response is sent to "decrement" the "dispatch + invocation" count. It's just much harder to do with Ice, hence this compromise.

I would look into checking for _readHeader == false (if no header is read, there's no pending message for read) and _writeStream.b.empty() (if the write buffer is empty, there's no message being written).

How about you fix this behavior in a follow-up PR?

However, I believe your idea is a non-starter. Heartbeats should have no effect on the inactivity timeout, and _readHeader / _writeStream.b.empty() are most likely affected by heartbeats.

It also doesn't look like the timer is scheduled after successful connection establishment (a connection can be established without sending a request by using proxy->ice_getConnection()).

The timer is scheduled when the connection becomes Active. This has nothing to do with connection establishment, since a just established connection is in the Holding state.

I think it's important for the inactivity timeout to have an effect only when the connection state is Active (as per this PR); my understanding is we can't perform a graceful shutdown for a connection in the Holding state and the inactivity timeout is all about graceful shutdown.

bernardnormier avatar Apr 30 '24 13:04 bernardnormier

We already created a new exception, so I would add a new exception instead of re-using this exception, which is a protocol exception.

How many exceptions do you want to define for a gracefully closed connection?

We already have:

  • CloseConnectionException - that's what you get when the peer initiated the graceful shutdown
  • ConnectionManuallyClosedException with graceful = true - that's what you get when you call close on your Connection

In IceRPC, we have IceRpcError.ConnectionClosedByPeer (=CloseConnectionException) and IceRpcError.ConnectionAborted.

bernardnormier avatar Apr 30 '24 14:04 bernardnormier

We already created a new exception, so I would add a new exception instead of re-using this exception, which is a protocol exception.

I added a new exception, ConnectionClosedException, which takes a message. It's currently C++-only but needs to be ported to other languages (in a follow-up PR).

I think we should rework these exceptions in a follow-up PR. For example (a rough sketch follows):

  • CloseConnectionException remains as-is and derives from ProtocolException. It's named after the CloseConnection ice protocol frame and corresponds to the ConnectionClosedByPeer error code in IceRPC.
  • ConnectionClosedException: graceful closure with a message
  • ConnectionAbortedException: non-graceful closure with a message

And we remove ConnectionIdleException (replaced by ConnectionClosedException) and ConnectionManuallyClosedException (replaced by ConnectionClosedException / ConnectionAbortedException).
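For illustration, a rough C++ sketch of the taxonomy proposed above - the base classes are hypothetical stand-ins for the real Ice exception hierarchy (Ice::LocalException / Ice::ProtocolException), and the names and constructors are assumptions, not the actual implementation:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for Ice::LocalException and Ice::ProtocolException.
struct LocalExceptionSketch : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

struct ProtocolExceptionSketch : LocalExceptionSketch
{
    using LocalExceptionSketch::LocalExceptionSketch;
};

// Named after the CloseConnection protocol frame; remains a protocol exception
// (what IceRPC reports as ConnectionClosedByPeer).
struct CloseConnectionExceptionSketch final : ProtocolExceptionSketch
{
    CloseConnectionExceptionSketch() : ProtocolExceptionSketch("connection closed by peer") {}
};

// Graceful closure, with an explanatory message (e.g. "inactivity timeout").
struct ConnectionClosedExceptionSketch final : LocalExceptionSketch
{
    explicit ConnectionClosedExceptionSketch(const std::string& reason)
        : LocalExceptionSketch("connection closed: " + reason) {}
};

// Non-graceful closure, with an explanatory message.
struct ConnectionAbortedExceptionSketch final : LocalExceptionSketch
{
    explicit ConnectionAbortedExceptionSketch(const std::string& reason)
        : LocalExceptionSketch("connection aborted: " + reason) {}
};
```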

bernardnormier avatar May 01 '24 13:05 bernardnormier

This is both accurate and the correct behavior. Merely receiving a protocol frame does not make a connection "not at rest". This frame must be an (incoming) request for a connection to transition from "at rest" to "not at rest" - and cancel the inactivity timer task.

Correct me if I'm wrong, but this is the case for IceRPC's ice protocol implementation but not for the icerpc protocol implementation. In the ice protocol implementation, we actually schedule the inactive timer task only when done with dispatching. It doesn't look like it's rescheduled upon fully reading a reply. Sounds like a bug?

I think having this check at the end of "case requestMsg:" and before we dispatch the request is the right spot. Where do you suggest to put this check for incoming requests?

Nothing to suggest because I think it's a mistake to have a specific behavior for request messages.

So with the default config, the sending of the response has to take longer than 5 minutes for the connection to be shut down by the inactivity timeout.

And with a non-default config where the application wants to close inactive connections quickly? How do you prevent the connection from being closed while there are several large responses or large oneway requests being queued for sending?

However, I believe your idea is a non-starter. Heartbeats should have no effect on the inactivity timeout, and _readHeader / _writeStream.b.empty() are most likely affected by heartbeats.

Good point. Let's suppose we can exclude the sending/receive of a heartbeat from the inactivity check. Do you agree that we should only consider the connection inactive if there's no data being received/sent and no invocations/dispatches in progress? Wouldn't this match IceRPC's icerpc protocol implementation?

The connection validation message has no payload and should always be read/sent in one shot (very unlikely that a transport sends 13 bytes in several steps!). So I think it's probably doable (a rough sketch follows this list):

  • we cancel the inactive timer as soon as we start reading a message which is not a connection validation message
  • we cancel the inactive timer as soon as we send a message which is not a connection validation message
  • we reschedule the timer once all these conditions are true:
    • _writeStream.b.empty() == true, there's no more data to send
    • _readHeader == true, there's no more message to read and we are waiting for a new message
    • (invocations + dispatches count) == 0, no more invocations or dispatches in progress
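Roughly, the combined check could look like this hypothetical sketch (member names borrowed from this thread; the types and the timer helper are assumptions, not actual Ice code):

```cpp
#include <cstdint>
#include <map>
#include <vector>

struct OutputStreamSketch
{
    std::vector<std::uint8_t> b; // buffered outgoing bytes
};

struct ConnectionStateSketch
{
    OutputStreamSketch _writeStream;     // b.empty() == nothing left to send
    bool _readHeader = true;             // true == waiting for the next message header
    int _dispatchCount = 0;              // dispatches in progress
    std::map<int, bool> _asyncRequests;  // outstanding two-way invocations

    bool atRest() const
    {
        return _writeStream.b.empty()                          // no more data to send
            && _readHeader                                      // no message partially read
            && _dispatchCount == 0 && _asyncRequests.empty();   // nothing in progress
    }

    // Called from each potential trigger (send completed, header read, dispatch
    // or invocation completed) to re-arm the inactivity timeout.
    void checkAndScheduleInactivityTimer()
    {
        if (atRest())
        {
            scheduleInactivityTimerTask();
        }
    }

    // Hypothetical helper: arm the timer task that shuts the connection down
    // once the inactivity timeout elapses.
    void scheduleInactivityTimerTask() {}
};
```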

bentoi avatar May 03 '24 11:05 bentoi

Correct me if I'm wrong, but this is the case for IceRPC's ice protocol implementation but not for the icerpc protocol implementation. In the ice protocol implementation, we actually schedule the inactive timer task only when done with dispatching. It doesn't look like it's rescheduled upon fully reading a reply. Sounds like a bug?

The IceRPC code is much simpler. In IceProtocolConnection.cs, we call DecrementDispatchInvocationCount when an invocation is complete (e.g. after reading the reply): https://github.com/icerpc/icerpc-csharp/blob/57995b702436169798fc3fe5fd513e6f09591397/src/IceRpc/Internal/IceProtocolConnection.cs#L453

This method reschedules the inactivity timeout timer when the count reaches 0: https://github.com/icerpc/icerpc-csharp/blob/57995b702436169798fc3fe5fd513e6f09591397/src/IceRpc/Internal/IceProtocolConnection.cs#L987

I don't see any bug here.

Good point. Let's suppose we can exclude the sending/receive of a heartbeat from the inactivity check. Do you agree that we should only consider the connection inactive if there's no data being received/sent and no invocations/dispatches in progress? Wouldn't this match IceRPC's icerpc protocol implementation?

Yes, I agree. However, figuring out when a request/response is actually sent is difficult with the Ice code. That's why I didn't do it.

The connection validation message has no payload and should always be read/sent in one shot (very unlikely that a transport sends 13 bytes in several steps!).

You meant 14 bytes?

  • we reschedule the timer once all these conditions are true:

    • _writeStream.b.empty() == true, there's no more data to send

That's because when we send a heartbeat - and just a heartbeat - the sending is synchronous and _writeStream.b.empty() can't be true? In other words, _writeStream.b.empty() == true means we're sending asynchronously something other than a heartbeat?

  • _readHeader == true, there's no more message to read and we are waiting for a new message
  • (invocations + dispatches count) == 0, no more invocations or dispatches in progress

where invocations = _asyncRequests.size()? If not, how do you count invocations in progress?

bernardnormier avatar May 03 '24 14:05 bernardnormier

we reschedule the timer once all these conditions are true:
  • _writeStream.b.empty() == true, there's no more data to send
  • _readHeader == true, there's no more message to read and we are waiting for a new message
  • (invocations + dispatches count) == 0, no more invocations or dispatches in progress

With this proposal, how do we get notified that this condition may have become true and that it's now time to schedule the inactivity timer task? I'd rather not have a background thread that checks for this condition from time to time.

In my current code, the triggers are when _dispatchCount is decremented (and can therefore reach 0) and when a reply is received (and _asyncRequests can therefore become empty).

bernardnormier avatar May 03 '24 15:05 bernardnormier

Nothing to suggest because I think it's a mistake to have a specific behavior for request messages.

There are not many types of protocol frames with Ice: request/batch request, reply, validate-connection and close-connection.

The only type of protocol frame that upon reception triggers a transition from at-rest to not-at-rest is request/batch request. If you're waiting for a reply, you're not at rest and the inactivity timer task is not scheduled.
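For illustration, a hypothetical sketch of this rule on the incoming side - the frame-type names are modeled on the Ice protocol message types listed above, and the helper is an assumption, not the actual PR code:

```cpp
#include <cstdint>

// Hypothetical helper: cancel a previously scheduled inactivity timer task.
inline void cancelInactivityTimerTask() {}

// Message-type values modeled on the Ice protocol frame types.
enum MessageTypeSketch : std::uint8_t
{
    requestMsg = 0,
    requestBatchMsg = 1,
    replyMsg = 2,
    validateConnectionMsg = 3,
    closeConnectionMsg = 4
};

void messageReceivedSketch(MessageTypeSketch type)
{
    switch (type)
    {
        case requestMsg:
        case requestBatchMsg:
            // An incoming request starts a dispatch: the connection leaves the
            // "at rest" state, so cancel the inactivity timer task here, before
            // dispatching.
            cancelInactivityTimerTask();
            break;
        case replyMsg:
            // Expecting a reply already means "not at rest" (the timer isn't
            // scheduled); a reply for a discarded invocation shouldn't cancel it.
            break;
        case validateConnectionMsg:
            // Heartbeat: no effect on the inactivity timeout.
            break;
        case closeConnectionMsg:
            // Peer-initiated graceful shutdown; handled by the close path.
            break;
    }
}
```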

we cancel the inactive timer as soon as we start reading a message which is not a connection validation message

Why do you care about replies here? If you're expecting a reply, you're not at rest. If you get a reply for an invocation you discarded a while ago (because it timed out), I'd rather keep the connection "at rest" and not cancel the inactivity timer task.

we cancel the inactive timer as soon as we send a message which is not a connection validation message

Again, why do you care about replies here? If you're sending a reply, by definition you're not "at rest", so there is nothing to cancel. You transitioned out of the at-rest state when you received the request. The only interesting event for a reply is when it's fully sent - that's when you want to schedule (not cancel) the inactivity timer task.

bernardnormier avatar May 03 '24 15:05 bernardnormier

The IceRPC code is much simpler. In IceProtocolConnection.cs, we call DecrementDispatchInvocationCount when an invocation is complete (e.g. after reading the reply):

Ok, it's not obvious from reading the code that the full frame is already read when the response task completion source is completed by the read frame loop.

Otherwise, I'm looking at writing a patch to show how to implement what I tried to describe.

bentoi avatar May 06 '24 13:05 bentoi

@bentoi will open a PR to improve this implementation.

bernardnormier avatar May 09 '24 13:05 bernardnormier