rodbus disabling ClientChannel without waiting for pending requests

when communicating with one of our modbus server devices via a rodbus client, we noticed that this specific server device is unable to handle multiple modbus connections at once. this should not be an issue.

however, during an active connection via rodbus, rodbus does not seem to register a connection fault when the connection is interrupted due to a third device starting to communicate over modbus with the modbus server.

rodbus only reports response_timeouts, as modbus requests won't get answered anymore while there is still an active tcp connection and the server still acknowledges each modbus request with a TCP ACK

now we wanted to resolve this issue by reconnecting the rodbus client, i.e. disabling and re-enabling the rodbus ClientChannel. however, this does not work, as there are still requests piled up in the rodbus queue, which all seem to need to time out before the client channel gets disabled

is there a way to either disable the ClientChannel directly or let all queued up requests fail at once?

also, unfortunately we are unable to change the behaviour of this specific modbus server device

Jul 02 '24 18:07 xlukem

@xlukem Does the server not send a TCP FIN or RST? It just stops answering requests but keeps the connection open?

I agree that Rodbus should be able to enable/disable in this situation. I have a good idea of how this should be implemented on the main task loop.

That said, I wish there was a good way to detect this condition and gracefully handle the poor behavior from this device without the user (you) having to monitor for this condition and initiate an enable / disable. One potential solution would be for the main task loop to implement this logic, i.e. have a "maximum number of request timeouts" parameter after which the current connection is dropped and a re-connection happens.
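
A minimal sketch of the "maximum number of request timeouts" idea, using only the Rust standard library. `TimeoutTracker`, `Action`, and `max_consecutive` are hypothetical names chosen for illustration; none of them is part of the rodbus API, and this is not a statement of how the library does or will implement it. It also assumes that *consecutive* timeouts are what should be counted, so any valid response resets the count.

```rust
/// Decision returned after each response timeout.
/// Hypothetical type, not part of rodbus.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    /// Keep using the current connection.
    Continue,
    /// Drop the current connection and reconnect.
    Reconnect,
}

/// Counts consecutive response timeouts on one connection.
/// Hypothetical helper, not part of rodbus.
#[derive(Debug)]
pub struct TimeoutTracker {
    max_consecutive: u32,
    consecutive: u32,
}

impl TimeoutTracker {
    pub fn new(max_consecutive: u32) -> Self {
        Self { max_consecutive, consecutive: 0 }
    }

    /// Any valid response shows the peer is still answering, so reset.
    pub fn on_response(&mut self) {
        self.consecutive = 0;
    }

    /// Record one response timeout and decide what to do next.
    pub fn on_timeout(&mut self) -> Action {
        self.consecutive += 1;
        if self.consecutive >= self.max_consecutive {
            // The count starts over for the next connection.
            self.consecutive = 0;
            Action::Reconnect
        } else {
            Action::Continue
        }
    }
}
```

The caller would invoke `on_timeout` whenever a request fails with a response timeout and `on_response` whenever a request succeeds. How `Action::Reconnect` is carried out is deliberately left open here, since the thread establishes that disabling and re-enabling the channel currently waits for the queued requests.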

Jul 02 '24 18:07 jadamcrain

there actually does seem to be a TCP RST frame. however, the destination port seems weird: i can't find this port anywhere else in the trace. but yes, this seems to be an issue with the modbus server we are dealing with; it is generally very poorly designed

IP .204: modbus server
IP .1: rodbus
IP .44: third device interrupting connection

> I agree that Rodbus should be able to enable/disable in this situation. I have a good idea of how this should be implemented on the main task loop.

awesome!

> One potential solution would be for the main task loop to implement this logic, i.e. have a "maximum number of request timeouts" parameter after which the current connection is dropped and a re-connection happens.

yes, that's what we currently try to do by keeping track of failed requests. having this implemented by the library would be a nice addition

Jul 02 '24 19:07 xlukem

another thing i have noticed is that rodbus does not report a connectivity problem when the ethernet connection is interrupted

here rodbus only reports a single timeout for each message sent, until the request queue is empty or the modbus server is connected again (and sends an RST). also, this is another modbus server that's working more reliably than the one at IP .204

would it be possible to implement a ClientChannel specific channel timeout?
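
To make the request concrete, here is a rough sketch of what a channel-level timeout could look like, using only the Rust standard library. `ChannelWatchdog` and its methods are hypothetical names for illustration and are not part of the rodbus API. The idea assumed here is time-based rather than count-based: the channel is considered dead once no valid response has arrived for a configured duration.

```rust
use std::time::{Duration, Instant};

/// Tracks how long a channel has gone without a valid response.
/// Hypothetical helper, not part of rodbus.
#[derive(Debug)]
pub struct ChannelWatchdog {
    timeout: Duration,
    last_success: Instant,
}

impl ChannelWatchdog {
    /// `now` is passed in explicitly so the logic can be tested
    /// without sleeping.
    pub fn new(timeout: Duration, now: Instant) -> Self {
        Self { timeout, last_success: now }
    }

    /// Call whenever a request receives a valid response.
    pub fn on_response(&mut self, now: Instant) {
        self.last_success = now;
    }

    /// True once no response has arrived for at least `timeout`.
    pub fn is_expired(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.last_success) >= self.timeout
    }
}
```

A check like this only makes sense while requests are actually being sent (for example with periodic polling); on an idle channel, the absence of responses says nothing about the connection.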

Jul 03 '24 11:07 xlukem

Not sure I quite understand what you mean by "when the ethernet connection is interrupted". Are you pulling the ethernet cable in this scenario or is there a network failure of sorts? Does Rodbus detect that the connection is down via the channel state callbacks or does it think that the connection is there, but the remote device just isn't responding?

Jul 04 '24 14:07 jadamcrain

yes, i interrupted the connection by pulling the ethernet cable. in this scenario rodbus does not report any changes to the ClientState via the PrintingClientStateListener, and the request callbacks only report a response_timeout

Jul 04 '24 15:07 xlukem

Thanks for the additional info. I hope to be able to look at this next week.

Jul 05 '24 15:07 jadamcrain

I believe this is the classic problem of detecting a dead socket, i.e. one where the peer disappears without a graceful shutdown.

I'm not surprised that the client would return response_timeout in this situation, nor do I think ClientStateListener would fire an event immediately.

Eventually, the OS would decide that the socket is dead for one of a couple of reasons:

Transmitted data is not being acknowledged. Usually writing to a socket will eventually get the OS to time it out.
The TCP keep-alive kicks in. This shouldn't be the case here since you are writing.

Does the connection time out eventually, just not in a reasonable time period?

The best solution to this situation might be the same solution we proposed before:

After a configurable number of response timeouts, we can close the connection and force a reconnect.

Jul 07 '24 19:07 jadamcrain

> Does the connection time out eventually, just not in a reasonable time period?

thank you for your advice. after further inspection i have seen that the connection does time out eventually. on our production device this timeout seems to be about 15 min; on another device i have seen a 1 min timeout, so it does seem to be device specific

> After a configurable number of response timeouts, we can close the connection and force a reconnect.

we'd love to see that as part of the library in the future

Jul 09 '24 14:07 xlukem

@xlukem Thanks for confirming the behavior is as expected. We definitely want this feature for the next release.

Jul 09 '24 16:07 jadamcrain