rumqtt
rumqtt copied to clipboard
Connection lost to reconnect on Network error is always about 2 minutes?
Hi,
When testing how a listener based on this library reacts to connection losses, I have noticed that I cannot configure the timeout associated to network timeouts. It reconnects fine, but it always takes about 130 seconds as the following log shows:
2021-06-30 15:34:41,744 DEBUG [rumqttc::state] Pingreq,
last incoming packet before 3945 millisecs,
2021-06-30 15:34:46,745 TRACE [mio::poll] deregistering event source from poller
2021-06-30 15:34:46,745 DEBUG [mqtt_logger] MQTT error, will try to reconnect when possible: AwaitPingResp
2021-06-30 15:34:46,745 DEBUG [rustls::anchors] add_pem_file processed 1 valid and 0 invalid certs
2021-06-30 15:36:56,931 TRACE [mio::poll] deregistering event source from poller
2021-06-30 15:36:56,931 DEBUG [mqtt_logger] Network error, will try to reconnect when possible: Io(Os { code: 110, kind: TimedOut, message: "Connection timed out" })
Is there a way to reduce this timeout?
For example using the tool mosquitto_sub
reconnects after about 10 seconds.
After more testing it seems like the delay is (mqttoptions.set_keep_alive
+ 1min) * 2
Can I know how you are simulating disconnections and what's the keep alive you are using? If you are disconnecting your wifi or pulling your ethernet cable, tcp will just assume that this is congestion. Rumqttc detecting idleness in keep alive
time will trigger ping and waits for ping response. Before sending next ping, if previous ping hasn't received a response, client will assume that this is a half open connection and then trigger reconnection. So this typically takes upto 2 keep alive times.
Hi @tekjar thanks for you response!
I have set mqttoptions.set_keep_alive(5)
, and I have tested this by connecting and disconnecting from my network using a VPN.
The 2 keep alive
times you refer to are within the time, but this extra minute comes from something else.
I cloned rumqttc
and played around a bit, and it seems like the TcpStream
calls via tokio
need 1 minute to timeout.
If you have ideas, I'm happy to give them a try!
As we're not seeing the issue with mosquitto_sub
I think there is some case being missed right now.
Let me try this as well. May I know which broker you are using?
I'm testing against mosquitto
from the Arch repositories (systemctl start mosquitto
).
You need to disable the VPN about 10s-20s, if one does it faster this effect does not happen.
If you want me to log something, or screen capture, or ... just give me a shout!
Sure! I'll let you know if I'm not able to reproduce this
Has any progress been made on this issue? We're still seeing the same problem.
Its similar, the 1minute timout is gone, so I assume that the ping responses are not handled in time? Or that they do not get time to get sent? We only see the issue when we have quite a high load on the MQTT client.
Unfortunately I couldn't find time to RCA this
I think I managed to mitigate this issue by reducing the time spent in handling the incoming Publishes.
I immediately send the incoming publishes over a channel to another thread, suggesting something is not 100% with the registering of the ping responses maybe?