
Add keepalive_interval as a persistent configuration option for c8y_bridge.conf

Open: jmoo900 opened this issue

Is your feature improvement request related to a problem? Please describe.
I am working with a customer who uses cellular devices. They have strict bandwidth requirements, where even the frequency of keep-alive messages can affect the contract they negotiated with their cellular provider. The proposed option would let them limit the number of MQTT keep-alive messages sent by their devices, and ultimately the amount of data that passes over the cellular connection. This applies to any device that uses a cellular connection.

Describe the solution you'd like
I would like to be able to set the option through the tedge CLI and have it saved as a tedge configuration option. Alternatively, it could be a persistent entry in the c8y_bridge.conf file. I have found that when I manually add it to the configuration file and restart the Mosquitto service through systemctl, I see the behavior I want. The problem is that the configuration file is recreated whenever the tedge reconnect c8y command is run, so the change has to be reapplied.

Describe alternatives you've considered
The workaround we are investigating in the short term is to script the modification of the bridge configuration file and then restart the Mosquitto service. This is a bit cumbersome and may not be feasible, given the additional startup latency it introduces. We have also considered creating a new configuration file in the /etc/mosquitto/conf.d directory (which is already referenced in mosquitto.conf, similar to /etc/tedge/mosquitto-conf). I think this fails because thin-edge generates the bridge connection itself, and keepalive_interval needs to be within the context of that connection. Because our separate config file places it outside that context, it prevents Thin-Edge from starting.
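The scripted workaround described above could look roughly like the sketch below. It assumes the generated bridge file is /etc/tedge/mosquitto-conf/c8y-bridge.conf (the path mentioned later in this thread) and that the file contains a single `connection` block; `set_bridge_keepalive` is a hypothetical name. mosquitto applies bridge options to the most recent `connection` line, which is likely why the option failed when placed in a separate conf.d file.

```shell
#!/bin/sh
# Hypothetical helper: set_bridge_keepalive FILE SECONDS
# Ensures FILE contains exactly one "keepalive_interval SECONDS" line.
# Assumption: FILE holds a single bridge block (one "connection" line), so an
# appended option still belongs to that bridge, as mosquitto requires.
set_bridge_keepalive() {
    conf="$1"
    seconds="$2"
    case "$seconds" in
        ''|*[!0-9]*) echo "seconds must be a non-negative integer" >&2; return 1 ;;
    esac
    if grep -q '^keepalive_interval ' "$conf"; then
        # Rewrite the existing value. Copying back with cat keeps the file's
        # original owner and permissions.
        tmp=$(mktemp) || return 1
        sed "s/^keepalive_interval .*/keepalive_interval $seconds/" "$conf" > "$tmp" &&
            cat "$tmp" > "$conf"
        rc=$?
        rm -f "$tmp"
        return "$rc"
    fi
    # Make sure the last existing line is newline-terminated before appending.
    if [ -n "$(tail -c 1 "$conf")" ]; then
        echo >> "$conf"
    fi
    printf 'keepalive_interval %s\n' "$seconds" >> "$conf"
}
```

Because tedge reconnect c8y regenerates the file, this would have to be re-run after every reconnect, followed by a broker restart, e.g. `set_bridge_keepalive /etc/tedge/mosquitto-conf/c8y-bridge.conf 300` and then `sudo systemctl restart mosquitto` (the service name `mosquitto` is an assumption).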

Additional context
I don't think any screenshots are needed. I'm happy to jump on a call and show any of the things I mentioned above if that is helpful.

jmoo900, Oct 03 '24

@jmoo900 It makes sense to make the MQTT keep-alive interval configurable. However, playing nicely with the cellular network is only half of the picture: the keep-alive interval would also have to be compatible with the Cumulocity IoT MQTT broker, as I believe there is an upper limit beyond which the platform considers the connection dead (but maybe I'm wrong here).

Have you already experimented with some ping intervals against Cumulocity IoT, and do you have any idea what upper limit would be sustainable?

reubenmiller, Oct 03 '24

Hey @reubenmiller, you bring up a good point. I have really only been focused on the device side of the equation. The configurations I have been trying are in the 5-minute range, and based on some Wireshark captures, it appears to be working as expected. I haven't noticed any MQTT connection issues with my thin-edge device. If I remember correctly, the customer I am working with was looking at around a 15-minute interval for the MQTT keep-alive. On the Cumulocity side, there is the concept of a "required interval" that lets you set the expected communication frequency of a device, but I think that only drives the availability/online status of the device. I will have to do some digging on the actual connection to see if there are any timing limitations for the MQTT broker.

jmoo900, Oct 03 '24

> On the Cumulocity side, there is the concept of a "required interval" that allows you to set the expected communication frequency of a device, but I think that just drives the availability/online status of the device.

Yes, you are correct. The "required interval" has no influence on the MQTT level (as the feature works for any kind of device, not just MQTT devices).

> I will have to do some digging on the actual connection to see if there are any timing limitations for the MQTT broker.

That would be a great contribution 👍 But we can also help if you need some guidance with anything.

reubenmiller, Oct 03 '24

@jmoo900 Did you have a chance to find out what upper bound on the MQTT keepalive Cumulocity would accept?

reubenmiller, Dec 05 '24

I reached out to the data in motion team to get some clarity on this, specifically asking whether there were any limitations on what the MQTT keep-alive could be set to. Initial feedback was:

"@Jake Moody what do you mean by the MQTT timeout interval? In core mqtt by default we have 10 seconds time set for the connect, meaning after the TCP connection is established client must send connect message within those 10 seconds. In the connect message there is a keep alive interval that doesn't have any restriction, it has to [be a] non-negative value and then server uses this and sets the timeout to 1.5 of this value"

I tried to further clarify whether there could be situations where the TCP and MQTT keep-alives get out of sync depending on the value set here, and received the following:

"I see, so if the TCP connection will be closed, for example by OnTopLb, server should also terminate it after reaching keepAlive*1.5 as it won't be receiving the PING message within this time. Client in such case should also notice that the connection is closed and it should try to reconnect, but I would say that it really depends [on] the client implementation"

So my limited technical take on this is that, as long as it's a positive numerical value, Cumulocity will accept the MQTT keep-alive and will only time out the connection after 1.5 times that interval.
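To put numbers on the 1.5x rule quoted above: MQTT encodes the keep-alive as a two-byte number of seconds (0 to 65535), a value of 0 turns the mechanism off, and the server disconnects a client it has not heard from within one and a half keep-alive periods. The function below (`mqtt_server_timeout`, a hypothetical name) is only a calculator for that rule; it does not query Cumulocity.

```shell
#!/bin/sh
# Hypothetical helper: mqtt_server_timeout KEEPALIVE_SECONDS
# Prints how long the server waits without any packet before dropping the
# client (1.5 x keep-alive), or "disabled" when the keep-alive is 0.
mqtt_server_timeout() {
    ka="$1"
    case "$ka" in
        ''|*[!0-9]*) echo "keep-alive must be a non-negative integer" >&2; return 1 ;;
    esac
    # The keep-alive field in CONNECT is 16 bits wide.
    if [ "$ka" -gt 65535 ]; then
        echo "keep-alive must not exceed 65535 seconds" >&2
        return 1
    fi
    if [ "$ka" -eq 0 ]; then
        echo "disabled"
        return 0
    fi
    # Integer arithmetic: 3/2 of the value, rounded down for odd inputs.
    echo $(( ka * 3 / 2 ))
}
```

For the 15-minute interval the customer was considering, `mqtt_server_timeout 900` prints 1350, so the broker would wait 22.5 minutes before giving up on a silent client.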

jmoo900, Dec 09 '24

I did some basic testing on my thin-edge device. I set keepalive_interval to 5 minutes and monitored the bridge health over a span of 2 hours; the bridge health seemed to stay at 1 for the whole duration. I also ran a test with the MQTT keep-alive set to 10 minutes and monitored the bridge health over a span of around 2 hours. This second test behaved differently: the bridge health value went to 0 before being set back to 1 within a second or so. I'm not sure exactly what to make of this. Should I proceed with higher intervals?
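For long monitoring runs like the ones above, brief 1 to 0 to 1 flips are easy to miss. The sketch below reads health payloads line by line and prints a timestamp only when the value changes. `log_health_transitions` is a hypothetical helper, and the health topic used in the usage note is an assumption about where thin-edge publishes the mosquitto bridge status.

```shell
#!/bin/sh
# Hypothetical helper: log_health_transitions
# Reads bridge health payloads from stdin, one per line ("1" = connected,
# "0" = disconnected), and prints a UTC timestamp plus UP/DOWN whenever the
# value changes, so short drops stand out in a multi-hour run.
log_health_transitions() {
    prev=""
    while IFS= read -r value; do
        # Only report changes, not every repeated status message.
        [ "$value" = "$prev" ] && continue
        case "$value" in
            1) state="UP" ;;
            0) state="DOWN" ;;
            *) state="UNKNOWN($value)" ;;
        esac
        echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $state"
        prev="$value"
    done
}
```

Usage, assuming the bridge health topic is te/device/main/service/mosquitto-c8y-bridge/status/health: `mosquitto_sub -t 'te/device/main/service/mosquitto-c8y-bridge/status/health' | log_health_transitions`.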

[Screenshots: 2024-12-10 15:46:33 and 2024-12-10 15:47:14]

jmoo900, Dec 10 '24

@jmoo900 Thanks for taking the time to have a look.

After some talks with Cumulocity R&D, it turns out that some of the cloud instances have additional network settings that affect the MQTT connection when the MQTT keep-alive is set to anything above 10 minutes (or, in some cases, ~2-3 minutes). The good news is that Cumulocity should be able to support higher MQTT keep-alive values in the future: I tested it on a test server with a value of 3540 s, where the load balancer timeout had been increased from 10 min to 60 min.
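One possible reading of these numbers (an assumption, not something confirmed in this thread) is that the load balancer closes TCP connections that stay idle longer than its timeout, and on a quiet connection the keep-alive PINGREQ is the only traffic. The keep-alive would then have to stay strictly below that idle timeout. `keepalive_fits_idle_timeout` is a hypothetical helper that encodes this check.

```shell
#!/bin/sh
# Hypothetical helper: keepalive_fits_idle_timeout KEEPALIVE IDLE_TIMEOUT
# Both values are in seconds. Assumption: an intermediate device (e.g. a load
# balancer) drops connections idle for IDLE_TIMEOUT, so the keep-alive ping
# must be sent before that happens.
keepalive_fits_idle_timeout() {
    ka="$1"
    idle="$2"
    if [ "$ka" -eq 0 ]; then
        # 0 disables MQTT keep-alive, so an idle connection sends nothing at all.
        echo "risky: keep-alive disabled; idle timeout ${idle}s will apply" >&2
        return 1
    fi
    if [ "$ka" -lt "$idle" ]; then
        echo "ok: ${ka}s leaves $(( idle - ka ))s of headroom under ${idle}s"
        return 0
    fi
    echo "risky: ${ka}s is not below the ${idle}s idle timeout" >&2
    return 1
}
```

With a 10-minute (600 s) timeout, `keepalive_fits_idle_timeout 600 600` fails, which would be consistent with (though not proof of) the brief bridge-health drops seen in the 10-minute test, while 3540 s against the raised 3600 s timeout passes with 60 s to spare.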

For those who are interested, the instructions to manually change the MQTT keep-alive value (when using the mosquitto bridge) can be viewed here: https://gist.github.com/reubenmiller/82b0224b1dc23015262e0cf2d6a089ed

reubenmiller, Jan 16 '25

#3365 introduced new tedge configuration settings to set a custom keepalive interval per cloud bridge. For example, to set the keepalive interval of the c8y bridge to 1h, run:

sudo tedge config set c8y.bridge.keepalive_interval 1h

The default value is 60s. After changing the setting, run tedge reconnect to apply the change to the bridge configuration:

sudo tedge reconnect c8y 

If you are using mosquitto as the bridge, you can confirm that /etc/tedge/mosquitto-conf/c8y-bridge.conf contains keepalive_interval:

### Bridge
....
keepalive_interval 3600
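The manual check above can also be scripted. The sketch below converts a simple duration such as 1h into seconds and looks for the matching line in the generated file. `duration_to_seconds` and `verify_keepalive` are hypothetical helpers that only understand plain integers with an optional s, m, or h suffix; tedge itself may accept other duration formats.

```shell
#!/bin/sh
# Hypothetical helper: convert a simple duration (e.g. 90s, 10m, 1h) to seconds.
# Assumption: only an integer with an optional s/m/h suffix is handled.
duration_to_seconds() {
    case "$1" in
        *h) echo $(( ${1%h} * 3600 )) ;;
        *m) echo $(( ${1%m} * 60 )) ;;
        *s) echo "${1%s}" ;;
        *)  echo "$1" ;;
    esac
}

# Hypothetical helper: verify_keepalive FILE DURATION
# Succeeds if FILE contains the line "keepalive_interval <seconds>" matching DURATION.
verify_keepalive() {
    expected=$(duration_to_seconds "$2")
    grep -qx "keepalive_interval $expected" "$1"
}
```

For example, after `sudo tedge config set c8y.bridge.keepalive_interval 1h` and `sudo tedge reconnect c8y`, running `verify_keepalive /etc/tedge/mosquitto-conf/c8y-bridge.conf 1h` should succeed.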

rina23q, Jan 29 '25

The feature is already available from the main branch; for example, install it with:

wget -O - thin-edge.io/install.sh | sh -s -- --channel main

Otherwise, it will be available in the next official release, 1.5.0.

reubenmiller, Jan 29 '25