
USP Agents Remain Online After System Shutdown - Community Edition

Open Mark-Millard opened this issue 11 months ago • 3 comments

The Web UI shows devices as Online when, in fact, they are no longer reachable. The agents were:

  • OBUSPA running in Docker container, built from this project. The container was not running.
  • RDK-B Raspberry Pi UspPa agent, running on an RPI 4. The RPI was powered off and disconnected from the network.
  • RDK-B Sagemcom UspPa agent, running on a proprietary router device. The router was powered off and disconnected from the network.

All 3 agents were previously registered with the Oktopus Controller. In each case, the Controller Docker containers had been stopped using the deploy/compose/stop.sh script.

Mark-Millard avatar Feb 13 '25 00:02 Mark-Millard

When you kill the Oktopus containers, all the applications go down, including the ones responsible for handling device status data.

Although you've killed the containers, the NATS and MongoDB data is still saved on your host. When the applications come up again, the devices show up as online because that is the data available in MongoDB, and the Oktopus applications did not shut down gracefully when you ran stop.sh (docker compose down).

If you disconnect the devices while Oktopus is running, their status will change to offline.

leandrofars avatar Feb 14 '25 11:02 leandrofars

Yes, this is what I'm seeing.

What I would like to see is this: if the Controller has been running for a period of time (configurable by the service operator) and it hasn't heard from a device during that period (like a heartbeat), it assumes that the device is offline. Or perhaps there is a third state, where the device is flagged as being in a zombie state. When the device is heard from again, the Controller can restore its state to online.
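To make the request concrete, here is a minimal sketch of that timeout logic in Python. It is not Oktopus code: `DeviceTracker`, `zombie_after`, and `offline_after` are hypothetical names chosen for illustration, and the sketch assumes the Controller records a timestamp each time it hears from a device.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    ONLINE = "online"
    ZOMBIE = "zombie"    # silent for a while, not yet declared offline
    OFFLINE = "offline"


@dataclass
class DeviceTracker:
    """Hypothetical tracker; thresholds are operator-configurable seconds."""
    zombie_after: float
    offline_after: float
    last_seen: dict = field(default_factory=dict)

    def heard_from(self, device_id, now):
        # Any message from the device counts as a heartbeat.
        self.last_seen[device_id] = now

    def status(self, device_id, now):
        seen = self.last_seen.get(device_id)
        if seen is None:
            # Never heard from since the Controller started.
            return Status.OFFLINE
        silence = now - seen
        if silence >= self.offline_after:
            return Status.OFFLINE
        if silence >= self.zombie_after:
            return Status.ZOMBIE
        return Status.ONLINE
```

Because the status is derived from the time of the last message rather than stored, a device that is heard from again returns to online without any extra bookkeeping.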

I'm new to USP, so I don't know if the protocol supports this edge condition. When working with ACS and TR-181 data models in the context of cable modems and optical units, there are all sorts of field conditions that will take the device 'offline' and leave the ACS in a confusing state.

Mark-Millard avatar Feb 17 '25 20:02 Mark-Millard

In TR-069, the device status is managed by a timeout on the periodic inform that the device keeps sending to the ACS at a fixed interval.

In contrast to CWMP, USP keeps the TCP connection always open, so you know right away when a device is disconnected because the TCP connection gets broken.

Even so, there are cases where you can have a "ghost" TCP connection: the connection is lost, but you just don't know it. That's why STOMP has a "heartbeat" mechanism and MQTT has a "keep alive" time, for example. These allow the application protocols to identify that a connection is lost.
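As an illustration of how a keep alive exposes a ghost connection, here is a small Python sketch of the MQTT rule: with a non-zero Keep Alive, the server disconnects a client from which it has received no control packet within one and a half times the Keep Alive period. The function name and parameters are hypothetical, and this is not how Oktopus or any particular broker implements the check.

```python
def mqtt_connection_is_stale(last_packet_at, now, keep_alive_s):
    """Return True once the silence exceeds 1.5x the MQTT Keep Alive.

    All arguments are in seconds. A Keep Alive of 0 turns the
    mechanism off, so the connection is never considered stale.
    """
    if keep_alive_s == 0:
        return False
    return (now - last_packet_at) > 1.5 * keep_alive_s
```

With a Keep Alive of 60 seconds, for example, a silent device is only detected after 90 seconds, which is why the chosen interval directly bounds how long a ghost connection can show as online.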

For these mechanisms to identify that a device is no longer connected, they have to be configured on the controller side or on the agent side. You can observe this behavior when using MQTT with Oktopus, for example: there will always be ping-pong messages after some idle time between the device and the USP controller. Just make sure your device has a smaller heartbeat interval than the server itself. It's also possible to set the interval you want the device to use through the server, but how to do that depends on each MTP (STOMP, WebSockets, MQTT).
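For STOMP, the values configured on the two sides are combined during the CONNECT/CONNECTED exchange through the `heart-beat` header. The sketch below follows the negotiation rule from the STOMP 1.2 specification; the function names are hypothetical, and how an agent or Oktopus sets these header values is not shown here.

```python
def parse_heart_beat(header):
    """Parse a STOMP 'heart-beat' header value such as '10000,20000'.

    The first number is the smallest interval (ms) the sender can
    guarantee between its own heartbeats (0 means it cannot send);
    the second is the interval at which it wants to receive them
    (0 means it does not want any).
    """
    can_send, wants_to_receive = header.split(",")
    return int(can_send), int(wants_to_receive)


def negotiate_heart_beat(client_header, server_header):
    """Return (client_to_server_ms, server_to_client_ms); 0 means none."""
    cx, cy = parse_heart_beat(client_header)
    sx, sy = parse_heart_beat(server_header)
    # A direction has heartbeats only if the sender can send and the
    # receiver wants them; the interval is then the larger of the two.
    client_to_server = 0 if cx == 0 or sy == 0 else max(cx, sy)
    server_to_client = 0 if sx == 0 or cy == 0 else max(sx, cy)
    return client_to_server, server_to_client
```

For instance, a client sending `heart-beat:10000,20000` to a server answering `heart-beat:15000,5000` ends up with heartbeats every 10000 ms from client to server and every 20000 ms from server to client.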

You don't need to make any changes to the Oktopus controller configuration to connect your devices; it should work out of the box. Still, it's good to check those "heartbeat" values if you are running at scale or in production.

leandrofars avatar Feb 18 '25 11:02 leandrofars