
Performance Bug: RTKBase Web Service Creates D-Bus Message Flood (400+ messages/sec)

Open kiczko opened this issue 5 months ago • 2 comments

Summary

The RTKBase web service (rtkbase_web.service) creates an excessive D-Bus message flood that can overwhelm system resources on Raspberry Pi devices, causing high CPU usage and system instability.

Environment

  • Device: Raspberry Pi 2 Model B
  • OS: Raspberry Pi OS (Debian 12 bookworm)
  • RTKBase Version: 2.6.3
  • Python Version: 3.11.2
  • D-Bus Message Bus Daemon Version: 1.14.10
  • Affected Component: web_app/server.py and web_app/ServiceController.py

Problem Description

The RTKBase web application polls systemd service status every second using pystemd, creating thousands of new D-Bus connections per second. This causes:

  • 1,000-3,000+ actual D-Bus messages per second (normal is 10-50/sec)
  • 15,000-20,000+ lines of D-Bus output per second (due to multi-line message structure)
  • High CPU usage from dbus-daemon (40%+) and polkitd (10%+)
  • System performance degradation on resource-constrained devices
  • Potential system instability due to D-Bus bus saturation

Evidence

Quantitative Analysis

sudo timeout 10 dbus-monitor --system | grep -E '^(method call|method return|signal|error)' | wc -l
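For anyone who prefers to post-process a saved capture, the same count can be reproduced in Python. This is only a minimal sketch: count_messages and messages_per_second are illustrative names, not part of RTKBase, and the input is assumed to be the text output of dbus-monitor --system, where each message starts with a header line at column 0 followed by indented argument lines.

```python
import re
from typing import Iterable

# Same pattern as the grep in the command above: only message header lines
# start at column 0 with one of these four message types.
HEADER = re.compile(r"^(method call|method return|signal|error)")


def count_messages(lines: Iterable[str]) -> int:
    """Count D-Bus message headers, ignoring the indented argument lines."""
    return sum(1 for line in lines if HEADER.match(line))


def messages_per_second(lines: Iterable[str], window_s: float = 10.0) -> float:
    """Average message rate over the capture window (10 s in the command above)."""
    return count_messages(lines) / window_s
```

With a capture saved to a file, messages_per_second(open("capture.txt")) gives the per-second figure directly, which separates the message rate from the much larger raw line count.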

Qualitative Analysis

The D-Bus flood shows a clear pattern:

  1. Rapid connection creation: Connection IDs increment from :1.4533994 to :1.4578224 in seconds
  2. Repetitive service queries: Same systemd units queried continuously
  3. UnitNew/UnitRemoved cycles: Services created and immediately destroyed

The attached script dbus_analysis.sh runs a comprehensive analysis.

System Impact

# htop showing high CPU usage
PID  USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
358  messagebus 20   0  8608  3936  3424 S 41.6  0.4  19:41.25 dbus-daemon --system
366  polkitd    20   0 48120  7624  6512 S 11.7  0.7   4:50.54 polkitd --no-debug

Root Cause Analysis

Primary Issue: Inefficient D-Bus Usage in ServiceController.py

The ServiceController class creates new D-Bus connections for every query:

# Current problematic code
from pystemd.systemd1 import Unit

class ServiceController(object):
    def __init__(self, unit):
        self.unit = Unit(bytes(unit, 'utf-8'), _autoload=True)  # New connection each time

    def isActive(self):
        # Creates new D-Bus connection
        if self.unit.Unit.ActiveState == b'active':
            return True
        return False

Secondary Issue: Excessive Polling in server.py

The manager() function polls services every second:

# Current problematic code in manager()
while True:
    if connected_clients > 0:
        updated_services_status = getServicesStatus(emit_pingback=False)  # Every 1 second
        # ...
    time.sleep(1)

Why This Creates a D-Bus Flood

  1. 11 services × 3 D-Bus calls per service (isActive, status, get_result) × every second = 33+ new connections/sec
  2. pystemd library doesn't reuse connections efficiently
  3. Each connection requires multiple D-Bus messages (Hello, NameAcquired, property queries, NameLost)
  4. Broken / misconfigured services cause additional UnitNew/UnitRemoved cycles
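The arithmetic in the list above can be written out as a rough lower bound. This is a back-of-the-envelope sketch, not a measurement: it assumes exactly one message for each of the four kinds named in item 3, and it ignores replies and the UnitNew/UnitRemoved cycles from item 4, so real traffic would be expected to sit above this floor.

```python
SERVICES = 11           # number of services polled (item 1)
CALLS_PER_SERVICE = 3   # isActive, status, get_result (item 1)
POLLS_PER_SECOND = 1    # manager() sleeps 1 second between polls

# Assumption: one message per kind listed in item 3
# (Hello, NameAcquired, property query, NameLost).
MIN_MESSAGES_PER_CONNECTION = 4

connections_per_second = SERVICES * CALLS_PER_SERVICE * POLLS_PER_SECOND
min_messages_per_second = connections_per_second * MIN_MESSAGES_PER_CONNECTION

print(connections_per_second)    # 33
print(min_messages_per_second)   # 132
```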

dbus_analysis.sh.txt

kiczko, Jul 21 '25 23:07

Proposed Solution

1. Implement Connection Reuse and Caching in ServiceController.py

Replace the current ServiceController.py with an improved version that:

  • Reuses D-Bus connections via shared Manager instance
  • Caches service status for 5 seconds to eliminate redundant queries
  • Handles errors gracefully with safe defaults
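The three points above could be combined roughly as follows. This is a sketch under stated assumptions, not the code from the pull request: CachedStatus, fetch and clock are hypothetical names, and fetch stands in for whatever function queries a unit's ActiveState over a single long-lived D-Bus connection (the pystemd side is deliberately left out).

```python
import time
from typing import Callable, Dict, Tuple


class CachedStatus:
    """Cache unit states for `ttl` seconds so repeated queries skip D-Bus."""

    def __init__(self, fetch: Callable[[str], str], ttl: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch    # hypothetical: queries one shared connection
        self._ttl = ttl
        self._clock = clock    # injectable so the cache can be tested
        self._cache: Dict[str, Tuple[float, str]] = {}

    def active_state(self, unit: str) -> str:
        now = self._clock()
        hit = self._cache.get(unit)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]      # still fresh: no D-Bus traffic
        try:
            state = self._fetch(unit)
        except Exception:
            state = "unknown"  # safe default when the query fails
        self._cache[unit] = (now, state)
        return state

    def is_active(self, unit: str) -> bool:
        return self.active_state(unit) == "active"

    def clear(self) -> None:
        """Drop all entries, e.g. right after starting or stopping a service."""
        self._cache.clear()
```

Note that a failed query is cached like any other result, so a broken unit is retried at most once per ttl rather than on every call. Whether that trade-off is acceptable depends on how quickly the web interface needs to notice a recovery.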

2. Reduce Polling Frequency in server.py

Modify the manager() function to:

  • Check services every 10 seconds instead of every second
  • Send system info every 2 seconds (separate from service checks)
  • Clear cache after service state changes
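One way to decouple the two intervals is a single loop that tracks a separate deadline for each task. The sketch below is an assumption about how this could look, not the actual manager() from server.py: run_manager, check_services, send_system_info and has_clients are hypothetical names, and max_ticks exists only so the loop can be exercised in a test.

```python
import time
from typing import Callable, Optional


def run_manager(check_services: Callable[[], None],
                send_system_info: Callable[[], None],
                has_clients: Callable[[], bool],
                service_interval: float = 10.0,
                info_interval: float = 2.0,
                tick: float = 1.0,
                clock: Callable[[], float] = time.monotonic,
                sleep: Callable[[float], None] = time.sleep,
                max_ticks: Optional[int] = None) -> None:
    """Run both periodic tasks from one loop, each on its own interval."""
    next_service = next_info = clock()
    ticks = 0
    while max_ticks is None or ticks < max_ticks:
        now = clock()
        if has_clients():  # stay idle while nobody is connected
            if now >= next_info:
                send_system_info()
                next_info = now + info_interval
            if now >= next_service:
                check_services()
                next_service = now + service_interval
        sleep(tick)
        ticks += 1
```

Because both intervals are plain parameters, the service check can be set back to 1 second without touching the loop. A client that connects after an idle period triggers both tasks on the next tick, since the stored deadlines are already in the past.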

3. Add Better Error Handling

  • Graceful fallbacks for problematic services
  • Proper exception handling in service queries
  • Safe defaults when services are unavailable

Files to Modify:

  1. web_app/ServiceController.py - Complete rewrite with connection pooling
  2. web_app/server.py - Update manager() and getServicesStatus() functions

Will create a corresponding pull request.

kiczko, Jul 21 '25 23:07

Thank you for this in-depth report.

A PR is welcome.

> 2. Reduce Polling Frequency in server.py
> Modify the manager() function to:
>
> * Check services every 10 seconds instead of every second

However, I will keep the service check at every second, because I don't want to confuse the end user. Also, don't forget that these pystemd calls stop when no user is connected to the web interface.

Stefal, Jul 23 '25 19:07