openwisp-radius
openwisp-radius copied to clipboard
[feature] Added intergration with openwisp-monitoring
Blockers
- [ ] https://github.com/openwisp/openwisp-monitoring/pull/538
Closes https://github.com/openwisp/openwisp-monitoring/issues/532 Closes https://github.com/openwisp/openwisp-monitoring/issues/534
coverage: 98.678% (+0.001%) from 98.677% when pulling 1bdf7b6cec59fde910bc7db909a69ac0deca2641 on radius-monitoring into 6a2d98861a8d43e7622d87f7e1af97503c2677f9 on master.
We found these shortcomings during the review
- Write total number of users registered on the systems to influxdb once a day. No need to write the data point on every user registration.
- Change the colors in the charts.
Summary of Changes:
The current implementation faced a challenge wherein the charts for "User Registration" and "Total User Registration" started from zero, displaying the number of users registered since the feature deployment rather than the total count of users registered on the system.
To address this, the proposed solution was to write the total number of registered users to InfluxDB once a day, eliminating the need for a data point write on every user registration event.
Solution Exploration:
In the latest iteration, I delved into the feasibility of the proposed solution. Initial investigations revealed a necessity to write metric points every hour instead of once a day, as the chart resolution for "last 7 days" draws a point for each hour.
Implementation Details:
To accommodate this, the following logic was implemented in Python:
def write_user_registration_metrics():
now = timezone.now()
metric_data = []
# Get the registration data for the past hour.
# The query returns a tuple of organization_id, registration_method and
# count of users who registered with that organization and method.
registered_users = (
RegisteredUser.objects.filter(
user__openwisp_users_organizationuser__created__lte=now,
user__openwisp_users_organizationuser__created__gt=now
- timezone.timedelta(hours=1),
)
.values_list('user__openwisp_users_organizationuser__organization_id', 'method')
.annotate(count=Count('user', distinct=True))
)
for org_id, registration_method, count in registered_users:
registration_method = clean_registration_method(registration_method)
metric, _ = Metric._get_or_create(
configuration='user_signups_new',
name='User SignUps',
key='user_signups_new',
object_id=None,
content_type=None,
extra_tags={
'organization_id': str(org_id),
'method': registration_method,
},
)
metric_data.append((metric, {'value': count, 'time': now}))
Metric.batch_write(metric_data)
Caveats Identified:
However, two caveats were encountered:
- The logic wrote a metric point for each organization when a user registered with multiple organizations, causing the total user count to be misrepresented.
- Users without a corresponding RegisteredUser object were not considered, such as manually created superusers, further contributing to data inconsistency.
Decision and Refinement:
After discussion with @nemesifier, it was concluded that the current implementation, which allows a DISTINCT operation on user IDs, resolves the first caveat effectively.
To address the second caveat, it was decided to update the logic to write a metric point for all users created. For users without a RegisteredUser object, the metric point will be assigned an "unspecified" method.
Resolution:
This adjustment resolves the mismatch between the total user count displayed on charts and the user's changelist page.
Furthermore, to rectify the initial problem, a celery task will be added to retroactively write metric points for old data from the database. This celery task would get triggered on the post_migration signal, thus aligning the implementation with the desired outcome.
I introduced a task to retroactively write RADIUS metrics from the database to InfluxDB. This task aimed to populate InfluxDB with historical data, ensuring comprehensive metric tracking.
Task Implementation:
The task, outlined below, iterates through user data to extract relevant metrics:
@shared_task
def post_migrate_retroactive_write():
"""
This task retroactively writes the RADIUS metrics
to InfluxDB using the data present in the database.
"""
metric_data = []
for user in (
User.objects.select_related('registered_user')
.only('id', 'registered_user__method')
.iterator()
):
try:
registration_method = user.registered_user.method
except RegisteredUser.DoesNotExist:
registration_method = 'unspecified'
registration_method = clean_registration_method(registration_method)
org_query = list(
user.openwisp_users_organizationuser.values_list(
'organization_id',
'created',
)
)
if not org_query:
# This user is not a member of any organization.
# Use "None" for organization_id for this user.
org_query = [(None, user.date_joined)]
for (org_id, time) in org_query:
metric = _get_user_signup_metric(
organization_id=str(org_id), registration_method=registration_method
)
metric_data.append(
# (metric, {'value': sha1_hash(str(user.id)), 'time': time})
(metric, {'value': user.username, 'time': time})
)
Metric.batch_write(metric_data)
Encountered Caveats:
During testing with real-world data, two significant challenges were identified:
Retention Policy Limitations: InfluxDB's retention policy imposes constraints on storing data beyond defined limits. Consequently, storing metrics from the installation's inception becomes infeasible. Moreover, InfluxDB's continuous deletion of older data introduces inconsistency in total user values over time.
Chart Display Limitations: The chart displays totals based on the selected time range exclusively. For instance, if there are a total of 60 users – 59 registered within the last year and one registered within the last two years – the chart would only depict 59 users when the time range is set to "last 365 days."
Resolution Considerations:
In response to the encountered caveats, a proposal was made to implement a separate retention period for this data. While this would address the first caveat, the second remains a concern.
@nemesifier suggested reverting to the previous implementation, where the number of users registered with each method for each organization is recorded in InfluxDB every hour. To resolve the caveats with this approach:
- The logic wrote a metric point for each organization when a user registered with multiple organizations, causing the total user count to be misrepresented.
The total count of registered users will be recorded in InfluxDB with a special value (e.g., __all__) for the organization_id field. When superusers view the charts, the metrics will be filtered using this value, allowing them to see the total registered user count. Additionally, the number of registered users for each organization will be recorded separately, ensuring accurate counts when the chart is filtered by an organization or viewed by non-superusers.
We would write the number of registered users for each organization separately. This values would be used when the chart is filtered by an organization or when a non-superuser is viewing the chart.
- Users without a corresponding RegisteredUser object were not considered, such as manually created superusers, further contributing to data inconsistency.
If a user does not have a related RegisteredUser object, then use "unspecified" method for that user. User the special value as organization_id for such users.
- Can we allow the device charts to ignore the organization if configured to do so?
I was looking into the queries made to the InfluxDB when opening the device charts. I found the following relevant queries related to RADIUS sessions
SELECT SUM(output_octets) / 1000000000 AS upload, SUM(input_octets) / 1000000000 AS download FROM radius_acc WHERE time >= '2023-04-19' AND content_type = 'config.device' AND object_id = '4d1a6d5d-a638-46fb-8297-6ea050388985' GROUP BY time(7d)
SELECT COUNT(DISTINCT(username)) FROM radius_acc WHERE time >= '2023-04-19' AND content_type = 'config.device' AND object_id = '4d1a6d5d-a638-46fb-8297-6ea050388985' GROUP by time(7d), method
These do not filter by organization
When writing RADIUS accounting metric, we may not always want to lookup the device with the organization of the RadiusAccounting object. We shall add a setting, which if enabled removes the organization_id from the Device lookup in post_save_radiusaccounting task.