Prometheus: fix offline worker metrics

Open Tomasz-Kluczkowski opened this issue 2 years ago • 3 comments

@mher Fully working code (tested with Prometheus/Grafana) is now pushed. All metrics for offline workers are removed, so Grafana eventually runs out of data for its graphs and drops the plots for workers that no longer exist.

Tests added. My own comments added for guidance for the reviewer.

Please let me know if this works. It would be nice if someone actually ran this and observed the graphs in Grafana.

Log output when running this after the Celery worker is stopped:

[I 210815 23:22:43 web:2239] 200 GET /metrics (127.0.0.1) 3.92ms
[I 210815 23:22:58 web:2239] 200 GET /metrics (127.0.0.1) 3.41ms
[I 210815 23:23:12 web:2239] 200 GET /dashboard?json=1&_=1629066192482 (::1) 0.53ms
[I 210815 23:23:13 web:2239] 200 GET /metrics (127.0.0.1) 3.85ms
[D 210815 23:23:28 prometheus_metrics:57] Removed label set: ('worker1-concurrency-1@XPS-15-9560', 'task-started', 'tasks.add') for metric counter:flower_events
[D 210815 23:23:28 prometheus_metrics:57] Removed label set: ('worker1-concurrency-1@XPS-15-9560', 'task-received', 'tasks.add') for metric counter:flower_events
[D 210815 23:23:28 prometheus_metrics:57] Removed label set: ('worker1-concurrency-1@XPS-15-9560', 'task-succeeded', 'tasks.add') for metric counter:flower_events
[D 210815 23:23:28 prometheus_metrics:57] Removed label set: ('worker1-concurrency-1@XPS-15-9560', 'task-failed', 'tasks.add') for metric counter:flower_events
[D 210815 23:23:28 prometheus_metrics:57] Removed label set: ('worker1-concurrency-1@XPS-15-9560', 'tasks.add') for metric histogram:flower_task_runtime_seconds
[D 210815 23:23:28 prometheus_metrics:57] Removed label set: ('worker1-concurrency-1@XPS-15-9560', 'tasks.add') for metric gauge:flower_task_prefetch_time_seconds
[D 210815 23:23:28 prometheus_metrics:57] Removed label set: ('worker1-concurrency-1@XPS-15-9560', 'tasks.add') for metric gauge:flower_worker_prefetched_tasks
[D 210815 23:23:28 prometheus_metrics:57] Removed label set: ('worker1-concurrency-1@XPS-15-9560',) for metric gauge:flower_worker_online
[D 210815 23:23:28 prometheus_metrics:57] Removed label set: ('worker1-concurrency-1@XPS-15-9560',) for metric gauge:flower_worker_number_of_currently_executing_tasks
[I 210815 23:23:28 web:2239] 200 GET /metrics (127.0.0.1) 6.30ms
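
For readers who want to see the idea behind those "Removed label set" lines, here is a minimal sketch. It is not the code from the branch: it uses a plain dictionary as a hypothetical stand-in for the labelled metrics, and it assumes the worker name is the first label value, as in the log above. With the real `prometheus_client` library, the per-label-set removal would be done by calling `remove(*labelvalues)` on the metric.

```python
from typing import Dict, List, Tuple

LabelSet = Tuple[str, ...]

# Hypothetical in-memory stand-in for labelled metrics:
# metric name -> {label values -> current value}.
# Assumption: the worker name is always the first label value.
metrics: Dict[str, Dict[LabelSet, float]] = {
    "counter:flower_events": {
        ("worker1@host", "task-received", "tasks.add"): 3.0,
        ("worker2@host", "task-received", "tasks.add"): 5.0,
    },
    "gauge:flower_worker_online": {
        ("worker1@host",): 0.0,
        ("worker2@host",): 1.0,
    },
}


def remove_offline_worker(
    store: Dict[str, Dict[LabelSet, float]], worker: str
) -> List[str]:
    """Drop every label set that belongs to `worker`; return log-style messages."""
    removed: List[str] = []
    for metric_name, children in store.items():
        # Collect the keys first so the dict is not mutated while iterating.
        stale = [labels for labels in children if labels and labels[0] == worker]
        for labels in stale:
            del children[labels]
            removed.append(
                f"Removed label set: {labels} for metric {metric_name}"
            )
    return removed


messages = remove_offline_worker(metrics, "worker1@host")
for message in messages:
    print(message)
```

Once the label sets are gone, a scrape of the metrics no longer returns any series for that worker, which is why the Grafana plots eventually disappear. Label sets for workers that are still online are left untouched.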

Tomasz-Kluczkowski avatar Aug 09 '21 22:08 Tomasz-Kluczkowski

@mher I pushed more code. Please have a quick look and let me know if this solution is acceptable; then I will add all the missing tests. Everything was tested manually multiple times with actual Prometheus/Grafana/Celery/Flower running.

Tomasz-Kluczkowski avatar Aug 14 '21 14:08 Tomasz-Kluczkowski

@mher Can you please provide details on when this will be added?

danyi1212 avatar Sep 04 '22 12:09 danyi1212

@Tomasz-Kluczkowski can you update your branch? then at least it will be possible for people to run (and test) your fork until this finally gets merged.

this not being fixed yet is a real pity. would love to see this move to the main branch and get properly released.

drummerwolli avatar Sep 12 '22 09:09 drummerwolli