flower
Prometheus: fix offline worker metrics
@mher Fully working code (tested with Prometheus/Grafana) is now pushed. All metrics for offline workers are removed, so Grafana eventually runs out of data for its graphs and drops all the plots for nonexistent workers.
Tests added. I also added my own comments to guide the reviewer.
Please let me know if this works. It would be nice if someone actually ran this and observed the graphs in Grafana.
Log output from running this after the Celery worker is stopped:
[I 210815 23:22:43 web:2239] 200 GET /metrics (127.0.0.1) 3.92ms
[I 210815 23:22:58 web:2239] 200 GET /metrics (127.0.0.1) 3.41ms
[I 210815 23:23:12 web:2239] 200 GET /dashboard?json=1&_=1629066192482 (::1) 0.53ms
[I 210815 23:23:13 web:2239] 200 GET /metrics (127.0.0.1) 3.85ms
[D 210815 23:23:28 prometheus_metrics:57] Removed label set: ('worker1-concurrency-1@XPS-15-9560', 'task-started', 'tasks.add') for metric counter:flower_events
[D 210815 23:23:28 prometheus_metrics:57] Removed label set: ('worker1-concurrency-1@XPS-15-9560', 'task-received', 'tasks.add') for metric counter:flower_events
[D 210815 23:23:28 prometheus_metrics:57] Removed label set: ('worker1-concurrency-1@XPS-15-9560', 'task-succeeded', 'tasks.add') for metric counter:flower_events
[D 210815 23:23:28 prometheus_metrics:57] Removed label set: ('worker1-concurrency-1@XPS-15-9560', 'task-failed', 'tasks.add') for metric counter:flower_events
[D 210815 23:23:28 prometheus_metrics:57] Removed label set: ('worker1-concurrency-1@XPS-15-9560', 'tasks.add') for metric histogram:flower_task_runtime_seconds
[D 210815 23:23:28 prometheus_metrics:57] Removed label set: ('worker1-concurrency-1@XPS-15-9560', 'tasks.add') for metric gauge:flower_task_prefetch_time_seconds
[D 210815 23:23:28 prometheus_metrics:57] Removed label set: ('worker1-concurrency-1@XPS-15-9560', 'tasks.add') for metric gauge:flower_worker_prefetched_tasks
[D 210815 23:23:28 prometheus_metrics:57] Removed label set: ('worker1-concurrency-1@XPS-15-9560',) for metric gauge:flower_worker_online
[D 210815 23:23:28 prometheus_metrics:57] Removed label set: ('worker1-concurrency-1@XPS-15-9560',) for metric gauge:flower_worker_number_of_currently_executing_tasks
[I 210815 23:23:28 web:2239] 200 GET /metrics (127.0.0.1) 6.30ms
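For readers following along, the removal shown in the log above could be sketched roughly as below. This is not the code from this PR: `LabelSetTracker`, `track` and `remove_worker` are hypothetical names used for illustration only. The sketch assumes each metric object exposes `labels(*labelvalues)` and `remove(*labelvalues)`, as `prometheus_client` metrics do, and that the worker name is the first label value, as in every label set logged above.

```python
class LabelSetTracker:
    """Hypothetical helper: remembers label sets so they can be removed later."""

    def __init__(self):
        # id(metric) -> (metric, set of label-value tuples seen for that metric)
        self._seen = {}

    def track(self, metric, *labelvalues):
        """Record the label set, then return the labelled child metric."""
        _, label_sets = self._seen.setdefault(id(metric), (metric, set()))
        label_sets.add(labelvalues)
        return metric.labels(*labelvalues)

    def remove_worker(self, worker):
        """Remove every tracked label set whose first label value is `worker`.

        Assumes the worker name is always the first label value.
        Returns the number of label sets removed.
        """
        removed = 0
        for metric, label_sets in self._seen.values():
            stale = {ls for ls in label_sets if ls and ls[0] == worker}
            for ls in stale:
                metric.remove(*ls)  # drop the series from the exposition
                removed += 1
            label_sets -= stale
        return removed
```

With something like this, the event handler would call `tracker.track(metric, worker, ...)` instead of `metric.labels(worker, ...)`, and `tracker.remove_worker(worker)` once the worker is considered offline. Tracking label sets explicitly avoids reading private attributes of the metric objects.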
@mher I pushed more code. Please have a quick look and let me know if this solution is acceptable, then I will add all the missing tests. Everything was tested manually multiple times with actual Prometheus/Grafana/Celery/Flower running.
@mher Can you please provide details on when this will be added?
@Tomasz-Kluczkowski Can you update your branch? Then at least it will be possible for people to run (and test) your fork until this finally gets merged.
It is a real pity that this has not been fixed yet. I would love to see this move to the main branch and get properly released.