openQA

Use the workers count from Minion 10.25 to make monitoring more reliable

Open kraih opened this issue 3 years ago • 2 comments

It has been suggested that our monitoring is unreliable because adding up the number of active and inactive workers doesn't quite work right. So I've added the number of all registered workers upstream to the Minion stats. The feature will be available with the Minion 10.25 release.
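As a rough illustration of the idea (a sketch only, not the openQA or Minion code), the Python snippet below reports a registered-worker count taken directly from the stats instead of deriving it as active plus inactive, and renders it as an InfluxDB line. The stats keys `active_workers`, `inactive_workers` and `workers`, as well as the helpers `worker_fields` and `influx_line`, are assumptions made for this example.

```python
def worker_fields(stats):
    """Pick the worker counters to report from a Minion-style stats mapping.

    The key names used here are assumptions for illustration.
    """
    return {
        "active": stats["active_workers"],
        "inactive": stats["inactive_workers"],
        # Taken directly from the stats rather than computed as active + inactive.
        "registered": stats["workers"],
    }


def influx_line(measurement, tags, fields):
    """Render one InfluxDB line-protocol entry with integer fields."""
    tag_str = ",".join(f"{key}={value}" for key, value in sorted(tags.items()))
    field_str = ",".join(f"{key}={value}i" for key, value in sorted(fields.items()))
    # Tags follow the measurement after a comma; fields follow after one space.
    return f"{measurement},{tag_str} {field_str}"


stats = {"active_workers": 0, "inactive_workers": 1, "workers": 1}
line = influx_line(
    "openqa_minion_workers",
    {"url": "http://example.com"},
    worker_fields(stats),
)
print(line)
```

With these example values the rendered line has the shape `openqa_minion_workers,url=http://example.com active=0i,inactive=1i,registered=1i`.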

Progress: https://progress.opensuse.org/issues/112898

kraih avatar Jun 24 '22 12:06 kraih

Since Minion 10.25 is not yet in Factory, this PR is not ready.

kraih avatar Jun 24 '22 12:06 kraih

https://build.opensuse.org/package/show/devel:openQA:Leap:15.4/perl-Minion is 10.25 now so all good.

@Mergifyio rebase

okurz avatar Jul 06 '22 12:07 okurz

@Mergifyio rebase

okurz avatar Sep 24 '22 15:09 okurz

rebase

✅ Branch has been successfully rebased

mergify[bot] avatar Sep 24 '22 15:09 mergify[bot]

[16:01:57] t/api/13-influxdb.t ................ 1/? 
#   Failed test 'exact match for content'
#   at t/api/13-influxdb.t line 48.
#          got: 'openqa_minion_jobs,url=http://example.com active=0i,delayed=0i,failed=1i,inactive=1i
# openqa_minion_workers,url=http://example.com active=0i,inactive=1i,registered=1i
# '
#     expected: 'openqa_minion_jobs,url=http://example.com active=0i,delayed=0i,failed=1i,inactive=1i
# openqa_minion_workers,url=http://example.com ,active=0i,inactive=1i,registered=1i
# '
# Looks like you failed 1 test of 17

I couldn't reproduce that problem locally.

okurz avatar Sep 27 '22 13:09 okurz

Looks strange. I expected that the order might differ, but there's a comma missing.
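To pin down where two such strings diverge, a small comparison can help. The following is a minimal Python sketch that uses the workers line copied from the test output above; `first_difference` is a hypothetical helper and not part of the test suite.

```python
def first_difference(got, expected):
    """Return the index of the first differing character, or None if equal."""
    for index, (got_char, expected_char) in enumerate(zip(got, expected)):
        if got_char != expected_char:
            return index
    if len(got) != len(expected):
        # One string is a prefix of the other; they diverge where the shorter ends.
        return min(len(got), len(expected))
    return None


got = "openqa_minion_workers,url=http://example.com active=0i,inactive=1i,registered=1i"
expected = "openqa_minion_workers,url=http://example.com ,active=0i,inactive=1i,registered=1i"

index = first_difference(got, expected)
# Show a little context on both sides of the first mismatch.
print(repr(got[index - 5:index + 5]))
print(repr(expected[index - 5:index + 5]))
```

The mismatch is right after the space that separates the tags from the fields: the expected string has a `,` there, while the output goes straight to `active`.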

Martchus avatar Sep 27 '22 14:09 Martchus

https://build.opensuse.org/package/live_build_log/devel:openQA:GitHub:os-autoinst:openQA:PR-4723/openQA/openSUSE_Leap_15.4/x86_64 reproduces the problem. I assume some packages differ on my local setup, which actually makes it work there.

okurz avatar Sep 27 '22 15:09 okurz

I can reproduce the failure locally

perlpunk avatar Sep 28 '22 09:09 perlpunk

test passing now:

[09:58:54] t/api/13-influxdb.t ................ ok

perlpunk avatar Sep 28 '22 10:09 perlpunk

test passing now:

[09:58:54] t/api/13-influxdb.t ................ ok

\o/ so maybe I fixed it by now and got confused by two months of absence in the middle or something ;)

okurz avatar Sep 28 '22 10:09 okurz

Likely. Annoyingly now there's another test failure:

[10:07:10] t/deploy.t ................................................ 8/? Use of uninitialized value $is_admin in numeric eq (==) at template admin/workers/index.html.ep line 74.
Use of uninitialized value $is_admin in numeric eq (==) at template admin/workers/index.html.ep line 74.
[10:07:10] t/deploy.t ................................................ 15/? 
#   Failed test 'no (unexpected) warnings (via done_testing)'
#   at t/deploy.t line 116.
# Got the following unexpected warnings:
#   1: Use of uninitialized value $is_admin in numeric eq (==) at template admin/workers/index.html.ep line 74.
#   2: Use of uninitialized value $is_admin in numeric eq (==) at template admin/workers/index.html.ep line 74.
# Looks like you failed 1 test of 16.
[10:07:10] t/deploy.t ................................................ Dubious, test returned 1 (wstat 256, 0x100)

Martchus avatar Sep 28 '22 10:09 Martchus

I see the same failure locally as in the OBS checks.

okurz avatar Sep 29 '22 14:09 okurz

Codecov Report

Merging #4723 (85aeb35) into master (af47120) will decrease coverage by 0.00%. The diff coverage is 96.42%.

:exclamation: Current head 85aeb35 differs from pull request most recent head d819dd6. Consider uploading reports for the commit d819dd6 to get more accurate results

@@            Coverage Diff             @@
##           master    #4723      +/-   ##
==========================================
- Coverage   98.10%   98.09%   -0.01%     
==========================================
  Files         376      376              
  Lines       34847    34832      -15     
==========================================
- Hits        34185    34170      -15     
  Misses        662      662              
Impacted Files                                      Coverage Δ
lib/OpenQA/CacheService/Controller/Influxdb.pm      5.55% <0.00%> (ø)
t/api/13-influxdb.t                                 100.00% <ø> (ø)
lib/OpenQA/WebAPI/Controller/Admin/ActivityView.pm  100.00% <100.00%> (ø)
lib/OpenQA/WebAPI/Controller/Admin/Influxdb.pm      98.38% <100.00%> (ø)
lib/OpenQA/WebAPI/Controller/Admin/Workers.pm       100.00% <100.00%> (ø)
t/24-worker-engine.t                                100.00% <100.00%> (ø)
t/25-cache-service.t                                100.00% <100.00%> (ø)
t/api/08-jobtemplates.t                             98.34% <0.00%> (-0.06%) :arrow_down:
lib/OpenQA/WebAPI/Controller/API/V1/JobTemplate.pm  93.88% <0.00%> (-0.04%) :arrow_down:
t/config.t                                          100.00% <0.00%> (ø)
... and 2 more


codecov[bot] avatar Sep 29 '22 15:09 codecov[bot]

IMHO https://github.com/os-autoinst/openQA/blob/master/t/25-cache-service.t#L488 should cover lib/OpenQA/CacheService/Controller/Influxdb.pm but codecov does not see any coverage recorded in https://app.codecov.io/gh/os-autoinst/openQA/blob/master/lib/OpenQA/CacheService/Controller/Influxdb.pm . Looks like the coverage isn't properly recorded. Any hints?

okurz avatar Sep 29 '22 15:09 okurz