job-list: support queue specific stats

Open chu11 opened this issue 3 years ago • 2 comments

Per #4604, get queue-specific job stats. In flux-jobs, the --queue option can now also select which queue --stats reports on.

chu11 avatar Oct 13 '22 18:10 chu11

re-pushed: fixed up the && chains in tests, a memleak that valgrind found, and python linting

chu11 avatar Oct 13 '22 23:10 chu11

so something's a little off there.

hmmm. Maybe a bug related to restarting a flux instance, which I may not have added a test for. Thanks!

Edit: no, not an errant bug ... I completely missed it!! Need to implement it and cover it with a test.

chu11 avatar Oct 24 '22 16:10 chu11

re-pushed, fixing up the issues @garlick found above

fixed queue specific stats on reload of the job-list module and added some tests & extra coverage for it

However, I was not able to figure out how

0 running, -1 completed, 1 failed, 0 pending

occurred. It's perplexing. In the python code, the number of completed jobs is computed as:

self.successful = self.inactive - self.failed

so the inactive job count must have been less than the failed job count to get the -1 output. I can't see how that could happen.

My only guess is it could have been a counting issue with a job that was active when the job-list module was reloaded and one of the counters may have been under-counted. But I don't think that's what happened here.
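To make the arithmetic concrete, here is a minimal sketch of how that subtraction can go negative. `StatsSketch` is a hypothetical stand-in written for this comment, not the actual class in the flux python bindings; only the `inactive - failed` subtraction is taken from the line quoted above, and the counter values are made up.

```python
from dataclasses import dataclass


@dataclass
class StatsSketch:
    """Hypothetical stand-in for the stats object (not the real flux class)."""

    inactive: int  # total jobs that have finished, successful or not
    failed: int    # subset of inactive jobs that failed

    @property
    def successful(self) -> int:
        # Same subtraction as the line quoted from the python code
        return self.inactive - self.failed


# Consistent counters: failed is a subset of inactive
consistent = StatsSketch(inactive=100, failed=13)
print(consistent.successful)  # 87

# Assumed scenario: inactive was under-counted while failed was not
undercounted = StatsSketch(inactive=0, failed=1)
print(undercounted.successful)  # -1
```

So a -1 would need `failed` to be counted for a job that `inactive` never saw, which is the under-count guess described above.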

chu11 avatar Oct 25 '22 21:10 chu11

> My only guess is it could have been a counting issue with a job that was active when the job-list module was reloaded and one of the counters may have been under-counted. But I don't think that's what happened here.

This might be unrelated to this PR. Watch this:

 garlick@picl0:~$ flux jobs --stats-only
0 running, 87 completed, 13 failed, 0 pending
 garlick@picl0:~$ flux mini run -q debug /bin/false
flux-job: task(s) exited with exit code 1
 garlick@picl0:~$ flux jobs --stats-only
0 running, 86 completed, 14 failed, 0 pending

Somehow the completed count dropped. This is on master right after #4687 was merged.

Edit: also reproduces on current master:

 garlick@picl0:~$ flux version
commands:    		0.44.0-113-ge549e5afd
libflux-core:		0.44.0-113-ge549e5afd
libflux-security:	0.8.0-2-g8da4e73
build-options:		+systemd+hwloc==2.4.0+zmq==4.3.4
 garlick@picl0:~$ flux jobs --stats-only
0 running, 86 completed, 14 failed, 0 pending
 garlick@picl0:~$ flux mini run -q debug /bin/false
flux-job: task(s) exited with exit code 1
 garlick@picl0:~$ flux jobs --stats-only
0 running, 85 completed, 15 failed, 0 pending
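One way to read the sessions above: if completed is derived as inactive minus failed (as in the python line quoted earlier in the thread), then completed + failed recovers the inactive count the tool was working from. A rough sketch of that check follows. The regex and the helper name `implied_inactive` are made up for illustration; it only parses the printed summary lines copied from the transcripts and does not talk to flux.

```python
import re

# Matches the one-line summary printed by `flux jobs --stats-only`
STATS_RE = re.compile(
    r"(-?\d+) running, (-?\d+) completed, (-?\d+) failed, (-?\d+) pending"
)


def implied_inactive(line: str) -> int:
    """Return completed + failed for one summary line (hypothetical helper)."""
    match = STATS_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a stats summary line: {line!r}")
    _running, completed, failed, _pending = map(int, match.groups())
    return completed + failed


# Summary lines copied from the transcripts above
summaries = [
    "0 running, 87 completed, 13 failed, 0 pending",
    "0 running, 86 completed, 14 failed, 0 pending",
    "0 running, 85 completed, 15 failed, 0 pending",
]

for summary in summaries:
    print(implied_inactive(summary), "<-", summary)
```

Each line sums to 100, so across the runs the failed count goes up by one while the implied inactive total does not move. That is consistent with the inactive counter not being incremented for the new failed job, though the transcripts alone don't show why.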

garlick avatar Oct 26 '22 13:10 garlick

Codecov Report

Merging #4684 (ad1193a) into master (cb4d814) will decrease coverage by 0.02%. The diff coverage is n/a.

:exclamation: Current head ad1193a differs from pull request most recent head b0a6919. Consider uploading reports for the commit b0a6919 to get more accurate results

@@            Coverage Diff             @@
##           master    #4684      +/-   ##
==========================================
- Coverage   83.37%   83.35%   -0.03%     
==========================================
  Files         413      413              
  Lines       69781    69664     -117     
==========================================
- Hits        58182    58070     -112     
+ Misses      11599    11594       -5     

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/common/libutil/ipaddr.c | 64.70% <0.00%> (-4.82%) | :arrow_down: |
| src/bindings/python/flux/uri/resolvers/lsf.py | 87.50% <0.00%> (-2.09%) | :arrow_down: |
| src/cmd/flux-jobs.py | 95.52% <0.00%> (-1.46%) | :arrow_down: |
| src/modules/job-info/guest_watch.c | 76.75% <0.00%> (-0.55%) | :arrow_down: |
| src/common/libsubprocess/server.c | 60.54% <0.00%> (-0.55%) | :arrow_down: |
| src/broker/overlay.c | 85.58% <0.00%> (-0.41%) | :arrow_down: |
| src/bindings/python/flux/util.py | 94.40% <0.00%> (-0.39%) | :arrow_down: |
| src/modules/job-manager/submit.c | 81.37% <0.00%> (-0.19%) | :arrow_down: |
| src/bindings/python/flux/resource/Rlist.py | 94.68% <0.00%> (-0.17%) | :arrow_down: |
| src/bindings/python/flux/job/Jobspec.py | 83.98% <0.00%> (-0.07%) | :arrow_down: |

... and 10 more

codecov[bot] avatar Oct 28 '22 02:10 codecov[bot]