clusterflow
Negative complete jobs in qstat output
======================================================================
Cluster Flow Pipeline: samtools_sort_index
Submitted: 20 minutes, 5 seconds ago
Working Directory: /bi/group/bioinf/Rachael_Huntly/Cufflinks_Analysis/Rachel_0_vs_8_hour
Cluster Flow ID: samtools_sort_index_1485260139
Submitted Jobs: 17
Running Jobs: 8
Queued Jobs: 11 (resources)
Completed Jobs: -2 (-11%)
======================================================================
- samtools_sort_index [4 cores]
- email_run_complete
- email_run_complete
- samtools_sort_index [4 cores]
- email_run_complete
- samtools_sort_index [4 cores]
- email_run_complete
- samtools_sort_index [4 cores]
- email_run_complete
- samtools_sort_index [4 cores]
- email_run_complete
- samtools_sort_index [4 cores]
- email_run_complete
- samtools_sort_index [4 cores]
- email_run_complete
- email_pipeline_complete
- samtools_sort_index [4 cores]
- email_run_complete
- email_run_complete
Is this always the case? Or only occasionally?
The code that does this parses the number of submitted jobs from the initial log file, then subtracts the number of running and pending jobs. I could easily add a check that this number is ≥ 0 (and set it to 0 if not), but it would be better to figure out why it's able to go negative in the first place.
Phil
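The calculation described above can be sketched roughly as follows. This is a minimal Python illustration, not Cluster Flow's actual code (which is not shown in this thread); the function name `completed_jobs` and the `clamp` flag are made up for the example, and truncating the percentage toward zero is an assumption chosen because it matches the numbers quoted in this issue.

```python
def completed_jobs(submitted, running, queued, clamp=False):
    """Estimate completed jobs as submitted minus running minus queued.

    Hypothetical helper for illustration only. With clamp=True a negative
    result is forced to 0, which hides the symptom but not the cause.
    """
    completed = submitted - running - queued
    if clamp:
        completed = max(completed, 0)
    # int() truncates toward zero (assumed behaviour, see lead-in).
    percent = int(completed / submitted * 100) if submitted else 0
    return completed, percent


# Figures taken from the two status reports quoted in this issue.
print(completed_jobs(17, 8, 11))       # (-2, -11)
print(completed_jobs(902, 75, 1102))   # (-275, -30)
print(completed_jobs(902, 75, 1102, clamp=True))  # (0, 0)
```

In both reports the running and queued counts together exceed the submitted count, so the subtraction itself is behaving as written; the inputs are what need explaining.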
@s-andrews / @FelixKrueger - if one of you could send me the CF submission log for a run where this has happened, I'll take a look. I think the number of submitted jobs isn't being counted properly.
here is one, cheers. cf_bismark_singlecell_1488545447_submissionlog.txt
submission log:
Cluster Flow Pipeline: bismark_singlecell
Submitted: 7 minutes, 2 seconds ago
Working Directory: /path/to/dir
Cluster Flow ID: bismark_singlecell_1488545447
Submitted Jobs: 902
Running Jobs: 75
Queued Jobs: 1102 (resources)
Completed Jobs: -275 (-30%)
Hmm, strange. I agree that it looks like there were 902 jobs submitted there. So it must be over-counting the queued jobs somehow.
Ok, next up - could you please run cf --qstat to get the above log, followed by a normal qstat, so that I can try to figure out why it thinks there are so many pipeline jobs queued?
Also - I didn't actually explicitly say this myself, but it works fine for me 😁 That's why I'm asking you guys to do stuff.
Two more questions:
- Does it always do this, or only sometimes?
- Why are you running v0.4_dev? v0.4 is the latest released version and v0.5_dev is the most recent development version 😉
Phil
Are you sure you want this? ^^ CF_qstat.txt qstat.txt
Ah, no good - everything is fine in CF_qstat.txt: it looks like the correct number of running and queued jobs, and no negative Completed Jobs number.
...spoke too soon, it seems there are a lot of different pipeline runs in this file!
I see this:
Cluster Flow ID: bismark_singlecell_1488545447
Submitted Jobs: 902
Running Jobs: 77
Queued Jobs: 1095 (resources)
Completed Jobs: -270 (-29%)
======================================================================
- bismark_align [4 cores] [queued, priority 0]
- bismark_deduplicate
- bismark_methXtract
- bismark_report
yes sorry, it's not like I have nothing to do... :)
Ah, I need longer qstat output though. The default trims the full job name, I forgot that. Can you instead do qstat -pri -r -xml please?
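For anyone wanting to check the XML themselves, here is a rough Python sketch of counting one pipeline's jobs by state. Everything in it is an assumption made for illustration: the sample XML is hand-written and heavily simplified rather than real qstat output, the `job_list` / `JB_name` element names are what Grid Engine XML typically uses but should be checked against the real file, and the `cf_<Cluster Flow ID>_...` job naming scheme is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hand-written, simplified stand-in for `qstat -xml` output (NOT real output).
SAMPLE_XML = """<job_info>
  <queue_info>
    <job_list state="running">
      <JB_name>cf_bismark_singlecell_1488545447_bismark_align_001</JB_name>
    </job_list>
  </queue_info>
  <job_info>
    <job_list state="pending">
      <JB_name>cf_bismark_singlecell_1488545447_bismark_deduplicate_002</JB_name>
    </job_list>
    <job_list state="pending">
      <JB_name>cf_bismark_singlecell_1488540000_bismark_align_003</JB_name>
    </job_list>
  </job_info>
</job_info>"""


def count_pipeline_jobs(xml_text, cf_id):
    """Count jobs per state whose name starts with the assumed cf_<id>_ prefix."""
    prefix = f"cf_{cf_id}_"  # hypothetical naming scheme
    counts = Counter()
    for job in ET.fromstring(xml_text).iter("job_list"):
        name = job.findtext("JB_name", default="")
        if name.startswith(prefix):
            counts[job.get("state", "unknown")] += 1
    return counts


counts = count_pipeline_jobs(SAMPLE_XML, "bismark_singlecell_1488545447")
print(counts["running"], counts["pending"])  # 1 1
```

Matching on the full Cluster Flow ID, timestamp included, keeps jobs from other runs of the same pipeline out of the count, which is why untruncated job names matter here.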
Here you go: qstat.txt
Yay, 75114 lines of xml for me to read through. Such a lucky boy! 🥇