CRITICAL: job status reported by "cromshell list -u" is incorrect and never updates.

Open dalessioluca opened this issue 3 years ago • 6 comments

cromshell list -u is supposed to check the completion status of all unfinished jobs. However, it sometimes reports incorrect values while cromshell status reports the correct ones. Even after running cromshell status with a specific job ID, cromshell list -u keeps listing the old, incorrect status.

The implication is that the status reported by cromshell list -u is unreliable. Jobs could keep running silently while the user believes they were terminated, which is why this is a critical bug.

I have not figured out how to reproduce the problem. However, here are 8 examples of jobs that are listed as running but have in fact terminated.

Screen Shot 2021-11-29 at 9 32 18 AM

dalessioluca avatar Nov 03 '21 16:11 dalessioluca

Huh... I wonder why that's happening.

lbergelson avatar Nov 03 '21 16:11 lbergelson

Probably has to do with how the TSV gets updated when you query / update it.

Somewhere in the status function the ~/.cromshell/<TSV> file is updated. That's almost certainly where the problem lies.

jonn-smith avatar Nov 03 '21 16:11 jonn-smith

Priority of list -u in cromshell 2.0 bumped. @bshifaw

SHuang-Broad avatar Nov 03 '21 17:11 SHuang-Broad

I have just noticed that the jobs with the wrong status are present in 3 TSV files. Could that be part of the problem?

Screen Shot 2021-11-03 at 3 12 49 PM
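To see which files mention a given job, something like the following might help. It is only a sketch: the function name files_with_id is made up for illustration, and it assumes the extra copies sit in the same directory with names that start with all.workflow.database.tsv.

```shell
#!/bin/bash
# List every file in a directory whose name starts with
# all.workflow.database.tsv and that mentions the given workflow ID.
# files_with_id is a hypothetical helper, not part of cromshell.
# Usage: files_with_id <directory> <workflow-id>
files_with_id() {
    local dir=$1 id=$2
    # -l prints only the names of matching files;
    # -F treats the ID as a literal string rather than a regex.
    grep -lF -- "$id" "$dir"/all.workflow.database.tsv* 2>/dev/null
}
```

For example, files_with_id ~/.cromshell <workflow-id> prints one file name per line, so a job that shows up in three files produces three lines.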

dalessioluca avatar Nov 03 '21 19:11 dalessioluca

You can place this script in your .cromshell directory to check the status of your jobs. It simply runs cromshell status in a loop.

#!/bin/bash
# Collect the unique values of the third-from-last field (the job ID on data rows).
awk '{print $(NF-2)}' all.workflow.database.tsv | sort -u > id_to_check.txt  # check only the most current ids
# cat all.workflow.database.tsv* | awk '{print $(NF-2)}' | sort -u > id_to_check.txt  # check all ids

# Start from an empty results file.
rm -f status.txt
for job_id in $(cat id_to_check.txt)
do
    # The header line of the TSV yields 'WDL_NAME' instead of an ID, so skip it.
    if [ "$job_id" != 'WDL_NAME' ]; then
        status=$(cromshell status "$job_id" | grep "status")
        # $status is left unquoted on purpose so each job stays on one line.
        echo "$job_id" $status >> status.txt
    fi
done

echo "The following jobs are running:"
grep "unning" status.txt
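
To compare a line in status.txt with what the TSV itself has recorded for that job, a lookup along these lines could be used. This is a rough sketch under an assumption I have not checked against the cromshell source: that the file is tab-separated, with the job ID in column 3 and the status in column 5. The name recorded_status is hypothetical; check the header line of your own file before relying on the column numbers.

```shell
#!/bin/bash
# Print the status recorded for one workflow ID in a cromshell-style TSV.
# recorded_status is a hypothetical helper, not part of cromshell.
# ASSUMPTION: fields are tab-separated, column 3 holds the workflow ID and
# column 5 holds the status. Verify this against your file's header line.
# Usage: recorded_status <tsv-file> <workflow-id>
recorded_status() {
    local tsv=$1 id=$2
    # Print column 5 for every row whose column 3 equals the ID exactly.
    awk -F'\t' -v id="$id" '$3 == id { print $5 }' "$tsv"
}
```

If recorded_status all.workflow.database.tsv <workflow-id> prints a different status from the one in status.txt, the TSV entry for that job is probably stale.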

dalessioluca avatar Nov 03 '21 19:11 dalessioluca

The multiple files shouldn't be an issue - it should only be looking in all.workflow.database.tsv.

I'll take a look at this very soon.

jonn-smith avatar Nov 16 '21 20:11 jonn-smith