
Stringtie tmp folder is not deleted

Open asumann opened this issue 4 years ago • 4 comments

Hi,

This is my first time using StringTie (v1.3.4). The program finished without errors, but a tmp folder still remains in the GTF output folder. Does that mean the run did not complete successfully?

Here is the command I used:

stringtie $INPUT_BAM -l $SAMPLE -p 8 -G $REF_GFF -o $OUTPUT_GTF

Best regards, Asuman

asumann avatar Jul 23 '20 11:07 asumann

The tmp folder should be removed when the program exits normally. However, I suppose there are situations (particularly grid environments or some network file systems) where the directory removal may fail unexpectedly even though the program otherwise finishes successfully (i.e. the output file was fully written). I have a few questions to help investigate this issue further:

  • Is the leftover tmp folder empty? (Are there any files in it?)
  • Are you running StringTie in a grid environment? (If so, which kind?)
  • Is the GTF output file valid? Take a look at the last line in the file: does it appear truncated? (Does it have the same number of columns as the previous lines?)

If there is a file present in the tmp folder, the last line (transcript) in that file should correspond to the last transcript in the output GTF if the output GTF was written successfully. In that case I suppose the tmp folder should be ignored (or removed manually after the stringtie execution).
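The checks described above can be sketched as two small shell helpers. This is only a rough sketch, not something StringTie provides: the function names `gtf_last_line_ok` and `last_transcript_id` are made up for illustration, the column check relies on the standard 9 tab-separated GTF columns, and the second helper assumes the leftover tmp file carries `transcript_id "..."` attributes in the same way the output GTF does.

```shell
# gtf_last_line_ok FILE   (hypothetical helper, not part of StringTie)
# Succeeds if the last non-comment line of FILE has exactly 9 tab-separated
# fields (the standard GTF column count). Fails otherwise, including when
# FILE contains no data lines at all.
gtf_last_line_ok() {
    awk -F'\t' '!/^#/ { n = NF } END { exit (n == 9 ? 0 : 1) }' "$1"
}

# last_transcript_id FILE   (hypothetical helper, not part of StringTie)
# Prints the last transcript_id value found in FILE.
# Assumption: FILE uses GTF-style attributes such as transcript_id "X.1.1";
last_transcript_id() {
    grep -o 'transcript_id "[^"]*"' "$1" | tail -n 1 | cut -d'"' -f2
}
```

For example, one could run `gtf_last_line_ok` on the output GTF of a sample, and compare the output of `last_transcript_id` for that GTF against the file left in the tmp folder. A passing column check does not prove the file is complete, since a line can be cut off inside the last column, so it is best treated as a quick first look.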

(Obviously if you're running multiple StringTie processes writing to the same output directory then it is possible that those tmp folders may be from other processes that are still currently running or from past failed processes.)

gpertea avatar Jul 24 '20 15:07 gpertea

Thanks for your super quick reply!

I ran the job on my university's HPC cluster using Slurm.

Only one file, .gtf.tmp, remains in the tmp folder. It does seem truncated: the number of columns is not consistent, the last transcript differs between the tmp file and the GTF file of the same sample, and the tmp file has very few lines. That seems like a problem.

Then I compared the line count of that sample's GTF (not the tmp file) with the GTFs of the other samples, and they seem consistent. So I assume that, even though a tmp file was left behind for only this one sample, its output was probably written successfully.

This is exactly what I ran:

for SAMPLE in $(cat $SAMPLES);
do
    INPUT_BAM=map/${SAMPLE}.bam
    OUTPUT_GTF=assembly/${SAMPLE}.gtf
    stringtie $INPUT_BAM -l $SAMPLE -p 8 -G $REF_GFF -o $OUTPUT_GTF
done

I had come to the same conclusion as your last note. Still, given the details above, I would appreciate your confirmation.

asumann avatar Jul 25 '20 20:07 asumann

I ran the same code on another dataset. This time there were two tmp folders, each containing a file with the same name. However, the files were empty.

asumann avatar Sep 04 '20 10:09 asumann

"(Obviously if you're running multiple StringTie processes writing to the same output directory then it is possible that those tmp folders may be from other processes that are still currently running or from past failed processes.)"

Just create a new folder for each sample:

for SAMPLE in $(cat "$SAMPLES")
do
  INPUT_BAM="map/${SAMPLE}.bam"
  # one output folder per sample, so tmp folders from different runs cannot mix
  OUT_PATH="assembly/${SAMPLE}"
  mkdir -pv "$OUT_PATH"
  OUTPUT_GTF="${OUT_PATH}/${SAMPLE}.gtf"
  stringtie "$INPUT_BAM" -l "$SAMPLE" -p 8 -G "$REF_GFF" -o "$OUTPUT_GTF"
done
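
With one folder per sample, it also becomes easy to check afterwards which samples did not produce an output file. A minimal sketch, assuming the `assembly/SAMPLE/SAMPLE.gtf` layout from the loop above and a samples file with one name per line (the function name `report_missing_gtfs` is made up for illustration):

```shell
# report_missing_gtfs SAMPLES_FILE OUT_DIR   (hypothetical helper)
# Prints the name of every sample whose OUT_DIR/SAMPLE/SAMPLE.gtf is
# missing or empty. Assumes SAMPLES_FILE lists one sample name per line.
report_missing_gtfs() {
    samples_file=$1
    out_dir=$2
    while IFS= read -r sample; do
        [ -n "$sample" ] || continue              # skip blank lines
        gtf="$out_dir/$sample/$sample.gtf"
        [ -s "$gtf" ] || printf '%s\n' "$sample"  # -s: exists and is non-empty
    done < "$samples_file"
}
```

Calling `report_missing_gtfs "$SAMPLES" assembly` after the loop would list the samples worth rerunning. It only detects missing or empty files, not truncated ones.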

rpg18 avatar Jun 07 '21 13:06 rpg18