
Difference in usage of join_aug_pred.pl in parallelized run

Open TaehyungKwon opened this issue 3 years ago • 0 comments

Dear all,

I get different results depending on how I feed the split outputs of a parallelized run into join_aug_pred.pl. This was discussed in #73, but that thread did not give me a clear explanation. There are basically three ways to use this Perl script to combine the parallelized results.

  1. The article Hoff and Stanke 2019 (doi: 10.1002/cpbi.57) suggested this, which does not work:

         for ((i=1; i<=30; i++)); do
             cat $augDir/augustus.$i.out | join_aug_pred.pl > augustus.gff
         done

  2. Katharina Hoff suggested at #73:

         for ((i=1; i<=30; i++)); do
             cat $augDir/augustus.$i.out | join_aug_pred.pl >> augustus.gff
         done

  3. Another user suggested at #73 that I could do:

         for i in {1..30}; do cat augustus.$i.out; done > concatenated_augustus.gff
         join_aug_pred.pl < concatenated_augustus.gff > joined_augustus.gff
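
To make the structural difference between the three options concrete, here is a minimal sketch that replaces join_aug_pred.pl with a pass-through stand-in (`joiner`, a hypothetical helper that just calls `cat`) and uses three toy chunk files. It only illustrates the redirection and the number of invocations, not what the real script does to the predictions.

```shell
#!/usr/bin/env bash
# Assumption: `joiner` is a stand-in for join_aug_pred.pl. It passes its
# input through unchanged, so only the shell plumbing is demonstrated here.
joiner() { cat; }

workdir=$(mktemp -d)
for i in 1 2 3; do
    echo "chunk$i" > "$workdir/augustus.$i.out"
done

# Option 1: `>` inside the loop truncates the output on every iteration,
# so only the last chunk survives.
for ((i=1; i<=3; i++)); do
    joiner < "$workdir/augustus.$i.out" > "$workdir/opt1.gff"
done

# Option 2: `>>` appends; the joiner runs once per chunk and never sees
# the other chunks.
: > "$workdir/opt2.gff"   # start from an empty file
for ((i=1; i<=3; i++)); do
    joiner < "$workdir/augustus.$i.out" >> "$workdir/opt2.gff"
done

# Option 3: concatenate first; the joiner runs once over all chunks.
for ((i=1; i<=3; i++)); do
    cat "$workdir/augustus.$i.out"
done > "$workdir/concat.out"
joiner < "$workdir/concat.out" > "$workdir/opt3.gff"

echo "option 1 lines: $(grep -c '' "$workdir/opt1.gff")"
echo "option 2 lines: $(grep -c '' "$workdir/opt2.gff")"
echo "option 3 lines: $(grep -c '' "$workdir/opt3.gff")"
```

With the pass-through stand-in, options 2 and 3 give identical files. With the real script they can differ, because in option 2 each invocation sees one chunk in isolation while in option 3 a single invocation sees all chunks together.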

I tried 2) and 3), and both ran without errors. However, the outputs of 2) and 3) differ by ~5% in file size. Am I supposed to use 2)? If so, could you kindly explain how join_aug_pred.pl treats split results?
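
One way to narrow down where a size difference like this comes from is to count features per type in each output. The sketch below assumes tab-separated GFF lines with the feature type in column 3 and comment lines starting with `#`; `count_features` is a hypothetical helper, and the two toy files stand in for the real outputs.

```shell
#!/usr/bin/env bash
# count_features FILE: print "<feature> <count>" for each feature type
# (GFF column 3), skipping comment lines, sorted by feature name.
count_features() {
    awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ } END { for (f in n) print f, n[f] }' "$1" | sort
}

# Toy inputs (hypothetical data, only to make the sketch runnable).
tmp=$(mktemp -d)
{
    printf '# start gene g1\n'
    printf 'chr1\tAUGUSTUS\tgene\t1\t100\t0.9\t+\t.\tg1\n'
    printf 'chr1\tAUGUSTUS\ttranscript\t1\t100\t0.9\t+\t.\tg1.t1\n'
} > "$tmp/a.gff"
{
    printf 'chr1\tAUGUSTUS\tgene\t1\t100\t0.9\t+\t.\tg1\n'
    printf 'chr1\tAUGUSTUS\tgene\t200\t300\t0.8\t+\t.\tg2\n'
} > "$tmp/b.gff"

# Show which feature counts differ between the two files.
diff <(count_features "$tmp/a.gff") <(count_features "$tmp/b.gff") || true
```

On real data, the same comparison would be `diff <(count_features augustus.gff) <(count_features joined_augustus.gff)`.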

Thanks,

TaehyungKwon, Aug 28 '20 17:08